Known crawler IPs?

Shortly after I posted about my video game music stream, I noticed a connection that has been streaming for nearly 20 hours straight.

I can disconnect it, and I figure that if it's an actual person they'll reconnect after some time. But I suspect it may be a crawler that doesn't check robots.txt and has no logic to abort when a request takes too long or returns too many bytes.
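A minimal Python sketch of the kind of limits meant here: a read loop that gives up once a byte cap or a time cap is exceeded. The names `read_with_limits` and `FetchLimitExceeded` and the default limits are hypothetical, chosen for illustration, and are not taken from any real crawler.

```python
import time


class FetchLimitExceeded(Exception):
    """Raised when a download runs past its byte or time budget."""


def read_with_limits(read_chunk, max_bytes=1_000_000, max_seconds=30.0,
                     clock=time.monotonic):
    """Read until end of stream, aborting once either limit is exceeded.

    read_chunk(n) should return up to n bytes, or b"" at end of stream
    (for example, the recv method of a connected socket).
    """
    deadline = clock() + max_seconds
    chunks = []
    total = 0
    while True:
        # The time check runs between reads, not during a blocking read.
        if clock() > deadline:
            raise FetchLimitExceeded("time limit exceeded")
        chunk = read_chunk(65536)
        if not chunk:
            return b"".join(chunks)
        total += len(chunk)
        if total > max_bytes:
            raise FetchLimitExceeded("byte limit exceeded")
        chunks.append(chunk)
```

Because the deadline is only checked between reads, a real client would presumably also set a socket timeout so that a single stalled read cannot hang forever. An endless audio stream would trip the byte cap within seconds.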

I know one crawler (Kennedy) publishes its IP addresses. Totally Legit Gemini Search doesn't, but its description makes it sound like it doesn't fetch binary files anyway.

I'm going to make a note of the IP and disconnect - but if this is a crawler I'd really like to give the owner a heads-up.

Posted in: s/Geminispace

🎮 jprjr

Jan 26 · 3 months ago


🎮 jprjr [OP] · Jan 26 at 20:49:

So after disconnecting - that IP started streaming the Vorbis URL, which is listed right after the MP3 URL that was previously streamed for 20 hours. So I'm now strongly suspecting this is a crawler that doesn't have good limits defined.

🎮 jprjr [OP] · Jan 26 at 21:00:

Figured it out. I used openssl s_client to connect to the IP on port 443 and pulled the domain from the returned certificate. It turns out that IP is also serving Gemini, but under a different domain. When you visit it over Gemini, it's a search engine (alvus.nl).

So anyways, now I can hopefully try to get in contact and let them know.
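The certificate lookup described above was done with openssl s_client. A rough Python equivalent, as a sketch only: it connects to a bare IP, skips the hostname check (there is no hostname to check), and lists the names in the certificate. The function names are hypothetical, and it assumes the server presents a certificate that chains to a trusted CA, since Python only returns the parsed fields for a verified certificate.

```python
import socket
import ssl


def names_from_cert(cert):
    """Collect DNS names from a dict shaped like SSLSocket.getpeercert()."""
    names = []
    # "subject" is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName" and value not in names:
                names.append(value)
    # "subjectAltName" is a tuple of (type, value) pairs.
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS" and value not in names:
            names.append(value)
    return names


def cert_names_for_ip(ip, port=443, timeout=10.0):
    """Connect to ip:port without SNI and return the certificate's names."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # connecting by IP, so no hostname to match
    with socket.create_connection((ip, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock) as tls:
            return names_from_cert(tls.getpeercert())
```

Calling `cert_names_for_ip` with the address from the logs would return whatever names the server's default certificate carries. A self-signed certificate would fail verification here, so this only helps when the host also runs an ordinary HTTPS site.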

🛸 bluesman · Jan 26 at 22:32:

I had a similar issue with the Alvus search engine. It connected to my streaming Opus station for hours on end. When I kicked the connection, it immediately started indexing other pages on my capsule. That's a big download. I couldn't find any contact info, so I posted about it and then blocked the IP.

🎮 jprjr [OP] · Jan 26 at 23:07:

@bluesman yeah, I did something a little more nefarious. If I get a request from that IP I return some gemtext saying that it's been blocked and why, with a link at the end.

Said link just returns the message again, with a new link - it goes on forever.
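A sketch of how such an endless chain of links could be generated, assuming a server that lets a handler return raw Gemini response bytes. The function name, the `/blocked/` path, and the address in `BLOCKED_IPS` are all placeholders, not the actual setup.

```python
import secrets

# Placeholder address from the documentation range, not the real crawler IP.
BLOCKED_IPS = {"192.0.2.10"}


def blocked_response(client_ip, reason):
    """Return a Gemini response for a blocked IP, or None to serve normally."""
    if client_ip not in BLOCKED_IPS:
        return None
    # A fresh random path each time, so the link always looks unvisited.
    token = secrets.token_hex(8)
    body = (
        "# Blocked\n"
        f"Requests from {client_ip} are blocked: {reason}\n"
        f"=> /blocked/{token} More information\n"
    )
    # Gemini success header: status 20, a space, the MIME type, then CRLF.
    return ("20 text/gemini\r\n" + body).encode("utf-8")
```

Since the check is on the client IP rather than the path, every request from that address gets the same notice with a new link, whichever URL it asks for.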

🛸 bluesman · Jan 26 at 23:38:

Ha! Nice. I'm just glad I caught it. It might still be going otherwise.

🚀 Remy · Jan 27 at 06:35:

I have seen the alvus.nl crawler on my Gemini server. I tried the search engine 2 days ago and it was not working.

Here are some crawler IPs seen on my Gemini server:

193.70.85.11 lupa

23.88.52.182 tlgs

116.202.128.144 freeshell.de

64.149.155.184 135.148.41.168 kennedy

77.161.107.142 alvus.nl

There are more crawlers but I haven't identified them.
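For anyone who wants to check their own logs against the list above, here is a small Python sketch that turns it into a lookup table. It assumes each line is one or more addresses followed by the crawler name, as in the listing; the function names are hypothetical.

```python
import ipaddress

LISTING = """\
193.70.85.11 lupa
23.88.52.182 tlgs
116.202.128.144 freeshell.de
64.149.155.184 135.148.41.168 kennedy
77.161.107.142 alvus.nl
"""


def parse_crawler_list(text):
    """Map each IP address to the crawler name at the end of its line."""
    crawlers = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        *addresses, name = fields
        for addr in addresses:
            crawlers[ipaddress.ip_address(addr)] = name
    return crawlers


def crawler_for(ip, crawlers):
    """Return the crawler name for an IP string, or None if unknown."""
    return crawlers.get(ipaddress.ip_address(ip))
```

With this, `crawler_for(client_ip, parse_crawler_list(LISTING))` gives the crawler name for a known address and `None` otherwise.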

🚀 MikeK · Mar 09 at 04:32:

Alvus was easily 1/4 of the traffic on my site before I firewalled it. There was an 'observable gopher project' that had a similar level of requests on gopher before I blocked them too. Worst two by a long way.