Known crawler IPs?
I've noticed that shortly after I posted about my video game music stream, a connection showed up that has been streaming for nearly 20 hours straight.
I can disconnect it, and figure if it's an actual person, they'll reconnect after some time. But I suspect it may be a crawler that doesn't check robots.txt and doesn't have logic to abort when a request takes too long or returns too many bytes.
I know one crawler (Kennedy) publishes its IP addresses. Totally Legit Gemini Search doesn't, but its description makes it sound like it doesn't fetch binary files anyway.
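For anyone writing a crawler, the missing safeguard described above can be sketched roughly like this in Python. It is a minimal sketch, not how any particular crawler works: `MAX_BYTES`, `MAX_SECONDS`, `read_limited`, and `gemini_fetch` are hypothetical names and values. The idea is that a live stream never sends EOF, so the read loop has to give up on its own once either a byte budget or a wall-clock deadline runs out.

```python
import socket
import ssl
import time
from urllib.parse import urlparse

MAX_BYTES = 5 * 1024 * 1024  # hypothetical cap: 5 MiB per response
MAX_SECONDS = 30.0           # hypothetical cap: 30 s per request


class LimitExceeded(Exception):
    """Raised when a response is too big or takes too long."""


def read_limited(sock, max_bytes=MAX_BYTES, max_seconds=MAX_SECONDS):
    """Read until EOF, aborting if the byte or time budget is exceeded."""
    deadline = time.monotonic() + max_seconds
    chunks = []
    total = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise LimitExceeded("time limit exceeded")
        sock.settimeout(remaining)  # never block past the overall deadline
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            raise LimitExceeded("time limit exceeded") from None
        if not chunk:  # server closed the connection: normal end of response
            return b"".join(chunks)
        total += len(chunk)
        if total > max_bytes:
            raise LimitExceeded("byte limit exceeded")
        chunks.append(chunk)


def gemini_fetch(url):
    """Send a Gemini request (absolute URL + CRLF) and read a bounded reply."""
    parsed = urlparse(url)
    host = parsed.hostname
    port = parsed.port or 1965  # default Gemini port
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # Gemini capsules often use self-signed certs
    with socket.create_connection((host, port), timeout=MAX_SECONDS) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall((url + "\r\n").encode("utf-8"))
            return read_limited(tls)
```

Another option is to read only the response header line (`<STATUS> <META>`, where META is the MIME type for 2x responses) and skip bodies like `audio/mpeg` or `audio/ogg` entirely; the limits above are a backstop for anything that slips through.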
I'm going to make a note of the IP and disconnect - but if this is a crawler I'd really like to give the owner a heads-up.
Jan 26
🎮 jprjr [OP] · Jan 26 at 20:49:
So after I disconnected it, that IP started streaming the Vorbis URL, which is listed right after the MP3 URL it had been streaming for 20 hours. So now I strongly suspect this is a crawler without good limits defined.
🎮 jprjr [OP] · Jan 26 at 21:00:
Figured it out. I used openssl s_client to connect on port 443 and pulled the domain from the returned certificate. It turns out that IP is also serving Gemini, but under a different domain. And when you visit it over Gemini, it's a search engine (alvus.nl).
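The same certificate trick can be scripted from Python. This is a sketch under a few assumptions: `fetch_pem`, `describe_cert`, and `common_name` are hypothetical helper names, the `timeout` argument to `ssl.get_server_certificate` needs Python 3.10+, and `describe_cert` shells out to the `openssl` CLI (as in the post) because Python's `getpeercert()` returns an empty dict for certificates it did not validate. Only the subject CN is extracted; subjectAltName is not checked.

```python
import re
import ssl
import subprocess


def fetch_pem(ip, port=443, timeout=10.0):
    """Fetch the server's leaf certificate as PEM without validating it."""
    return ssl.get_server_certificate((ip, port), timeout=timeout)


def describe_cert(pem):
    """Ask the openssl CLI for the certificate's subject line."""
    result = subprocess.run(
        ["openssl", "x509", "-noout", "-subject"],
        input=pem, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Matches both "subject=CN = example.org" and "subject= /C=NL/CN=example.org".
CN_RE = re.compile(r"\bCN\s*=\s*([^,/\n]+)")


def common_name(subject_line):
    """Return the CN from an openssl subject line, or None if absent."""
    m = CN_RE.search(subject_line)
    return m.group(1).strip() if m else None
```

Usage would be something like `common_name(describe_cert(fetch_pem("77.161.107.142")))`; the exact subject formatting varies between OpenSSL versions, which is why the regex tolerates both styles.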
So anyways, now I can hopefully try to get in contact and let them know.
I had a similar issue with the Alvus search engine. It connected to my streaming opus station for hours on end. When I kicked the connection, it immediately started indexing other pages on my capsule. That's a big download. I couldn't find any contact info so I posted about it and then blocked the IP.
🎮 jprjr [OP] · Jan 26 at 23:07:
@bluesman yeah, I did something a little more nefarious. If I get a request from that IP, I return some gemtext saying that it's been blocked and why, with a link at the end.
Said link just returns the message again, with a new link - it goes on forever.
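That endless-link tarpit can be sketched as follows, assuming a Python Gemini server where you can hook request handling per client IP. `handle_request`, `normal_handler`, `tarpit_response`, and the `/blocked/` path are hypothetical names, not any real server's API. The response uses the standard Gemini success header (`20 text/gemini`) and a gemtext link line (`=> URL text`).

```python
import secrets

# Hypothetical blocklist; this is the IP identified in the thread.
BLOCKED_IPS = {"77.161.107.142"}


def tarpit_response(client_ip):
    """Gemtext explaining the block, linking to a fresh random path."""
    token = secrets.token_hex(8)  # new path each time, so it never looks "visited"
    body = (
        "# Blocked\n\n"
        f"Requests from {client_ip} are blocked because this crawler kept "
        "streaming endless audio URLs without any time or size limit.\n\n"
        f"=> /blocked/{token} More information\n"
    )
    return b"20 text/gemini\r\n" + body.encode("utf-8")


def handle_request(client_ip, url, normal_handler):
    """Send blocked IPs to the tarpit; everyone else gets normal_handler(url)."""
    if client_ip in BLOCKED_IPS:
        # Every URL from a blocked IP, including /blocked/<token>, ends up
        # here, so following the link just yields another tarpit page.
        return tarpit_response(client_ip)
    return normal_handler(url)
```

The random token matters: a crawler that deduplicates by URL will see each link as new and keep following it, while a human reader sees the explanation on the very first page.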
Ha! Nice. I'm just glad I caught it. It might still be going otherwise.
I have seen the alvus.nl crawler on my Gemini server. I tried the search engine, but it was not working two days ago.
Here are some crawler IPs seen on my Gemini server:
193.70.85.11 lupa
23.88.52.182 tlgs
116.202.128.144 freeshell.de
64.149.155.184 kennedy
135.148.41.168 kennedy
77.161.107.142 alvus.nl
There are more crawlers but I haven't identified them.
Alvus was easily 1/4 of the traffic on my site before I firewalled it. There was an 'observable gopher project' that had a similar level of requests on gopher before I blocked them too. Worst two by a long way.
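To check a "1/4 of the traffic" figure like this on your own capsule, one rough approach is to tally requests per known crawler from the access log, using the IP list posted above. This is a sketch under a stated assumption: `crawler_share` is a hypothetical helper, and it assumes each log line starts with the client IP followed by whitespace, so the parsing will need adjusting for your server's actual log format.

```python
from collections import Counter

# IP -> crawler name, as listed in this thread (unidentified crawlers omitted).
KNOWN_CRAWLERS = {
    "193.70.85.11": "lupa",
    "23.88.52.182": "tlgs",
    "116.202.128.144": "freeshell.de",
    "64.149.155.184": "kennedy",
    "135.148.41.168": "kennedy",
    "77.161.107.142": "alvus.nl",
}


def crawler_share(log_lines):
    """Return each crawler's fraction of all requests ("other" for the rest).

    Assumes (hypothetically) the client IP is the first whitespace-separated
    field of every log line; blank lines are skipped.
    """
    counts = Counter()
    total = 0
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        total += 1
        counts[KNOWN_CRAWLERS.get(fields[0], "other")] += 1
    return {name: n / total for name, n in counts.items()} if total else {}
```

For example, a log with one request from 77.161.107.142 and three from other addresses gives `{"alvus.nl": 0.25, "other": 0.75}`.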