[discussion] The matter of Robots.txt
- 📧 Messages: 2
- 🗣️ Authors: 2
- 📅 First Message: 2021-10-21 12:41
- 📅 Last Message: 2021-10-21 13:05
1. Andrew Singleton (singletona082 (a) gmail.com)
- 📅 Sent: 2021-10-21 12:41
- 📧 Message 1 of 2
I'm going to lead in with a question prompted by Sean's experiences.
Do we even need a robots.txt?
-- -----
http://singletona082.flounder.online
gemini://singletona082.flounder.online
My online presence
2. Alan Bunbury (gemini (a) bunburya.eu)
- Subject Changed! New Subject: Re: [discussion] The matter of Robots.txt
- 📅 Sent: 2021-10-21 13:05
- 📧 Message 2 of 2
Why wouldn't we? We certainly have a lot of bots, so it seems reasonable to
have a robots.txt.
I learned the value of robots.txt soon after setting up Remini, my Gemini
proxy for Reddit. Reddit pages tend to link to many other Reddit pages, so
crawlers that visited Remini were sent down a rabbit hole that ultimately
had them trying to index all of Reddit (which is huge) via the proxy.
That crawl trap is obviously not a typical case, but I don't think it's
*that* unusual in Geminispace either. More generally, it seems obvious to me
that there should be a (mostly) agreed-upon way to direct the behaviour of
bots that visit one's capsule, so if there are good arguments against
robots.txt I'd be interested in hearing them. I don't think this is, strictly
speaking, a Gemini question, though, as the robots exclusion standard is
something quite separate from Gemini (or HTTP).
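As an example of what that agreed-upon direction can look like: as I
understand it, the robots.txt companion document for Gemini describes
"virtual user-agents" (names such as archiver, indexer, researcher and
webproxy) that bots are asked to match according to their purpose, so a
capsule can, say, allow search indexing while keeping archivers out. A
sketch, assuming those virtual agents and with made-up paths:

    # Keep archiving bots out entirely
    User-agent: archiver
    Disallow: /

    # Let indexers in, but not to a hypothetical /private/ area
    User-agent: indexer
    Disallow: /private/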
On 21/10/2021 13:41, Andrew Singleton wrote:
>
> I'm going to lead in with a question prompted by Sean's experiences.
>
> Do we even need a robots.txt?
>
---