2025-06-21 Trying to understand the bots
I changed the limit of my automatic ASN ban from 1000 hits in 2h to 500 hits in 2h.
That's because the two biggest autonomous systems hitting my sites are from Vietnam and China and they're currently keeping below that 1000 hits per 2h limit:
site-log !^social | log-ip | asncounter --no-prefixes 2>/dev/null
count percent ASN AS
749 4.37 45899 VNPT-AS-VN VNPT Corp, VN
539 3.15 45102 ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN
448 2.61 24940 HETZNER-AS, DE
427 2.49 16276 OVH, FR
367 2.14 28573 Claro NXT Telecomunicacoes Ltda, BR
331 1.93 9009 M247, RO
312 1.82 62610 ZEN-DPS, US
303 1.77 7922 COMCAST-7922, US
299 1.74 212238 CDNEXT, GB
264 1.54 7018 ATT-INTERNET4, US
total: 17135
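The threshold check can be sketched in a few lines of Python. This is only an illustration, not the actual ban script: it assumes the log has already been reduced to `(timestamp, ASN)` pairs, the names `asns_over_limit`, `limit` and `window` are made up for the example, and treating "500 hits" as "500 or more" is also an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta

def asns_over_limit(hits, now, limit=500, window=timedelta(hours=2)):
    """Return {asn: count} for every ASN with at least `limit` hits in the window.

    `hits` is an iterable of (timestamp, asn) pairs. That input shape is an
    assumption for this example, not the format of the real access log.
    """
    cutoff = now - window
    counts = Counter(asn for ts, asn in hits if cutoff <= ts <= now)
    return {asn: n for asn, n in counts.items() if n >= limit}

# Illustrative data: the top three counts from the table, treated as if they
# all fell into a single two-hour window.
now = datetime(2025, 6, 21, 12, 0)
hits = [(now, 45899)] * 749 + [(now, 45102)] * 539 + [(now, 24940)] * 448
print(asns_over_limit(hits, now, limit=1000))  # old limit
print(asns_over_limit(hits, now, limit=500))   # new limit
```

With these numbers the old limit of 1000 flags nothing, while the new limit of 500 flags AS 45899 and AS 45102 and leaves AS 24940 alone.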
The numbers themselves are not that big, but I am annoyed. I live in an English/German world and I don't see a reason for service providers from Vietnam, China, Brazil and Romania crawling my sites.
(You can find all the fish functions I use in the admin directory.)
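For readers who don't use fish, here is a rough Python equivalent of what a helper like `rank-lines` presumably does: count identical lines and list the most frequent ones first. This is a guess based on the output shown in this post, not a port of the real function, and the name `rank_lines` and the `top` parameter are assumptions.

```python
from collections import Counter

def rank_lines(lines, top=10):
    """Count identical lines and return the `top` most frequent as (count, line)."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return [(n, line) for line, n in counts.most_common(top)]

# Illustrative input: a handful of request paths, one per line.
sample = ["/nobots"] * 3 + ["/emacs/wang1zhen"] * 2 + ["/robots.txt"]
for n, line in rank_lines(sample):
    print(f"{n:7d} {line}")
```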
Let's take a look at what they are requesting!
The Vietnamese bots:
site-log !^social | asn-access-log 45899 | log-request | rank-lines
74 /nobots
3 /emacs/?action=translate%3Bid%3Dmon-utils.el%3Bmissing%3Dde_es_fr_it_ja_ko_pt_ru_se_uk_zh
3 /emacs/?action=translate%3Bid%3DComments_on_SuperCollider%3Bmissing%3Dde_en_es_fr_it_ja_ko_pt_ru_se_uk_zh
3 /emacs/?action=translate%3Bid%3DComintModes%3Bmissing%3Dde_es_fr_it_ja_ko_pt_ru_se_uk_zh
3 /emacs/?action=edit%3Bid%3DCustomizeNewGUI
3 /emacs/?action=browse%3Bdiff%3D2%3Bid%3DDialog
3 /emacs/?action=admin%3Bid%3DApplyingPatches
2 /emacs/wang1zhen
2 /emacs/Comments_on_zenburn.el
2 /emacs?action=translate%3Bid%3DVbsReplMode%3Bmissing%3Dde_en_es_fr_it_ja_ko_pt_ru_se_uk_zh
Looks like they're following all the links, so a misbehaving bot, if you ask me. They're "hitting all the buttons" on the web app. The relevant part of `robots.txt`:
User-agent: *
Crawl-delay: 240
Disallow: /emacs?action=
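To check which of the logged requests a `Disallow` line covers, a plain prefix match is enough for a first pass. The sketch below is a simplification under stated assumptions: it compares percent-decoded strings, ignores wildcards, `Allow` lines and per-agent groups, and the function name `is_disallowed` is hypothetical.

```python
from urllib.parse import unquote

def is_disallowed(request, disallow_rules):
    """True if the request path-and-query starts with any Disallow prefix.

    Plain prefix matching after percent-decoding. Wildcards, Allow lines
    and per-agent groups are deliberately left out of this sketch.
    """
    target = unquote(request)
    return any(target.startswith(unquote(rule)) for rule in disallow_rules if rule)

rules = ["/emacs?action="]
for request in [
    "/emacs?action=translate%3Bid%3DVbsReplMode",
    "/emacs/?action=edit%3Bid%3DCustomizeNewGUI",
    "/emacs/Comments_on_zenburn.el",
]:
    print(is_disallowed(request, rules), request)
```

A plain prefix check treats `/emacs/?action=` and `/emacs?action=` as different paths, so whether the former is covered depends on rules not shown in this excerpt.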
The Chinese bots:
site-log !^social | asn-access-log 45102 | log-request | rank-lines
306 /robots.txt
12 /wiki?action=rc%3Brcfilteronly%3D%222005-10-06%22
12 /wiki?action=history%3Bid%3D2005-10-06
12 /wiki?action=edit%3Bid%3DMoneyPooling
12 /wiki?action=edit%3Bid%3DDorfWiki
12 /wiki?action=edit%3Bid%3DBarnstarSharing
12 /wiki?action=edit%3Bid%3D2005-10-06
12 /wiki?action=define%3Bname%3DMoneyPooling
12 /wiki?action=define%3Bname%3DBarnstarSharing
12 /wiki?action=browse%3Bdiff%3D2%3Bid%3D2005-10-06
Looks like they're following all the links, so a misbehaving bot as well.
Again, the relevant part of `robots.txt`:
User-agent: *
Crawl-delay: 240
Disallow: /wiki
The German bots actually make reasonable requests:
site-log !^social | asn-access-log 24940 | log-request | rank-lines
90 /view/2025-06-16-ban-asn
52 /
50 /view/index
41 /rpg/feed.xml
29 /admin/ban-cidr
16 /view/index.rss
15 /emacs?action=rss
10 /robots.txt
8 /wiki/feed/full/
8 /osr/feed.xml
Let's see what sort of user agents we see. I'm expecting feed readers.
site-log !^social | asn-access-log 24940 | log-user-agent | rank-lines
46 NewsBlur Page Fetcher
36 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36
32 NewsBlur Feed Fetcher
23 fiperbot/0.1 (+https://www.fiper.net)
21 Mozilla/5.0 (compatible; DataForSeoBot/1.0; +https://dataforseo.com/dataforseo-bot)
15 AwarioSmartBot/1.0 (+https://awario.com/bots.html; bots@awario.com)
9 Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0
8 MyNewspaper Agent 1.0
8 Akkoma 3.9.3-0-g9d7c877; https://social.raccoon.college <admin@raccoon.college>, Akkoma 3.9.3-0-g9d7c877; https://social.raccoon.college <admin@raccoon.college>; Bot
7 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36 Edg/101.0.1210.47
The one that stands out is "DataForSeoBot". But it seems that this is not a problem. I already have this bot in my Apache config (as seen on 2025-03-21 A summary of my bot defence systems). Still, *booo!* Hetzner for hosting this bot.
site-log !^social | grep DataForSeoBot | leech-detector
Total hits: 21
IP | Hits | Bandw. | Rel. | Interv. | Status
------------------------------:|-----------:|-------:|-----:|--------:|-------
136.243.228.177 | 18 | 1K | 85% | -289.2s | 410 (55%), 301 (44%)
136.243.220.213 | 1 | 0K | 4% | | 301 (100%)
136.243.228.178 | 1 | 0K | 4% | | 301 (100%)
136.243.228.193 | 1 | 3K | 4% | | 410 (100%)
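The hits, relative share and status columns of a table like this are easy to reproduce. The following Python is a minimal sketch, not the `leech-detector` implementation: it assumes the log has been reduced to `(ip, status)` pairs, leaves out bandwidth and intervals, and the name `summarize_by_ip` is made up.

```python
from collections import Counter, defaultdict

def summarize_by_ip(records):
    """Group (ip, status) pairs and report hits, share of total, and status mix."""
    by_ip = defaultdict(Counter)
    for ip, status in records:
        by_ip[ip][status] += 1
    total = sum(sum(c.values()) for c in by_ip.values())
    rows = []
    for ip, statuses in by_ip.items():
        hits = sum(statuses.values())
        rows.append({
            "ip": ip,
            "hits": hits,
            "share": hits / total,  # fraction of all hits from this IP
            "statuses": {s: n / hits for s, n in statuses.most_common()},
        })
    return sorted(rows, key=lambda r: r["hits"], reverse=True)

# Illustrative data: a 10/8 split between 410 and 301 is one split that is
# consistent with the percentages shown for the first IP; it is an assumption.
records = (
    [("136.243.228.177", 410)] * 10
    + [("136.243.228.177", 301)] * 8
    + [("136.243.220.213", 301), ("136.243.228.178", 301), ("136.243.228.193", 410)]
)
for row in summarize_by_ip(records):
    mix = ", ".join(f"{s} ({int(p * 100)}%)" for s, p in row["statuses"].items())
    print(f'{row["ip"]:>15} | {row["hits"]:4d} | {int(row["share"] * 100):3d}% | {mix}')
```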
The French bots also seem to be reasonable:
site-log !^social | asn-access-log 16276 | log-request | rank-lines
153 /view/2025-06-16-ban-asn
96 /
87 /view/index
45 /admin/ban-cidr
17 /robots.txt
17 /files/internet-office-hours.xml
8 /rpg/feed.xml
2 /wiki?action=rss;rcidonly=Page_Synchronization
2 /wiki?action=rss;match=%5EPingback_Server_Extension%24
2 /view/RPG.rss
The Brazilian bots seem to download the entire site:
site-log !^social | asn-access-log 28573 | log-request | rank-lines
72 /nobots
1 /wiki/Year_of_the_Copper_Titan/Comments_on_Character_sheet_template
1 /wiki/WilderlandsOfSwordsAndDevilry/Comments_on_Cutthroat_Inn_Hooks
1 /wiki/WerdnaWorld?search=%222019-02-17%22
1 /wiki/Waterdeep/Recap_April_18,_2020
1 /wiki/TheRoadToDwimmermount/Comments_on_Alia
1 /wiki/SmoothPointsofPride?action=history;id=Melee
1 /wiki/SmallHuman
1 /wiki?search=%22MicroPayment%22
1 /wiki?search=%22Gemini+Wiki+on+the+Internet%22
Look at the requests:
site-log !^social | asn-access-log 28573 | log-request | rank-lines
62 /nobots
1 /wiki/Year_of_the_Copper_Titan/Comments_on_Character_sheet_template
1 /wiki/WonderfulBreadIncrease
1 /wiki/Waterdeep/Recap_February_8,_2020
1 /wiki/Waterdeep/Recap_April_18,_2020
1 /wiki/TheRoadToDwimmermount/Comments_on_Alia
1 /wiki/TheBrokenLands/Comments_on_Symbol_of_Truth
1 /wiki/SmallHuman
1 /wiki?search=%22MyMacros%22
1 /wiki?search=%22ModularWiki%22
Especially those searches at the bottom! The relevant part of `robots.txt`:
User-agent: *
Crawl-delay: 20
Disallow: /wiki?
Same for the Romanian one:
site-log !^social | asn-access-log 9009 | log-request | rank-lines
5 /nobots
3 /
2 /emacs/Comments_on_SiteMap/
2 /diff/2021-07-29_Creative_projects%2C_perpetually_work_in_progress
2 /cw/2006-04-30
1 /wiki/Waterdeep/Comments_on_imp
1 /wiki/Unter_Piraten/Comments_on_Numqu'am_Solus
1 /wiki/Unter_Piraten/Comments_on_2023-04-07
1 /wiki/Unter_Piraten/Comments_on_2023-03-03
1 /wiki/TravellerTheSalamanderCrew/Comments_on_Yandee
And what I really hate are those random user agent strings.
site-log !^social | asn-access-log 9009 | log-user-agent | rank-lines
13 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3
12 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 OPR/117.0.0.0
12 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.3
11 Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.3
11 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Trailer/93.3.8652.5
9 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.3
8 Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36 Edg/134.0.0.0
8 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.3
6 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.10 Safari/605.1.1
2 Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36
Do I really need to go to The Ultimate Apache Bad Bot & Referrer Blocker?
#ButlerianJihad