2024-09-18 Emacs Wiki and China

**2024-09-14**. I'm somewhere in the Italian-speaking parts of Switzerland with my wife. There is a lot of running, hiking, hugging, kissing, eating and drinking involved. 🥰

**2024-09-15**. Still on the trip but late at night I spent more than an hour trying to figure out why my server had a load of nearly 40. 💻

All I discovered is that load went down when I shut down Emacs Wiki. See also 2024-09-16 on Emacs Wiki.

2024-09-16

Well, I needed to sleep and I’ve got plans for the next few days, so I shut it down while I slept, hoping that the misconfigured spider gets fixed or the inept programmer discovers their mistake. Just another day in the Butlerian Jihad. Some misguided soul probably wanted to download it all and wrote a broken web crawler. When that got blocked, they bought some nice scaling infrastructure from Amazon, Hetzner, OVH, Alibaba Cloud or whatever they are called, allowing them to use a gazillion different IP numbers. That will eventually lead me to implement some sort of cloud service provider block.

Load shoots up to nearly 40 around midnight. The graph covers an entire week, so the peaks are smoothed out: it only shows load going up to 30 several times.
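Why a week-long graph understates the peaks comes down to averaging. This is a minimal sketch, assuming five-minute load samples that the weekly view averages into 30-minute points (an assumption about how the graphing tool consolidates data); the sample values are invented for illustration.

```python
# Sketch: averaging fine-grained load samples into coarse bins, as a weekly
# graph does, flattens short peaks. Bin size and samples are assumptions.
from __future__ import annotations

from statistics import mean


def consolidate(samples: list[float], bin_size: int) -> list[float]:
    """Average each run of `bin_size` consecutive samples (the last run may be shorter)."""
    return [mean(samples[i:i + bin_size]) for i in range(0, len(samples), bin_size)]


if __name__ == "__main__":
    # Invented five-minute samples: a 15-minute spike to nearly 40.
    five_minute = [2.0, 2.5, 3.0, 38.0, 39.5, 36.0, 4.0, 2.0, 2.0, 2.5, 2.0, 2.0]
    # Six five-minute samples make one 30-minute point.
    thirty_minute = consolidate(five_minute, 6)
    print("raw peak:", max(five_minute))
    print("averaged peak:", max(thirty_minute))
```

With these invented numbers, the spike to 39.5 shows up as a point of about 20 in the averaged series.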

**2024-09-16**. Switched Emacs Wiki back on after a few hours of sleep and it did fine. But then it started again... at 18:00, 19:00, 21:00, 22:00... and so I switched Emacs Wiki off again. Time to ban some networks!

Anybody interested in the IP ranges I've banned, and possibly in having me revert some of them, can take a look at ban-cidr ... from a network that isn't banned, I guess. 😏

ban-cidr

**2024-09-17**. This continues to keep me busy and angry every evening. Too bad I don't have a really fast pipeline from network lookup to firewall ban. Instead of carefully checking IP numbers and networks, I'm using this script. I'm also sick and tired of the same networks popping up again and again.

network-lookup

I added over a hundred Chinese networks to the firewall rules and I'm seriously considering blocking the whole country for a week. It seems that most of the offenders are networks run by China Telecom and China Mobile.
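What `network-lookup` does internally isn't described here. As a rough sketch of the first step in such a pipeline, assuming you already have a list of offending IP addresses, one could count requests per network to find the worst offenders. The function names and the /24 and /48 prefix lengths are hypothetical choices for illustration, not the author's method.

```python
# Hypothetical first step of a "network lookup to firewall ban" pipeline:
# count requests per network so the worst networks can be checked and banned.
# The /24 and /48 prefix lengths are arbitrary assumptions; real network
# boundaries come from whois or routing data.
from __future__ import annotations

import ipaddress
from collections import Counter


def network_of(ip: str, v4_prefix: int = 24, v6_prefix: int = 48):
    """Return the network of the given prefix length that contains `ip`."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False accepts a host address and masks off the host bits.
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


def busiest_networks(ips: list[str], top: int = 10) -> list[tuple[str, int]]:
    """Count addresses per network and return the `top` most frequent ones."""
    counts = Counter(str(network_of(ip)) for ip in ips)
    return counts.most_common(top)
```

Each network in the result would still need a whois lookup before deciding to ban it, since the ranges an operator actually holds are usually much larger than a /24.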

**2024-09-18**. So far, so good. Load stays below two.

Here's example usage for `network-lookup`, filtering for Emacs Wiki requests that use the URL parameter for recent changes or an RSS feed of a single page only. In my book, that counts as misbehaving crawler behaviour.

As far as I am concerned, they all deserve to be banned. Over-banning? Maybe. What do you think?
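A log filter of the kind described above can be sketched as follows. This is a minimal sketch, assuming Apache's `vhost_combined` log format, the host name `www.emacswiki.org`, and `rcidonly` as the single-page parameter; all three are assumptions, so substitute whatever your own logs contain. Query parameters are split on both `&` and `;` because wiki URLs may use either separator.

```python
# Hypothetical filter: yield client IPs that requested a given host with a
# given query parameter. Assumes Apache's vhost_combined log format:
#   %v:%p %h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"
from __future__ import annotations

import re
from typing import Iterable, Iterator
from urllib.parse import urlsplit

LINE_RE = re.compile(
    r'^(?P<vhost>[^:\s]+):(?P<port>\d+) (?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<target>\S+)[^"]*" (?P<status>\d{3}) '
)


def query_keys(target: str) -> set[str]:
    """Parameter names in the query string, split on '&' or ';'."""
    query = urlsplit(target).query
    return {part.split("=", 1)[0] for part in re.split(r"[&;]", query) if part}


def suspicious_ips(
    lines: Iterable[str],
    host: str = "www.emacswiki.org",  # assumed host name
    param: str = "rcidonly",          # assumed single-page parameter name
) -> Iterator[str]:
    """Yield the client IP of every request to `host` that uses `param`."""
    for line in lines:
        m = LINE_RE.match(line)
        if m and m["vhost"] == host and param in query_keys(m["target"]):
            yield m["ip"]
```

The resulting addresses would still need to be grouped into networks and looked up before banning anything.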

For demonstration purposes, this is what I ran:

So now I'm ready to ban them all:

​#Emacs ​#Butlerian Jihad ​#Administration

**2024-10-01**. Today I came back to the server with load at 40. Again! Here's me filtering the log for requests that got a server error status code (500–509).

I think the only ones I didn't ban were the unknown one, Gwene, and Bing (Microsoft).

Load is now at around 31.
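Filtering for server errors, as in the 2024-10-01 entry, can be sketched like this. It is a minimal sketch assuming Apache's `vhost_combined` log format (which web server and log format are actually in use isn't stated); the function name is hypothetical. Counting per address shows who is causing most of the errors.

```python
# Hypothetical filter: count client IPs whose requests ended in a 500-509
# status. Assumes Apache's vhost_combined log format:
#   %v:%p %h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

LINE_RE = re.compile(
    r'^(?P<vhost>[^:\s]+):(?P<port>\d+) (?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"[^"]*" (?P<status>\d{3}) '
)


def server_error_ips(
    lines: Iterable[str], low: int = 500, high: int = 509
) -> list[tuple[str, int]]:
    """Return (ip, count) pairs for requests with low <= status <= high, most frequent first."""
    counts: Counter[str] = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and low <= int(m["status"]) <= high:
            counts[m["ip"]] += 1
    return counts.most_common()
```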

**2024-10-01**. Getting lazier… Pick a suspicious pattern and check for China…

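The "check for China" step could be automated roughly like this. It is a minimal sketch, assuming the standard `whois` command-line client is installed; the function names are hypothetical. Registries format their answers differently (APNIC and RIPE use `country:`, ARIN uses `Country:`), so the parser only looks at those fields and the result is a hint, not proof.

```python
# Hypothetical country check: run the system `whois` client for an address and
# collect the two-letter codes from its "country:" lines. Whois output differs
# between registries, so treat the result as a hint, not proof.
from __future__ import annotations

import subprocess


def countries(whois_text: str) -> set[str]:
    """Collect values of 'country:' fields (any capitalization) from whois output."""
    codes = set()
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "country" and value.strip():
            codes.add(value.strip().upper())
    return codes


def whois_countries(ip: str, timeout: float = 30.0) -> set[str]:
    """Query whois for `ip` (requires the `whois` command and network access)."""
    result = subprocess.run(
        ["whois", ip], capture_output=True, text=True, timeout=timeout, check=False
    )
    return countries(result.stdout)


def looks_chinese(ip: str) -> bool:
    """True if any country field in the whois answer says CN."""
    return "CN" in whois_countries(ip)
```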

**2024-10-02**. The situation is under control again, but since I'm vindictive, I'll block some more.

**2024-10-10**. Load is at 30 again. Another round!

Munin showing system load. Another surge started at 6 in the morning.

**2024-10-14**. Here we go again.

Load going up to 30 and more.

Same procedure as every week.

Then look at the list, append it to `ban-cidr`, pipe it to the shell, and post it here for all to see.
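The "pipe it to the shell" step can be sketched as a generator of firewall commands. This is a minimal sketch under a loud assumption: which firewall `ban-cidr` actually drives isn't stated, so `iptables`/`ip6tables` DROP rules are used purely for illustration, and the function name is hypothetical. Overlapping or adjacent networks are merged first, so each range is banned once.

```python
# Hypothetical ban step: turn CIDR strings into firewall commands that can be
# reviewed and then piped to a shell. iptables/ip6tables are an assumption; the
# firewall that ban-cidr uses is not specified.
from __future__ import annotations

import ipaddress
from typing import Iterable


def ban_commands(cidrs: Iterable[str], chain: str = "INPUT") -> list[str]:
    """Return one DROP rule per network, merging duplicates and overlaps."""
    nets = [ipaddress.ip_network(c.strip(), strict=False) for c in cidrs if c.strip()]
    commands = []
    for version, tool in ((4, "iptables"), (6, "ip6tables")):
        # collapse_addresses needs networks of a single IP version.
        same_version = [n for n in nets if n.version == version]
        for net in ipaddress.collapse_addresses(same_version):
            commands.append(f"{tool} -I {chain} -s {net} -j DROP")
    return commands


if __name__ == "__main__":
    for cmd in ban_commands(["192.0.2.0/25", "192.0.2.128/25", "2001:db8::/32"]):
        print(cmd)
```

Printing the commands instead of running them leaves a chance to review the list first, for example to spot a search engine's range before it gets banned.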

And again. Oops, and Google got caught up in it. They are probably following all the per-page RSS feeds again. Oh well.

Load remains between 30 and 35.

And more…

**2024-10-18**. And they are at it again.

Load going up to 40. Again.

And I keep adding to the ban hammer.

the ban hammer

A few hours later and I'm still feeling vindictive. Let's ban some more even though load is back at around 2.5.

**2024-11-04**. Load is above 36 again. I guess it's Alibaba, now?

**2024-11-06**. It's getting tiresome, but they are at it again.

Sometimes I wonder whether there's even a point in posting the exact networks banned. After all, the ban-cidr script is the authoritative source. Perhaps I do it because it feels like shaming them in public, and because it shows how distributed these slurping efforts are. Some days it looks like every single Chinese ISP is part of it.

ban-cidr

I count 860 lines mentioning China.

In any case, load was going up to 12 but is coming down again.

**2024-11-25**. Continued two months later, unfortunately. See 2024-11-25 Emacs Wiki and it's still China for more.

2024-11-25 Emacs Wiki and it's still China