2017-01-25 Finding Encoding Bugs

When I look at one of my log files, I see the following:

What could be the problem? Let’s find the pages containing this byte.
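One way to find the pages containing a particular byte is to search the raw bytes of every page file. This is a minimal Python sketch, not the command used in this post. It assumes the pages are stored as individual files somewhere under one directory (for Oddmuse, a `page` directory is assumed here). The function name `files_containing` is hypothetical.

```python
import os


def files_containing(root, needle):
    """Yield (path, offset) for every file under root whose raw bytes contain needle.

    offset is the position of the first occurrence only.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Binary mode: compare raw bytes, so no decoding can fail or hide the byte.
            with open(path, "rb") as f:
                data = f.read()
            offset = data.find(needle)
            if offset != -1:
                yield path, offset


# Hypothetical usage, with 0xC3 standing in for whatever byte the log complains about:
# for path, offset in files_containing("page", b"\xc3"):
#     print(f"{path}: first match at byte {offset}")
```

Searching bytes rather than decoded text matters here: if a file is not valid UTF-8, decoding it first would either fail or silently change the very bytes you are looking for.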

Can we reproduce the problem? Apparently, simply opening the file does not trigger it: those sequences appear to be valid UTF-8.
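Whether a byte sequence is valid UTF-8 can be checked by decoding it strictly and looking at where decoding fails. This is a small Python sketch of that check; `check_utf8` is a hypothetical helper, not something from Oddmuse.

```python
def check_utf8(data):
    """Return None if data is valid UTF-8, else (start, end, offending_bytes)."""
    try:
        data.decode("utf-8")  # strict by default: raises on any malformed sequence
    except UnicodeDecodeError as err:
        return err.start, err.end, data[err.start:err.end]
    return None


# b"\xc3\xa9" is "é" as a complete two-byte UTF-8 sequence, so this is valid:
print(check_utf8(b"caf\xc3\xa9"))  # None
# A lone b"\xe9" ("é" in Latin-1) followed by a space is not valid UTF-8:
print(check_utf8(b"caf\xe9 au lait"))
```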

How about simply opening all the files?
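Opening all the files can be approximated by reading each one and decoding it strictly, reporting every file that fails. This Python sketch assumes the same directory layout as above (all page files somewhere under one root); `undecodable_files` is a hypothetical name. It reads bytes and decodes them itself so that the reported offset is relative to the whole file.

```python
import os


def undecodable_files(root, encoding="utf-8"):
    """Yield (path, offset, offending_bytes) for each file under root that fails
    strict decoding. Only the first error in each file is reported."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode(encoding)  # strict decoding
            except UnicodeDecodeError as err:
                yield path, err.start, data[err.start:err.end]


# Hypothetical usage:
# for path, offset, bad in undecodable_files("page"):
#     print(f"{path}: invalid bytes {bad!r} at offset {offset}")
```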

Interesting! And `less` finds these:
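A pager makes invalid bytes stand out by showing them as hex instead of text. A similar view can be sketched in Python with a custom codec error handler that renders each undecodable byte as `<XX>`. This is an illustration only: the handler name `anglehex` and the function `show_bad_lines` are hypothetical, and the output format is only meant to resemble what a pager such as `less` shows, not to match it exactly.

```python
import codecs


def _angle_hex(err):
    """Codec error handler: replace each undecodable byte with <XX>, e.g. b"\\xe9" -> "<E9>"."""
    if not isinstance(err, UnicodeDecodeError):
        raise err  # only meant for decoding
    bad = err.object[err.start:err.end]
    # Return the replacement text and the position where decoding should resume.
    return "".join(f"<{b:02X}>" for b in bad), err.end


codecs.register_error("anglehex", _angle_hex)


def show_bad_lines(data):
    """Return (line_number, rendered_line) for each line of data with invalid UTF-8."""
    result = []
    for number, raw in enumerate(data.splitlines(), start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            result.append((number, raw.decode("utf-8", errors="anglehex")))
    return result


print(show_bad_lines(b"fine\ncaf\xe9 au lait\nok\n"))
```

Reporting line numbers together with the rendered line makes it easy to jump to the offending spot when editing the page.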

So I edited the pages 2007-01-13 and RssInterwikiTranslate, removing anything that looked weird in a major edit. From now on, I hope to no longer see these warnings.

Alternatively, consider this little script written by CapnDan on the #oddmuse channel on Freenode:
