Comment by 🚀 stack

Re: "Unicode shenanigans"

In: u/LucasMW

Unicode was a terrible mistake. Localism makes more sense than globalism in every way. ASCII and one extra code page for local language would suffice. We are minimalists, right?

🚀 stack

2025-10-06 · 7 months ago

8 Later Comments ↓

🚀 LucasMW [OP] · 2025-10-06 at 13:44:

Depending on the local language, it would not.

Regardless, Unicode seems like both a way to have programs interface with all languages and a bug & vulnerability fountain.

🛸 bluesman · 2025-10-06 at 14:01:

Alhena detects a single emoji ('\uD83D\uDE00') on a 322 byte line. That emoji is displayed as either a color sprite or a monochrome font depending on preferences. The remaining 320 bytes - your shenanigans, I assume - are passed to the rendering component, where they get ignored. So yes, I'd say the bytes are preserved in BBS but not rendered (at least in Alhena).
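
A quick way to sanity-check those numbers: 322 is what you get when counting UTF-16 code units (2 for the emoji's surrogate pair plus 320 zero-width characters), not UTF-8 bytes. The sketch below is only an illustration. The 320 filler characters are a stand-in (assumed to be U+200B), not the actual payload from the post, and nothing here describes how Alhena counts internally.

```python
# Stand-in for the line in the post: one emoji followed by 320 zero-width
# characters. The filler (U+200B repeated) is an assumption for illustration.
emoji = "\U0001F600"  # GRINNING FACE
line = emoji + "\u200b" * 320

# U+1F600 lies outside the BMP, so UTF-16 represents it as a surrogate pair.
utf16 = emoji.encode("utf-16-be")
high = int.from_bytes(utf16[:2], "big")
low = int.from_bytes(utf16[2:], "big")
print(f"\\u{high:04X}\\u{low:04X}")  # \uD83D\uDE00

code_points = len(line)                           # 321
utf16_units = len(line.encode("utf-16-le")) // 2  # 322
utf8_bytes = len(line.encode("utf-8"))            # 964
print(code_points, utf16_units, utf8_bytes)
```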

That said, Alhena will display proper ZWJ emojis if set to use color sprites:

🦹🏻‍♂️
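
For anyone curious what a ZWJ emoji looks like underneath, here is a small sketch that lists the code points. It assumes the emoji above is the five-code-point sequence written out in `seq` (supervillain, skin-tone modifier, ZERO WIDTH JOINER, male sign, variation selector); the escapes are used so the example does not depend on font support.

```python
import unicodedata

# Assumed to be the sequence shown above, written as escapes.
seq = "\U0001F9B9\U0001F3FB\u200D\u2642\uFE0F"

for ch in seq:
    # A default is supplied because unicodedata.name raises ValueError
    # for code points the local Unicode database does not know.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))

# The joiner glues two halves together; a renderer without support for the
# combined glyph may fall back to drawing the halves separately.
parts = seq.split("\u200D")
print(len(seq), len(parts))
```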

🚀 LucasMW [OP] · 2025-10-06 at 14:27:

@bluesman Thanks for the testing. I am experimenting with unicode and learning new things every day!

🌆 skyjake [...] · 2025-10-06 at 15:00:

How does the gemini protocol handle unicode zero-width emoji manipulation?

The Gemini protocol just transports the response contents as-is. If you are using "text/gemini;charset=utf-8" (like virtually everyone is), then it's just regular UTF-8 text and the client will attempt to render it.
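
A minimal sketch of what "transports as-is" means on the client side: split the response header from the body, pick the charset from the MIME parameters, and decode. The function name `decode_gemini_response` and the sample payload are made up for illustration, and the parsing is deliberately simplified (no error handling, success responses only).

```python
def decode_gemini_response(raw: bytes):
    """Split '<status> <meta>' + CRLF + body and decode the body as text."""
    header, _, body = raw.partition(b"\r\n")
    status_text, _, meta = header.decode("utf-8").partition(" ")
    status = int(status_text)

    charset = "utf-8"  # assumed default when no charset parameter is given
    for param in meta.split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key.lower() == "charset":
            charset = value.strip('"')

    return status, meta, body.decode(charset)


# Hypothetical payload: an emoji followed by a few zero-width characters.
payload = "\U0001F600\u200b\u200c\u200b"
raw = b"20 text/gemini;charset=utf-8\r\n" + payload.encode("utf-8")

status, meta, text = decode_gemini_response(raw)
print(status, meta, text == payload)
```

The zero-width characters come out the other end untouched; whether they are shown, hidden, or dropped is entirely up to the client's renderer.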

🦂 zzo38 · 2025-10-06 at 20:13:

I agree that Unicode was (and is) a mistake, although not due to localism and globalism. One character set cannot be suitable for all uses. Sometimes it is useful to have multiple languages and writing systems together, but even in the cases where that is appropriate, Unicode is not a good way to do it.

🚀 stack · 2025-10-06 at 20:25:

I did not mean globalism in the political sense. It's just too many god damned codepoints with too many meanings, whereas 99.999% of the time you just need some 8-bit text. In the meantime you have a bloated display architecture, no good way to pre-render character sets without weird caching techniques, and a loss of ability to count characters by counting bytes.
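
To make the byte-counting point concrete, here is a small sketch comparing an 8-bit encoding with UTF-8 for the same word. The sample word is arbitrary; the second half shows that even the code point count depends on the normalization form.

```python
import unicodedata

word = "na\u00efve"  # "naive" with a diaeresis on the i

nfc = unicodedata.normalize("NFC", word)  # i-diaeresis as one code point
nfd = unicodedata.normalize("NFD", word)  # i followed by a combining mark

print(len(nfc.encode("latin-1")))          # 5: one byte per character
print(len(nfc), len(nfc.encode("utf-8")))  # 5 6
print(len(nfd), len(nfd.encode("utf-8")))  # 6 7
```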

Also, weird ways to scam the users with similar-looking characters that would never be needed in real life.
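
The look-alike problem can be sketched in a few lines. The check below is a crude heuristic, not a real confusables detector: it guesses the script from the first word of each letter's Unicode name, and both the helper `scripts_used` and the sample strings are made up for illustration.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Rough script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


genuine = "example"
spoofed = "ex\u0430mple"  # U+0430 CYRILLIC SMALL LETTER A instead of Latin a

print(genuine == spoofed)     # False, although the two usually render alike
print(scripts_used(genuine))  # {'LATIN'}
print(scripts_used(spoofed))  # contains both 'LATIN' and 'CYRILLIC'
```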

🦂 zzo38 · 2025-10-06 at 22:04:

I agree with you, those are some of the problems with Unicode, although there are others as well. I write programs (and file formats, protocols, etc) that do not use Unicode (although many programs/etc won't and shouldn't care what character set you are using).

👽 TKurtBond · 2025-10-09 at 21:17:

I use Unicode regularly in text that I compose (I do it in Emacs, and have my own keyboard shortcuts for the characters that I use). It is more convenient than ASCII or Latin-1, etc. for me. I agree it is horribly complicated, and wish there was a better way to do things, but it works for me, whether I'm writing reStructuredText, Markdown, Troff/Groff, LaTeX or ConTeXt. I regularly use characters from outside any 8-bit set of codes, and I'm not a heavy user of foreign languages, but I do use some, mostly names and occasional quotes. But there are people who regularly write documents with multiple languages, and 8-bit codes are too limited for them.

Original Post

🚀 LucasMW

Unicode shenanigans - How does the gemini protocol handle unicode zero-width emoji manipulation? 😀​‌​​‌​​​​‌‌​‌‌‌‌​‌‌‌​‌‌‌​​‌​​​​​​‌‌​​‌​​​‌‌​‌‌‌‌​‌‌​​‌​‌​‌‌‌​​‌‌​​‌​​​​​​‌‌‌​‌​​​‌‌​‌​​​​‌‌​​‌​‌​​‌​​​​​​‌‌‌​​​​​‌‌‌​​‌​​‌‌​‌‌‌‌​‌‌‌​‌​​​‌‌​‌‌‌‌​‌‌​​​‌‌​‌‌​‌‌‌‌​‌‌​‌‌​​​​‌​​​​​​‌‌​‌​​​​‌‌​​​​‌​‌‌​‌‌‌​​‌‌​​‌​​​‌‌​‌‌​​​‌‌​​‌​‌​​‌​​​​​​‌‌‌​‌​​​‌‌​‌​​​​‌‌​‌​​‌​‌‌‌​​‌‌​​‌​​​​​​‌‌​​‌​‌​‌‌​‌‌​‌​‌‌​‌‌‌‌​‌‌​‌​‌​​‌‌​‌​​‌​​‌‌‌‌‌‌ This emoji above contains information. Can you decode it? Does gemini preserve...
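
One way to read the payload, sketched under assumptions: ZERO WIDTH SPACE (U+200B) stands for a 0 bit, ZERO WIDTH NON-JOINER (U+200C) for a 1 bit, and each group of 8 bits is one ASCII byte, most significant bit first. The helper names `hide` and `reveal` are made up here, and the round trip uses a sample message rather than the post's own data. Read this way, the 320 zero-width characters in the post appear to decode to "How does the protocol handle this emoji?".

```python
ZERO, ONE = "\u200b", "\u200c"  # ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER


def hide(message: str) -> str:
    """Encode ASCII text as a run of zero-width characters (8 per byte)."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("ascii"))
    return "".join(ONE if bit == "1" else ZERO for bit in bits)


def reveal(text: str) -> str:
    """Collect the zero-width characters in text and decode them as ASCII."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("ascii", errors="replace")


# Sample round trip: a visible emoji carrying an invisible message.
carrier = "\U0001F600" + hide("hello")
print(len(carrier), reveal(carrier))  # 41 hello
```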

💬 9 comments · 2025-10-06 · 7 months ago · #gemini #software #tech