Comment by 🍵 tacomanator

Re: "It seems that CJK (Chinese-Japanese-Korean) posts are…"

In: s/AskGemini

@MrSVCD thank you for the clarification. Japanese has three writing systems (hiragana, katakana, and kanji), and most characters across all three are 3 bytes each in UTF-8, which becomes 9 bytes each when percent-encoded.
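The 3-byte and 9-byte figures can be checked with a few lines of Python. This is only a sketch: it assumes the client percent-encodes every non-ASCII byte the way `urllib.parse.quote` does, and the helper name `sizes` is made up for illustration.

```python
from urllib.parse import quote


def sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte length, percent-encoded length) for one character."""
    raw = ch.encode("utf-8")
    # Each non-ASCII byte becomes a three-character %HH escape.
    return len(raw), len(quote(ch, safe=""))


# One sample each from hiragana, katakana, and kanji.
for ch in ["あ", "ア", "漢"]:
    print(ch, sizes(ch))
```

Each of the three samples comes out as 3 bytes raw and 9 bytes encoded.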

Japanese, and to a degree Korean, are the most affected by the byte limit as they use more characters than Chinese to express the same thing due to particles, conjugations, etc.

🍵 tacomanator [OP]

Mar 11 · 8 weeks ago

6 Later Comments ↓

🚀 ColonelThirtyTwo · Mar 12 at 02:25:

@MrSVCD UTF-8 is at most 4 bytes per character, but those bytes then get percent-encoded, further driving up the bytes per character.
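As a rough illustration of how the cost grows with the UTF-8 width, the sketch below measures one sample character for each width from 1 to 4 bytes. The sample characters and the helper name `widths` are my own choices, and `urllib.parse.quote` stands in for whatever encoder a given client actually uses.

```python
from urllib.parse import quote


def widths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, percent-encoded bytes) for a single character."""
    return len(ch.encode("utf-8")), len(quote(ch, safe=""))


samples = {
    "ASCII letter": "A",
    "Latin with accent": "\u00e9",       # é
    "Hangul syllable": "\ud55c",         # 한
    "Supplementary kanji": "\U00020BB7",  # 𠮷, outside the Basic Multilingual Plane
}

for label, ch in samples.items():
    print(label, widths(ch))
```

ASCII letters pass through unescaped, while a 4-byte character costs 12 bytes of URL.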

🌆 skyjake [mod...] · Mar 12 at 08:00:

@tacomanator Bubble (which runs this site) supports Titan for making and editing long posts. This is documented in the Help:

— /help

Using the Bubble draft composer, you effectively can submit long posts and comments as multiple Gemini requests as well.
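To give an idea of what splitting a long text across several requests involves, here is a minimal sketch. It is not how Bubble's draft composer is implemented; the function name `split_for_budget` and the 900-byte budget are assumptions for illustration only (the budget stands for whatever is left of the 1024 bytes after the rest of the URL).

```python
from urllib.parse import quote


def split_for_budget(text: str, budget: int) -> list[str]:
    """Greedily split text so each chunk percent-encodes to at most `budget` bytes."""
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for ch in text:
        cost = len(quote(ch, safe=""))
        if cost > budget:
            raise ValueError("budget is too small for a single character")
        if used + cost > budget:
            # Current chunk is full: close it and start a new one.
            chunks.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks


# 250 hiragana characters with a hypothetical 900-byte budget per request.
parts = split_for_budget("あ" * 250, budget=900)
print([len(p) for p in parts])
```

The split is per code point, so a real client would probably prefer to break at sentence or line boundaries instead.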

Station does not support Titan nor does it allow appending text to previously submitted entries.

Titan is used by some to edit their capsules, gemlogs, and/or tinylogs. I have no examples off the top of my head apart from my own skyjake.fi, where I've got a private Titan edit feature.

🚂 MrSVCD · Mar 12 at 10:39:

@ColonelThirtyTwo That is true, but the most common C&K characters have their own entries in Unicode.

I think that Unicode is trying to go percent-encoded to not go to 5 bytes of UTF-8.

🍵 tacomanator [OP] · Mar 12 at 23:58:

@skyjake thank you for your help. From there I found a way to post long text from the draft page after enabling Titan in the BBS settings.

The help mentions a ":" command to enter long text mode. I haven't figured out how to get that to work yet, but for now I'm happy to have at least one working method!

🚬 sy · Mar 13 at 15:47:

Maybe this (RFC 2718 §2.2.5) should be explicitly allowed in the Gemini specification:

Unless there is some compelling reason for a particular scheme to do otherwise, translating character sequences into UTF-8 and then subsequently using the %HH encoding for *unsafe* octets is recommended.

Apparently most servers, including BBS and Station, already allow it.

— Test with more than 300 kanji characters

🚂 MrSVCD · Mar 13 at 18:04:

Thanks @sy, that explains the difference between what I thought and what OP said.

Original Post

🌒 s/AskGemini

🍵 tacomanator:

It seems that CJK (Chinese-Japanese-Korean) posts are effectively limited to about 100 characters, a sentence or two, due to the limit of 1024 bytes for URIs in Gemini (each character is 9 bytes after encoding). Has there been discussion on this matter?
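The "about 100 characters" estimate follows from simple arithmetic, sketched below. The base URL `gemini://example.org/post?` is a placeholder rather than a real endpoint, and `max_chars` is a hypothetical helper; the 1024-byte limit and the 9 bytes per character are the figures from the post.

```python
URL_LIMIT = 1024  # maximum URI length in bytes, as stated in the post


def max_chars(base_url: str, bytes_per_char: int = 9, limit: int = URL_LIMIT) -> int:
    """Estimate how many encoded characters fit in the query after the base URL."""
    remaining = limit - len(base_url.encode("utf-8"))
    return max(remaining, 0) // bytes_per_char


# Placeholder base URL, including the "?" that introduces the query.
print(max_chars("gemini://example.org/post?"))
```

A longer host name or path lowers the result, which is why the practical figure lands near 100 rather than at the theoretical 1024 / 9.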

💬 10 comments · Mar 11 · 8 weeks ago