Comment by ColonelThirtyTwo
Re: "It seems that CJK (Chinese-Japanese-Korean) posts are…"
@MrSVCD UTF-8 is at most 4 bytes per character, but those bytes then get percent-encoded (3 characters per byte), further driving up the bytes per character.
Mar 12 · 8 weeks ago
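To make the byte counts above concrete, here is a small sketch using Python's standard `urllib.parse.quote`. The helper name `encoded_len` and the sample characters are only illustrative, and `safe=""` is an assumption that every non-unreserved byte gets escaped.

```python
from urllib.parse import quote

def encoded_len(text: str) -> int:
    """Length of text after UTF-8 encoding and percent-encoding."""
    # safe="" means no reserved characters are left unescaped.
    return len(quote(text, safe=""))

# Compare raw UTF-8 size with percent-encoded size for a few characters.
for ch in ["a", "é", "漢", "😀"]:
    utf8_bytes = len(ch.encode("utf-8"))
    print(f"{ch!r}: {utf8_bytes} UTF-8 byte(s), {encoded_len(ch)} after percent-encoding")
```

Each non-ASCII byte becomes a three-character `%HH` sequence, so a 3-byte CJK character costs 9 bytes in the URI.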
5 Later Comments
skyjake [mod...] · Mar 12 at 08:00:
@tacomanator Bubble (that runs this site) supports Titan for making and editing long posts. This is documented in the Help:
Using the Bubble draft composer, you can effectively submit long posts and comments as multiple Gemini requests as well.
Station does not support Titan, nor does it allow appending text to previously submitted entries.
Titan is used by some to edit their capsules, gemlogs, and/or tinylogs. I have no examples off the top of my head apart from my own skyjake.fi, where I've got a private Titan edit feature.
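As a rough illustration of the "multiple Gemini requests" idea, the sketch below splits a long text into pieces whose percent-encoded form fits a given byte budget. This is not how Bubble's composer is implemented (that is not described here); the function name `split_for_requests` and the `budget` parameter are hypothetical.

```python
from urllib.parse import quote

def split_for_requests(text: str, budget: int) -> list[str]:
    """Split text into chunks whose percent-encoded length is <= budget."""
    chunks, current, used = [], "", 0
    for ch in text:
        cost = len(quote(ch, safe=""))  # encoded size of this character
        if cost > budget:
            raise ValueError("budget too small for a single character")
        if used + cost > budget:
            # Current chunk is full: start a new one.
            chunks.append(current)
            current, used = "", 0
        current += ch
        used += cost
    if current:
        chunks.append(current)
    return chunks

# Hypothetical budget: 990 bytes left for the query after the URL prefix.
parts = split_for_requests("漢" * 300, budget=990)
print([len(p) for p in parts])
```

Splitting on character boundaries (rather than on raw bytes) keeps every chunk valid UTF-8 on its own.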
MrSVCD · Mar 12 at 10:39:
@ColonelThirtyTwo That is true, but the most common C&K characters have their own entries in Unicode.
I think that Unicode is trying to go percent-encoded to avoid going to 5 bytes of UTF-8.
tacomanator [OP] · Mar 12 at 23:58:
@skyjake Thank you for your help. From there I found a way to post long text from the draft page after enabling Titan in the BBS settings.
The help mentions a ":" command to enter long text mode. I haven't figured out how to get that to work yet, but for now I'm happy to have at least one working method!
Maybe this (RFC 2718 §2.2.5) should be explicitly allowed in the Gemini specification:
Unless there is some compelling reason for a particular scheme to do otherwise, translating character sequences into UTF-8 and then subsequently using the %HH encoding for *unsafe* octets is recommended.
Apparently most servers, including BBS and Station, already allow it.
Test with more than 300 kanji characters
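The RFC 2718 recommendation quoted above (UTF-8 first, then %HH for unsafe octets) can be sketched by hand as follows. Treating everything outside the RFC 3986 unreserved set as "unsafe" is an assumption made for this example; `percent_encode` is a hypothetical helper, not part of any Gemini server or client.

```python
from urllib.parse import unquote

# Assumption: only RFC 3986 "unreserved" octets are left as-is.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)

def percent_encode(text: str) -> str:
    """Translate text to UTF-8, then %HH-encode every unsafe octet."""
    out = []
    for octet in text.encode("utf-8"):
        if octet in UNRESERVED:
            out.append(chr(octet))
        else:
            out.append(f"%{octet:02X}")
    return "".join(out)

encoded = percent_encode("日本語 test")
print(encoded)
# Decoding with the standard library recovers the original text.
print(unquote(encoded))
```

Because the decoder only needs to reverse %HH and interpret the result as UTF-8, a server that accepts such input does not need any CJK-specific handling.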
MrSVCD · Mar 13 at 18:04:
Thanks @sy, that explains the difference between what I thought and what OP said.
Original Post
It seems that CJK (Chinese-Japanese-Korean) posts are effectively limited to about 100 characters due to the 1024-byte limit for URIs in Gemini (each character is 9 bytes after encoding: 3 UTF-8 bytes, each percent-encoded as 3 characters). That is a sentence or two. Has there been discussion on this matter?
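The arithmetic behind the "about 100 characters" estimate can be checked with a short sketch. The URL prefix `gemini://example.org/post?` is a made-up placeholder; a real prefix would be longer or shorter and shift the result by a few characters.

```python
from urllib.parse import quote

MAX_URI_BYTES = 1024  # URI length limit discussed in this thread

def max_chars(prefix: str, ch: str) -> int:
    """How many copies of ch fit in the query after the given URL prefix."""
    per_char = len(quote(ch, safe=""))          # 9 for a 3-byte CJK character
    remaining = MAX_URI_BYTES - len(prefix.encode("utf-8"))
    return remaining // per_char

# Hypothetical 26-byte prefix; the rest of the URI is available for the post text.
print(max_chars("gemini://example.org/post?", "漢"))  # 110
print(max_chars("", "漢"))                             # 113, the upper bound with no prefix
```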