Comment by 🛸 bluesman

Re: "Scriptonite Lagrange Workarounds"

In: u/bluesman

I need to test but it's entirely possible that &, =, and + can be left decoded now that semicolons work. Lagrange is decoding [ and ] before sending. In Zoe, I had an auto prompt line with something like "Enter [Y]es or [N]o" in the URL. This is what the server gets from Alhena and Lagrange.

Creating a URI in Java from the second url throws a URISyntaxException so it won't work unless I sanitize the input (which is okay but not strictly correct).
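One way to read "sanitize the input" is re-encoding the brackets that the client decoded before the string reaches a strict URI parser. The post doesn't show the actual fix, so the following is only a minimal sketch in Python rather than Java; `reencode_brackets` and the sample string are hypothetical.

```python
def reencode_brackets(url_tail: str) -> str:
    """Percent-encode raw "[" and "]" in the path/query part of a URL.

    Assumption: the caller passes only the text after the authority,
    so bracketed IPv6 host literals are never touched.
    """
    return url_tail.replace("[", "%5B").replace("]", "%5D")


# Hypothetical request text with the brackets already decoded by the client.
incoming = "/zoe?Enter%20[Y]es%20or%20[N]o"
print(reencode_brackets(incoming))
```

This only restores the two characters mentioned in the post. It can't tell whether a bracket was originally encoded or not, which is why it is "okay but not strictly correct".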

I'll do more testing on &, =, and +. A quick test suggests decoding those is fine but I'll let you know. If the spec says it's okay but it still doesn't work, then it's my bug.

🛸 bluesman [OP]

2025-08-05 · 9 months ago

9 Later Comments ↓

🛸 bluesman [OP] · 2025-08-05 at 20:27:

The + is still an issue. If running a script that prompts for code and the user enters "2+2" in the type 10 dialog, Lagrange correctly sends 2%2B2 in the query. When the server then includes that in a redirect url, Lagrange converts it to 2+2. The server gets that redirect and since it assumes everything is already encoded, 2+2 becomes 2%202 or "2 2".
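The round trip described above can be reproduced with Python's standard `urllib.parse`. This is a sketch, not the actual client or server code; in particular it assumes the server reads a literal "+" as an encoded space, which is what the "2 2" result implies.

```python
from urllib.parse import quote, unquote, unquote_plus

user_input = "2+2"

# Step 1: the client percent-encodes the input for the query.
sent = quote(user_input, safe="")        # "+" becomes "%2B"

# Step 2: a client that decodes sub-delims turns "%2B" back into "+"
# when it follows the redirect.
redirected = unquote(sent)

# Step 3 (assumption): the server treats a literal "+" as a space.
received = unquote_plus(redirected)

# If the redirect is passed through untouched, the input survives.
preserved = unquote_plus(sent)

print(sent, redirected, repr(received), preserved)
```

The data is lost in step 2: once "%2B" has become "+", the server can no longer tell a plus sign the user typed from a space.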

I can live with keeping the base64 encoding scheme so this works in Lagrange (or maybe I can sanitize for + specifically). I'm not sure though that the RFC suggests I'm doing something wrong. You say those characters are allowed in the path without encoding but how are they meant to be handled when they are encoded? If they MUST be decoded then my scheme is fundamentally broken to begin with and base64 encoding is mandatory.

🛸 bluesman [OP] · 2025-08-06 at 00:56:

My reading of RFC 3986 section 2.4 is that url data (in path parameters) should not be decoded or re-encoded.

https://datatracker.ietf.org/doc/html/rfc3986#section-2.4

🌆 skyjake [...] · 2025-08-06 at 04:17:

I was also taking a closer look at the RFC, and this stands out to me (from section 2.4):

When a URI is dereferenced, the components and subcomponents significant to the scheme-specific dereferencing process (if any) must be parsed and separated before the percent-encoded octets within those components can be safely decoded, as otherwise the data may be mistaken for component delimiters.

The key word being "scheme-specific". The Gemini URI scheme does not specify semantics for parameters in the path component. In other words, the path component does not further divide into path and parameter subcomponents in Gemini; it's just a path. Therefore, the client can decode the sub-delim characters if it wants.
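For a scheme or server that does give ";" a meaning inside a path segment, the quoted rule from section 2.4 comes down to the order of two steps: split first, decode second. A small sketch of the difference follows; the segment and the names `run`, `code`, and `mode` are made up for illustration.

```python
from urllib.parse import unquote


def split_then_decode(segment: str):
    """Separate on ";" first, then decode each piece (the order the RFC asks for)."""
    name, *params = segment.split(";")
    return unquote(name), [unquote(p) for p in params]


def decode_then_split(segment: str):
    """Decode first, then separate: an encoded ";" is mistaken for a delimiter."""
    name, *params = unquote(segment).split(";")
    return name, params


# Hypothetical segment whose "code" value contains an encoded semicolon.
segment = "run;code=a%3Bb;mode=x"
print(split_then_decode(segment))
print(decode_then_split(segment))
```

The first function yields two parameters, with `code` holding "a;b". The second yields three, because the decoded semicolon splits the value in half.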

I will ensure that [, ], and " always remain encoded in the path component, though, to adhere to the RFC.

🛸 bluesman [OP] · 2025-08-06 at 06:09:

Path parameters may not be mentioned in the Gemini spec but I think it's reasonable to assume they would be supported as defined by the RFC. The Gemini spec specifically excludes fragments and userinfo but makes no mention of path/matrix parameters.

I find your choice perplexing given the fact that support basically requires a client to do nothing. I think it's an odd decision that may limit future development (and not just my derided project).

I'll continue to base64 the Scriptonite segment on redirect so it's "just a path".
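A sketch of what "just a path" could look like, using URL-safe base64 so the segment contains only letters, digits, "-" and "_". Scriptonite's real segment format isn't shown in this thread, so the function names and the sample string here are assumptions.

```python
import base64


def encode_segment(raw: str) -> str:
    """Encode a segment as URL-safe base64 without "=" padding."""
    token = base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")
    return token.rstrip("=")


def decode_segment(token: str) -> str:
    """Restore the padding and decode the segment back to text."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# Hypothetical segment containing characters a client might decode or re-encode.
raw = "prompt=Enter [Y]es or [N]o;code=2+2"
token = encode_segment(raw)
print(token)
print(decode_segment(token))
```

Because the token uses only unreserved characters, a client has nothing to decode or re-encode, so the segment reaches the server unchanged.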

🌆 skyjake [...] · 2025-08-06 at 10:47:

Gemini is intended to be non-extensible, so I'm cautious about accidentally enabling behaviors that are not in line with the specification. I am therefore inclined to make it more difficult to make use of obscure features like path subcomponents.

However, I noticed this in the RFC section 2.2:

URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent
[...]
characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI.

This pretty explicitly says not to decode or encode the sub-delims for normalization purposes, so I will adhere to that in the future, which should solve the underlying issue.
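A normalizer that follows this reading decodes only percent-encoded unreserved characters (letters, digits, "-", ".", "_", "~") and leaves every other escape in place, at most uppercasing its hex digits. The sketch below illustrates that rule in Python; it is not the code from Lagrange, and `normalize_escapes` is a hypothetical name.

```python
import re
import string

# Unreserved set from RFC 3986 section 2.3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def normalize_escapes(component: str) -> str:
    """Decode escapes of unreserved characters; keep all others encoded."""
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char                       # safe to decode
        return "%" + match.group(1).upper()   # reserved or other: stay encoded
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, component)


print(normalize_escapes("2%2b2%7Eok%41"))
```

With this rule "%2B" stays encoded, so the "2+2" input from earlier in the thread survives a redirect.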

🌆 skyjake [...] · 2025-08-06 at 13:16:

@bluesman Are you able to compile Lagrange locally? Would be interesting to know if these changes are sufficient to fix the remaining issues (dev branch):

https://github.com/skyjake/lagrange/commit/7284f5ee591f781eb911f96e87630c09a8ec64d3

🛸 bluesman [OP] · 2025-08-06 at 13:52:

Looking at GitHub, it appears my best bet would be firing up Ubuntu in VirtualBox or using the Pi 5. (My macOS laptop is ridiculously constrained when it comes to storage.) I can certainly give it a shot when I have some time.

If we could figure out another way to share the binary (Mac, Windows, or Linux), I could probably get you an answer right away.

🌆 skyjake [...] · 2025-08-06 at 14:45:

Here is a Linux x86_64 AppImage for testing:

https://etc.skyjake.fi/lagrange/Lagrange-1.18.7_testing-x86_64.AppImage

🛸 bluesman [OP] · 2025-08-06 at 15:36:

I had to install fuse but then it ran fine on Windows Subsystem for Linux - much quicker than firing up Ubuntu in VirtualBox.

Every issue I was having with the auto-prompt system seems to be fixed. I was a little worried when I saw my "2 + 2" example become "2 %2B 2" in the address bar but it works fine and the copied link is percent-encoded.

Thanks for looking at this and apologies for any consternation.

Original Post

🛸 bluesman

Scriptonite Lagrange Workarounds: I put in workarounds for running Scriptonite in Lagrange. The issue is that Lagrange doesn't preserve certain percent-encoded characters in URLs (whether on a page or in a redirect). The fix is to sanitize Scriptonite links coming in and base64 on redirect. The one thing I can't work around is semicolons. If you want to use them in a pre-populated variable or auto-prompt, the Scriptonite segment must be base64 encoded in advance. That should be rare but there...

💬 13 comments · 2025-08-04 · 9 months ago