From: news@zzo38computer.org.invalid
Newsgroups: comp.infosystems.gemini
Subject: Comments about 0.24.1 specification changes
Date: Sat, 31 Aug 2024 15:15:54 -0700
Organization: A noiseless patient Spider
Message-ID: <1724962885.bystand@zzo38computer.org>
User-Agent: bystand/1.3.0pre1

This article comments about:

  gemini://geminiprotocol.net/news/2024_08_28.gmi

> Clarification on encoding queries

This part is good.

> Clarification on error reporting

This part is also good.

> Correction to the example for multiple languages

While I think that there are some problems with the mechanism (and with
MIME in general), since that is what it is (and there is no point in
changing it now), it is good to make the specification describe this
mechanism correctly; now that they have fixed it to do so, it is good.

> it is not and never has been true that MIME media types are big trucks
> you can just dump parameters on whenever you feel like it, each type has
> a registered set of defined parameters. But it's not uncommon for people
> to think otherwise and I have had this pointed out more than once as a
> place where Gemini can be illicitly extended, so it can't hurt to be
> explicit about this.

There are other places where extensions can be added anyway, such as
adding the extensions into the X.509 certificate. I think that the
"non-extensibility" does not actually work very well.

> Because empty text lines are valid (and widely used) in Gemtext
> documents.
> If a formulation like the above were used, it would be ambiguous
> whether or not every document which did end with a CRLF did or did not also
> include an empty text line after it which didn't include the optional final
> newline. Since empty text lines are supposed to be rendered individually
> each time they occur, this ambiguity actually has consequences. Absolutely
> trivial consequences, it's true, but the problem of documents without final
> newlines being ill-formed is trivial too.

There are a few issues with such consequences. When viewing a document on
the screen, an extra blank line might not be very significant, but it might
be significant for paged media, where you might end up with an extra page
which is blank (if the formatter does not detect and discard it for this
reason). I think that a final line break should be required (and should not
result in an extra blank line), although if it is not present then
implementations SHOULD treat it as though it is present, even though it is
not valid.

> Permit use of non-ASCII characters in text lines

I think that the ABNF should only use ASCII and should define "non-ASCII
characters" as bytes with the high bit set; which combinations of such
bytes are valid depends on the character encoding in use but does not
affect the structure of the document, so it does not need its own ABNF.

There are character sets that cannot (and/or should not) be mapped to
Unicode; writing the ABNF in terms of Unicode won't do, and I think that
writing the ABNF in terms of the "canonical form" is also unnecessary.

My proposal would be to disallow character encodings that are not a
superset of ASCII, but to allow any others, independently of whether or not
they are subsets of Unicode. For example, UTF-16 would be disallowed, but
EUC-JP would be allowed (and UTF-8 would still be allowed too).

The ABNF can have its own definition for line breaks; instead of CRLF, you
can define one that can be either LF or CRLF.
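As an illustration of the line-break handling proposed above, here is a
minimal sketch in Python. It is only my assumption of how a client might do
it, not anything from the specification; the function name
split_gemtext_lines is made up for this example. It works on raw bytes, so
bytes with the high bit set pass through untouched whatever the character
encoding is, and it treats a missing final line break as though it were
present, so that a document ending with a line break does not gain an extra
blank line.

```python
def split_gemtext_lines(data: bytes) -> list[bytes]:
    """Split raw gemtext bytes into lines, accepting LF or CRLF."""
    # Assumption for this sketch: an empty document yields no lines at all.
    if not data:
        return []
    # A missing final line break is treated as though it were present.
    if not data.endswith(b"\n"):
        data += b"\n"
    lines = data.split(b"\n")
    # The final line break terminates the last line; it does not start
    # a new empty one, so drop the empty piece that split() leaves behind.
    lines.pop()
    # Each line break is [CR] LF, so remove at most one trailing CR per line.
    return [line[:-1] if line.endswith(b"\r") else line for line in lines]
```

With this rule, b"a\r\nb\n" and b"a\nb" give the same two lines, and only a
document ending with two line breaks has an empty text line at the end. The
same rule for line breaks can also be written into the ABNF itself.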
For example:

  gemtext-document = [bom] 1*gemtext-line
  linebreak = [CR] LF
  nonascii = %x80-FF
  VCHAR /= nonascii
  bom = %xEF %xBB %xBF ; if document character encoding is UTF-8 or unspecified
  bom = () ; if document character encoding is specified and is not UTF-8

-- 
Don't laugh at the moon when it is day time in France.