Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: Rationale for aligning data on even bytes in a Unix shell file?
Date: Wed, 7 May 2025 23:03:44 +0200
Organization: A noiseless patient Spider
Lines: 64
Message-ID: <vvghrg$18321$1@dont-email.me>
References: <vuih43$2agfa$1@dont-email.me> <vuml73$1riea$1@dont-email.me>
 <vun04h$2fjrn$2@raubtier-asyl.eternal-september.org>
 <vun1nh$22hc5$3@dont-email.me>
 <vunak2$2p980$1@raubtier-asyl.eternal-september.org>
 <vunbgo$2q5u8$1@dont-email.me>
 <vunbjg$2q72n$1@raubtier-asyl.eternal-september.org>
 <vund1f$2rh3j$1@dont-email.me>
 <vungko$2uoa2$1@raubtier-asyl.eternal-september.org>
 <X9MPP.1383458$f81.819466@fx48.iad>
 <vuobri$3o38b$1@raubtier-asyl.eternal-september.org>
 <XtOPP.2986761$t84d.2537581@fx11.iad>
 <vuohq9$3tlhf$1@raubtier-asyl.eternal-september.org>
 <vuoig5$3ub4j$1@dont-email.me>
 <vuorpf$6tnn$1@raubtier-asyl.eternal-september.org>
 <vup2nt$bi1k$2@dont-email.me>
 <vupofl$13pg2$2@raubtier-asyl.eternal-september.org>
 <vuprce$15sqo$2@dont-email.me>
 <vvd6n5$353gs$1@raubtier-asyl.eternal-september.org>
 <vvfbnj$ulpc$1@dont-email.me> <vvflec$11b72$1@dont-email.me>
 <vvg8uq$1647n$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 07 May 2025 23:03:45 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="32b705cac158ab76c251e0573503a3c2"; logging-data="1313857"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19AnngUsEJ1YwhAfmNQP7yRfDA2q8HTUsM="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:FPeoZ2c9xgnT6DgjrNowUjphxSs=
Content-Language: en-GB
In-Reply-To: <vvg8uq$1647n$2@dont-email.me>
Bytes: 4672

On 07/05/2025 20:26, BGB wrote:
> On 5/7/2025 7:58 AM, Janis Papanagnou wrote:
>> On 07.05.2025 12:08, BGB wrote:
>>> [...]
>>>
>>> Though, if someone really must make something case-insensitive, a case
>>> could be made for only supporting it for maybe Latin, Greek, and
>>> Cyrillic.
>>
>> I don't understand what you want to say here; it just sounds strange
>> to me. - Mind to elaborate?
>>
>
> Latin, Greek, and Cyrillic, are the main alphabets which actually have a
> useful and reasonably well defined concept of "case", and thus "case
> folding" actually makes sense for these.
>
> For most other places, it does not, and one can likely ignore rules for
> things outside of these alphabets. Can eliminate a bunch of rules for
> alphabets that don't actually have "case" as we would understand it.
>
>
> By limiting rules in these ways, a simpler and more manageable set of
> rules is possible. Vs, say, actual Unicode rules, which tend to have
> stuff going on all over the place.
>
>
> Ligatures pose an issue though, but presumably option is one of:
> Case fold between ligatures, when both variants exist;
> Treat the ligature as its own character;
> Decompose and compare.
>
>
> Though, FWIW, in my normalization code, I mostly ignored ligatures, as
> while they could be decomposed in many cases, they could only be
> recomposed for locales that actually use said ligature (like, in
> English, if AE and IJ started spontaneously merging into new characters,
> this would be weird and out of place; and having a filesystem layer that
> merely decomposed any ligatures it encountered would not be ideal).
>
>
>>> Ideally, this would be better handled in a file-browser or
>>> similar, and not in the VFS or FS driver itself.
>>
>> Janis
>>
>

No matter how you choose to do it, you will get it wrong sometimes.
Case-insensitive comparison has language-specific details in addition to
the per-character mappings in the Unicode tables.

Should the lower-case version of "SS" be "ss" or "ß"? That depends on
the language and the position of the letters. Should the capital of "ß"
be "SS" or "ẞ"?
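
To make the "ß" questions concrete, here is a small sketch in Python (my choice of language for illustration, nothing to do with any particular filesystem). It only shows what the locale-independent default Unicode mappings built into Python's `str` methods give you; the variable names are made up for the example.

```python
# Python's str methods use the default (locale-independent) Unicode case
# mappings, so they cannot know which language a string is written in.
sharp_s = "\u00df"           # ß  LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"   # ẞ  LATIN CAPITAL LETTER SHARP S

upper_of_sharp_s = sharp_s.upper()             # two letters: "SS"
lower_of_ss = "SS".lower()                     # "ss", never "ß"
lower_of_capital = capital_sharp_s.lower()     # "ß"
round_trip = sharp_s.upper().lower()           # "ss": the "ß" is lost

# Lower-casing both sides does not make these two spellings compare equal,
# while full case folding does (it maps "ß" to "ss").
equal_by_lower = "stra\u00dfe".lower() == "STRASSE".lower()
equal_by_casefold = "stra\u00dfe".casefold() == "STRASSE".casefold()

print(upper_of_sharp_s, lower_of_ss, lower_of_capital, round_trip)
print(equal_by_lower, equal_by_casefold)
```

Whether the folded comparison is the "right" answer still depends on the language and on what the user meant, which is exactly the problem.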
Should the capital of "i" be "I" or "İ"? Some languages have a letter
"dz" - some of those capitalise it as "DZ", others as "Dz".

About the only case-normalisation you can reasonably do without risk of
getting things wrong (except for the Turkish i/ı) is for the plain 26
letters in ASCII. For everything else you would provide little help to
anyone, and introduce mistakes for some languages.

Case normalisation, like ordering, is language-dependent and does not
belong in a filesystem or other low-level parts of a system.
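
As a rough sketch of what "only the plain 26 letters" could look like, again in Python for illustration: `ascii_fold` and `ascii_equal_ignoring_case` are hypothetical helper names, not functions from any real filesystem or library. Everything outside A-Z is deliberately left alone.

```python
import string

# Translation table that lower-cases A-Z and nothing else.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_fold(name: str) -> str:
    """Lower-case only the 26 ASCII letters; leave all other code points as-is."""
    return name.translate(_ASCII_FOLD)


def ascii_equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two names, ignoring case for ASCII letters only."""
    return ascii_fold(a) == ascii_fold(b)


same_ascii = ascii_equal_ignoring_case("README.TXT", "readme.txt")        # True
same_sharp_s = ascii_equal_ignoring_case("Stra\u00dfe", "STRASSE")        # False
same_dotted_i = ascii_equal_ignoring_case("\u0130stanbul", "istanbul")    # False

# The Turkish caveat remains even here: "I" folds to "i", although a
# Turkish reader would expect dotless "ı" (U+0131).
folded_capital_i = ascii_fold("I")

# By contrast, the default Unicode lower-casing of "İ" (U+0130) yields two
# code points: "i" followed by U+0307 COMBINING DOT ABOVE.
default_lower_dotted_i = "\u0130".lower()

print(same_ascii, same_sharp_s, same_dotted_i)
print(folded_capital_i, len(default_lower_dotted_i))
```

The point of keeping the fold this small is that it never has to guess a language. It is still wrong for Turkish "I", as noted, but it makes no other language-dependent decisions.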