From: Andreas Dehmel <blackhole.8.zarquon42@spamgourmet.com>
Newsgroups: comp.os.linux.misc,comp.os.linux.advocacy
Subject: Re: Case Insensitive File Systems -- Torvalds Hates Them
Date: Tue, 29 Apr 2025 20:11:19 +0200
Organization: Zarquon's Den
Message-ID: <20250429201119.736dc05c@blackbird.dehmel-lan.de>
References: <pan$4068a$3910f4f1$8cbecede$9e42905e@linux.rocks> <20250428080014.0000347f@gmail.com> <m79tdsF2bf6U1@mid.individual.net> <20250428111242.00007426@gmail.com> <pan$c046d$e87ef491$a3427b7a$ac576dbc@linux.rocks>

On Mon, 28 Apr 2025 18:56:18 +0000
Farley Flud <ff@linux.rocks> wrote:

> On Mon, 28 Apr 2025 11:12:42 -0700, John Ames wrote:
>
> > Just so, it seems to me. Of course it's many years too late for
> > *nix to course-correct on this, but it was a stupid design decision
> > in 1970 and it remains stupid now. Well, such is the nature of
> > things in this vale of sin and tears...
>
> Case insensitivity was only idiotic at the beginning, but now, in the
> age of Unicode, it is supremely idiotic.
>
> Consider the German "sharp s," which I cannot enter as UTF-8 here.
>
> But the lower case sharp s maps into TWO DIFFERENT upper case chars:
> <can't enter> and "SS," e.g. STRASSE or <can't enter>.

That merely illustrates the point that whoever decided to model it like
this in Unicode was truly a numbskull.
For two reasons:

1) Just because the result _looks_ like SS doesn't mean it has to be
   two characters. A Unicode character can look like anything, even a
   full word (and beyond). The only reason to use two characters would
   be hyphenation, which in this case is explicitly forbidden. Someone
   didn't understand the difference between syntax and semantics.

2) This transformation is not trivially invertible. No, you can't just
   translate every SS back to ß; you'd pretty much need an AI to invert
   this. Whenever you're introducing a transformation that's trivial in
   one direction and extremely hard in the other, and you're not working
   in cryptography, you're doing something extremely, horribly wrong.

> There are special rules on case folding for thousands of Unicode chars
> and the "sharp s" example is one of the simplest.

I seriously doubt that, especially since many (most?) languages don't
even know what "case" is supposed to be in the first place (Japanese,
for one; I'm pretty sure it's the same in Chinese and most other Asian
languages, which incidentally take up the most code points). And even
if it were true, that'd mean we'd need a couple of thousand additional
code points for these special cases, out of more than a million -- who
cares, the gender-neutral-smileys-crowd?

Andreas Dehmel
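
For what it's worth, the round-trip problem from point 2 is easy to reproduce. The following is only a sketch, assuming Python 3, whose built-in str.upper() and str.lower() follow the Unicode case mappings; the helper round_trips() and the sample words are made up for illustration.

```python
# Sketch only: relies on Python 3's built-in str case methods.
SHARP_S = "\u00df"          # ß, LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S


def round_trips(word: str) -> bool:
    """Return True if lowercasing the uppercased word gives the word back."""
    return word.upper().lower() == word


street = "stra" + SHARP_S + "e"  # straße
upper = street.upper()           # the full case mapping expands ß to SS
print(upper)                     # STRASSE
print(upper.lower())             # strasse: the ß is not recovered
print(round_trips(street))       # False
print(round_trips("strasse"))    # True

# The single-character capital form lowercases back to ß.
print(CAPITAL_SHARP_S.lower() == SHARP_S)  # True

# Characters from scripts without case are left untouched.
kana = "\u3042"                  # あ, HIRAGANA LETTER A
print(kana.upper() == kana == kana.lower())  # True
```

Once "straße" has been uppercased, nothing in the string says whether a given SS came from ß or from a genuine double s, so lower() can only produce "ss".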