<20250429201119.736dc05c@blackbird.dehmel-lan.de>

Path: news.eternal-september.org!eternal-september.org!feeder3.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: Andreas Dehmel <blackhole.8.zarquon42@spamgourmet.com>
Newsgroups: comp.os.linux.misc,comp.os.linux.advocacy
Subject: Re: Case Insensitive File Systems -- Torvalds Hates Them
Date: Tue, 29 Apr 2025 20:11:19 +0200
Organization: Zarquon's Den
Lines: 51
Message-ID: <20250429201119.736dc05c@blackbird.dehmel-lan.de>
References: <pan$4068a$3910f4f1$8cbecede$9e42905e@linux.rocks>
	<20250428080014.0000347f@gmail.com>
	<m79tdsF2bf6U1@mid.individual.net>
	<20250428111242.00007426@gmail.com>
	<pan$c046d$e87ef491$a3427b7a$ac576dbc@linux.rocks>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Trace: individual.net aM1+22KpPf7gDUenZEQzFA/7XpJshv3rWZPSPMwJKEmI/mhqTG
X-Orig-Path: user-311788.user.individual.de!not-for-mail
Cancel-Lock: sha1:gMLbgsi0JY9MnS2Tcgh2bAAWIos= sha256:GHxvbcURkdE1bYepv0r++CgwsOOTXDVsCuAluaK+Hvg=
X-Newsreader: Claws Mail 3.17.8 (GTK+ 2.24.33; x86_64-pc-linux-gnu)

On Mon, 28 Apr 2025 18:56:18 +0000
Farley Flud <ff@linux.rocks> wrote:

> On Mon, 28 Apr 2025 11:12:42 -0700, John Ames wrote:
>
> >
> > Just so, it seems to me. Of course it's many years too late for
> > *nix to course-correct on this, but it was a stupid design decision
> > in 1970 and it remains stupid now. Well, such is the nature of
> > things in this vale of sin and tears...
> >
>
> Case insensitivity was only idiotic at the beginning, but now, in the
> age of Unicode, it is supremely idiotic.
>
> Consider the German "sharp s," which I cannot enter as UTF-8 here.
>
> But the lower case sharp s maps into TWO DIFFERENT upper case chars:
> <can't enter> and "SS," e.g. STRASSE or <can't enter>.
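For reference, this is what the default Unicode mappings actually do with the sharp s, checked with Python's built-in str methods (which implement the full Unicode case mappings and case folding). The helper same_name_ci is a hypothetical sketch of a case-insensitive name comparison using Unicode's canonical caseless match, NFD(casefold(NFD(x))). It is not the lookup code of any particular filesystem.

```python
import unicodedata

SHARP_S = "\u00df"          # ß, LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S (added in Unicode 5.1)

# The default full uppercase mapping turns one character into two.
print(SHARP_S.upper())          # SS
# The capital form exists, but it only maps *down* to ß.
print(CAPITAL_SHARP_S.lower())  # ß
# Case folding (what caseless matching uses) folds both to "ss".
print(SHARP_S.casefold(), CAPITAL_SHARP_S.casefold())  # ss ss


def same_name_ci(a: str, b: str) -> bool:
    """Canonical caseless match: compare NFD(casefold(NFD(x))) of both names.

    Hypothetical helper for illustration only; real case-insensitive
    filesystems make their own choices about normalization.
    """
    def key(s: str) -> str:
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)


print(same_name_ci("Straße", "STRASSE"))         # True
print(same_name_ci("Straße", "STRA\u1e9eE"))     # True
```

So a caseless lookup ends up treating "Straße", "STRASSE" and "STRAẞE" as the same name, even though no simple one-character rule connects them.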

That merely illustrates the point that whoever decided to model it like
this in Unicode was truly a numbskull. For two reasons:

1) just because the result _looks_ like SS doesn't mean it has to be
two characters. A Unicode character can look like anything, even a full
word (and beyond). The only reason to use two characters would be
hyphenation, which in this case is explicitly forbidden. Someone didn't
understand the difference between syntax and semantics.

2) this transformation is not trivially invertible. No, you can't just
translate every SS back to ß, you'd pretty much need an AI to invert
this. Whenever you're introducing a transformation that's trivial in
one direction and extremely hard in the other, and you're not working
in cryptography, you're doing something extremely, horribly wrong.
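To make point 2 concrete: uppercasing is a function, but inverting it means enumerating every lowercase string that uppercases to the same result. The sketch below brute-forces that. The name lowercase_preimages is made up for illustration, and it deliberately considers only the ß/SS ambiguity, ignoring ẞ and every other special case. Even this toy version yields two readings of MASSE, which in German can be Masse (mass) or Maße (measures), and choosing between them requires knowing what the text means.

```python
from __future__ import annotations


def lowercase_preimages(upper: str) -> set[str]:
    """Return every lowercase string whose str.upper() equals `upper`,
    considering only the ß -> SS mapping (hypothetical helper)."""
    results: set[str] = set()

    def walk(i: int, prefix: str) -> None:
        if i == len(upper):
            results.add(prefix)
            return
        # Either this character is simply the lowercase of itself...
        walk(i + 1, prefix + upper[i].lower())
        # ...or an "SS" pair came from a single ß.
        if upper.startswith("SS", i):
            walk(i + 2, prefix + "\u00df")

    walk(0, "")
    # Keep only candidates that really round-trip to the input.
    return {s for s in results if s.upper() == upper}


print(sorted(lowercase_preimages("MASSE")))  # ['masse', 'maße']
print(len(lowercase_preimages("SSS")))       # 3 (sss, ßs, sß)
# The forward direction loses information: lower(upper(x)) != x.
print("maße".upper().lower())                # masse
```

Each extra "SS" in a word multiplies the number of candidates, and overlapping runs like "SSS" add more, which is exactly the asymmetry described above.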


> There are special rules on case folding for thousands of Unicode chars
> and the "sharp s" example is one of the simplest.

I seriously doubt that, especially since many (most?) languages don't
even know what "case" is supposed to be in the first place: Japanese
doesn't, and I'm pretty sure the same goes for Chinese and most other
Asian languages, which incidentally take up the most code points. And
even if it were true, that'd mean we'd need a couple of thousand
additional code points for these special cases, out of about 1.1
million -- who cares, the gender-neutral-smileys-crowd?
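Both claims can be checked against the Unicode tables that ship with Python rather than argued from memory. This sketch scans the whole code space (about 1.1 million code points, so it takes a second or two) and counts the code points that have any case mapping at all and those whose full uppercase mapping expands to more than one character, like ß -> SS. The exact counts depend on the Unicode version your Python was built with (unicodedata.unidata_version), so no numbers are claimed here; the last line just confirms that some Japanese kana and kanji have no case.

```python
import sys
import unicodedata

# The Unicode code space runs from U+0000 to U+10FFFF.
CODESPACE = sys.maxunicode + 1  # 1114112 on any Python 3.3+

cased = []      # code points with any lowercase or uppercase mapping
expanding = []  # code points whose full uppercase is more than one character
for cp in range(CODESPACE):
    if 0xD800 <= cp <= 0xDFFF:  # skip surrogates, they are not characters
        continue
    c = chr(cp)
    up = c.upper()
    if up != c or c.lower() != c:
        cased.append(c)
    if len(up) > 1:
        expanding.append(c)

print("Unicode version:", unicodedata.unidata_version)
print("code points in the code space:", CODESPACE)
print("with any case mapping:", len(cased))
print("uppercase expands to more than one char:", len(expanding))

# Japanese kana and kanji have no case at all.
SAMPLE = "ひらがなカタカナ漢字"
print(all(c.upper() == c == c.lower() for c in SAMPLE))  # True
```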



Andreas Dehmel