Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Lem Novantotto
Newsgroups: comp.unix.shell
Subject: Re: Sorting problem with Unix sort(1) with UTF-8 punctuation characters - locale issue
Date: Thu, 20 Feb 2025 11:14:42 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 58
Message-ID:
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 20 Feb 2025 12:14:42 +0100 (CET)
Injection-Info: dont-email.me; posting-host="0baac5ddd7aa045ad2b1697eaedd31f4";
logging-data="2935293"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18HBdhjrdJvvN72UA8pC731UyE/TIm39ns="
User-Agent: Pan/0.160 (Toresk; )
Cancel-Lock: sha1:eRPE1c5Tl8A0AmIIyDVT30/251M=
Bytes: 2530
On Wed, 19 Feb 2025 12:27:18 +0100, Janis Papanagnou wrote:
> I've been sorting punctuation characters on one Unix system and it did
> not produce the expected result. Switching to another system did it as
> expected.
The second system (the one not working "properly") is treating all dots as
equal, so it effectively sorts on the letters alone.
My system doesn't sort properly either. Here are its locale settings:
$ locale
LANG=it_IT.UTF-8
LANGUAGE=it_IT
LC_CTYPE="it_IT.UTF-8"
LC_NUMERIC="it_IT.UTF-8"
LC_TIME="it_IT.UTF-8"
LC_COLLATE="it_IT.UTF-8"
LC_MONETARY="it_IT.UTF-8"
LC_MESSAGES="it_IT.UTF-8"
LC_PAPER="it_IT.UTF-8"
LC_NAME="it_IT.UTF-8"
LC_ADDRESS="it_IT.UTF-8"
LC_TELEPHONE="it_IT.UTF-8"
LC_MEASUREMENT="it_IT.UTF-8"
LC_IDENTIFICATION="it_IT.UTF-8"
LC_ALL=
Let's see. In my /usr/share/i18n/locales/it_IT, I have this section:
LC_COLLATE
copy "iso14651_t1"
END LC_COLLATE
On your second system, you have LC_COLLATE=en_US or de_DE. It makes no
difference: the corresponding locale files always contain the same section:
LC_COLLATE
copy "iso14651_t1"
END LC_COLLATE
But in /usr/share/i18n/locales/C there is:
LC_COLLATE
% The keyword 'codepoint_collation' in any part of any LC_COLLATE
% immediately discards all collation information and causes the
% locale to use strcmp/wcscmp for collation comparison. This is
% exactly what is needed for C (ASCII) or C.UTF-8.
codepoint_collation
END LC_COLLATE
And indeed:
$ LC_COLLATE=C sort yada yada
gives the correct order.
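As a minimal sketch of the above, here is a self-contained run on made-up sample data (the four lines and the temporary file are only for illustration, not taken from the original thread). It uses LC_ALL rather than LC_COLLATE, on the assumption that the environment might export LC_ALL, which would take precedence over LC_COLLATE:

```shell
# Hypothetical sample data: names with and without a leading dot.
tmpfile=$(mktemp)
printf '%s\n' 'b' '.b' 'a' '.a' > "$tmpfile"

# LC_ALL overrides LC_COLLATE, so setting it forces plain byte
# (codepoint) order whatever else the environment exports.
sorted_c=$(LC_ALL=C sort "$tmpfile")
printf '%s\n' "$sorted_c"
# prints:
# .a
# .b
# a
# b

# For comparison: the same file under the current locale's collation.
# The result depends on the locale definitions installed on the system.
sort "$tmpfile"

rm -f "$tmpfile"
```

With LC_ALL unset, as in the locale output above, LC_COLLATE=C on its own should have the same effect on sort.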
--
Bye, Lem
Talis erit dies qualem egeris