Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups: comp.unix.shell
Subject: Sorting problem with Unix sort(1) with UTF-8 punctuation characters -
 locale issue
Date: Wed, 19 Feb 2025 12:27:18 +0100
Organization: A noiseless patient Spider
Lines: 71
Message-ID: <vp4f6o$288ui$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 19 Feb 2025 12:27:20 +0100 (CET)
Injection-Info: dont-email.me; posting-host="8d21927c3b252a23dbfdb299b6fc0a86";
	logging-data="2368466"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19T03XMHflDGJ/lU2JjhzgU"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.8.0
Cancel-Lock: sha1:npSCApMah44Q2VmGIzMJQRDxzBU=
X-Mozilla-News-Host: news://news.eternal-september.org:119
X-Enigmail-Draft-Status: N1110
Bytes: 4155

I've been sorting lines of punctuation characters on one Unix system
and it did not produce the expected result. On another system the
same command sorted them as expected.

The test program (it contains non-ASCII middle-dot characters) was

sort -t $'\t' <<EOT
>····**·······**················<	abc1
>···········**······**··········<	efg2
>·**·························**·<	hij3
>············**·················<	klm4
>···**····················**····<	nop5
>···**···················**·**··<	qrs6
>··**··········**·········**····<	tuv7
>**·····························<	wxy8
EOT


Run on an older system - with sort (GNU coreutils) 8.13 - produced

>**·····························<	wxy8
>·**·························**·<	hij3
>··**··········**·········**····<	tuv7
>···**···················**·**··<	qrs6
>···**····················**····<	nop5
>····**·······**················<	abc1
>···········**······**··········<	efg2
>············**·················<	klm4


On a newer system - with sort (GNU coreutils) 8.28 - it did not sort
these lines at all[*]; the output was in input order.

>····**·······**················<	abc1
>···········**······**··········<	efg2
>·**·························**·<	hij3
>············**·················<	klm4
>···**····················**····<	nop5
>···**···················**·**··<	qrs6
>··**··········**·········**····<	tuv7
>**·····························<	wxy8


One hypothesis was that it's some locale issue. So I've copied the
LC_* settings to the newer system and disabled them one by one.
Strangely, the one that was responsible for the effect was LC_TIME!

On the system that sorts correctly it was defined as
  LC_TIME=de_DE.UTF-8@isodate
and the one that sorts incorrectly had
  LC_TIME=de_DE.UTF-8

Now I'm puzzled in many ways...
If anything, I'd have expected LC_COLLATE to have an effect on sorting.
Also, there's no locale with @isodate on the system that sorts
incorrectly. And clearing LC_TIME or removing the "@isodate" part
did not change anything; LC_TIME has to name that non-existent
locale for sort to work correctly on the otherwise incorrectly
sorting system.
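
The one-by-one test described above can be sketched roughly as follows.
This is only an illustration, assuming a POSIX shell; the three-line
sample and the names `sample`, `report`, `var` and `result` are
hypothetical and not part of the original test.

```shell
# Hypothetical three-line sample in the style of the test data above.
sample=$(printf '%s\n' '>··**<' '>**··<' '>·**·<')

report=$(
  for var in LANG LC_ALL LC_COLLATE LC_CTYPE LC_TIME; do
    # The inner subshell keeps the unset from leaking into the calling shell.
    result=$( (unset "$var"; printf '%s\n' "$sample" | sort | tr '\n' ' ') )
    printf 'without %-10s -> %s\n' "$var" "$result"
  done
)
printf '%s\n' "$report"
```

Each report line shows the order sort produced with one variable
removed; which lines differ depends on the locales installed on the
system, so no particular output is implied here.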

Does anyone have an idea what's going on here?

I'm reluctant to globally set  LC_TIME=de_DE.UTF-8@isodate
(since there is no file with that name in the locale directories).
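
One possible alternative to a global setting, sketched here under the
assumption that plain byte order (the C locale) is acceptable for this
data, is to override the locale for the single sort call. LC_ALL is
used because it takes precedence over the other LC_* variables and
LANG. For this input, byte order coincides with the output shown for
the older system. The variable name `sorted` is only illustrative.

```shell
# Override the locale for this one command only; the shell's own
# environment is left untouched.
sorted=$(LC_ALL=C sort <<'EOT'
>····**·······**················<	abc1
>···········**······**··········<	efg2
>·**·························**·<	hij3
>············**·················<	klm4
>···**····················**····<	nop5
>···**···················**·**··<	qrs6
>··**··········**·········**····<	tuv7
>**·····························<	wxy8
EOT
)
printf '%s\n' "$sorted"
```

This orders the lines by their raw UTF-8 bytes, where '*' (0x2A)
compares lower than the first byte of the middle dot (0xC2).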

Thanks.

Janis

[*] Lines with content other than the depicted payload were sorted
correctly.