From: Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups: comp.unix.shell
Subject: Sorting problem with Unix sort(1) with UTF-8 punctuation characters - locale issue
Date: Wed, 19 Feb 2025 12:27:18 +0100
Message-ID: <vp4f6o$288ui$1@dont-email.me>

I've been sorting punctuation characters on one Unix system, and it did not produce the expected result. On another system it worked as expected.
The test program (it contains non-ASCII middle-dot characters) was:

sort -t $'\t' <<EOT
>····**·······**················< abc1
>···········**······**··········< efg2
>·**·························**·< hij3
>············**·················< klm4
>···**····················**····< nop5
>···**···················**·**··< qrs6
>··**··········**·········**····< tuv7
>**·····························< wxy8
EOT

On an older system, with sort (GNU coreutils) 8.13, it produced:

>**·····························< wxy8
>·**·························**·< hij3
>··**··········**·········**····< tuv7
>···**···················**·**··< qrs6
>···**····················**····< nop5
>····**·······**················< abc1
>···········**······**··········< efg2
>············**·················< klm4

On a newer system, with sort (GNU coreutils) 8.28, it did not sort these lines at all[*]:

>····**·······**················< abc1
>···········**······**··········< efg2
>·**·························**·< hij3
>············**·················< klm4
>···**····················**····< nop5
>···**···················**·**··< qrs6
>··**··········**·········**····< tuv7
>**·····························< wxy8

I suspected a locale issue, so I copied the LC_* settings to the newer system and disabled them one by one. Strangely, the setting responsible for the effect was LC_TIME! On the system that sorted correctly it was defined as

LC_TIME=de_DE.UTF-8@isodate

while the one that sorted improperly had

LC_TIME=de_DE.UTF-8

Now I'm puzzled in several ways... If anything, I'd have expected LC_COLLATE to affect sorting. Moreover, there is no locale with @isodate on the system where sort misbehaves. And unsetting LC_TIME or removing the "@isodate" part did not change anything; on that system, sort orders the lines as expected only when LC_TIME names a locale for which no locale file exists.

Does anyone have an idea what's going on here?
I'm reluctant to set LC_TIME=de_DE.UTF-8@isodate globally, since there is no file with that name in the locale directories.

Thanks.

Janis

[*] Lines containing other content in addition to the depicted payload were sorted correctly.
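A sketch for getting a locale-independent baseline. It rests on assumptions that are not part of the original post. In the C locale `sort` compares raw bytes. The asterisk is byte 0x2A, and the middle dot U+00B7 is encoded in UTF-8 as 0xC2 0xB7, so a line sorts earlier the sooner its first `*` appears. That is exactly the order reported above for coreutils 8.13.

A plausible but unverified reading of the LC_TIME effect is that glibc's `setlocale(LC_ALL, "")` fails as a whole when any category names a locale that is not installed. In that case the program stays in the C locale. The helper name `sort_bytewise` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run sort(1) in the C locale for this one command only.
# LC_ALL overrides LANG and every LC_* variable (including LC_TIME), so the
# result does not depend on which locales happen to be installed.
sort_bytewise() {
  LC_ALL=C sort "$@"
}

# The test lines from the post. In the C locale sort compares bytes:
# '*' is 0x2A, the middle dot U+00B7 is 0xC2 0xB7 in UTF-8, so at the
# first position where two of these lines differ, the '*' line sorts first.
# (-t is omitted here: it only matters for key fields selected with -k.)
sort_bytewise <<'EOT'
>····**·······**················< abc1
>···········**······**··········< efg2
>·**·························**·< hij3
>············**·················< klm4
>···**····················**····< nop5
>···**···················**·**··< qrs6
>··**··········**·········**····< tuv7
>**·····························< wxy8
EOT
```

With this input the labels come out as wxy8, hij3, tuv7, qrs6, nop5, abc1, efg2, klm4. The `LC_ALL=C` assignment prefixes only the `sort` command, so the session's environment is left unchanged.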