From: Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups: comp.unix.shell
Subject: Re: Sorting problem with Unix sort(1) with UTF-8 punctuation characters - locale issue
Date: Thu, 20 Feb 2025 01:54:15 +0100
Organization: A noiseless patient Spider
Message-ID: <vp5ufo$2h4ql$1@dont-email.me>
References: <vp4f6o$288ui$1@dont-email.me> <slrnvrcfcl.3e0.naddy@lorvorc.mips.inka.de>
In-Reply-To: <slrnvrcfcl.3e0.naddy@lorvorc.mips.inka.de>

On 19.02.2025 21:22, Christian Weisgerber wrote:
> On 2025-02-19, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>
>> If anything, I'd expected LC_COLLATE to have an effect on sorting.
>> Then there's no locale with @isodate on that sort-defunct system.
>> And clearing that LC_TIME locale or removing the "@isodate" part
>> did not change anything; it needs that setting to a non-existing
>> locale file to work correctly on the otherwise not correctly
>> sorting system.
>
> My working hypothesis would be that setting LC_TIME to a nonexistent
> locale causes an error that invalidates the _whole_ locale setting
> and causes a fallback to a default setting, likely the "C" locale.
> You can check that sorting with LC_ALL=C or an invalid value like
> LC_ALL=foobar will produce your "correct" result.
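The check Christian suggests can be scripted as a side-by-side comparison. This is a minimal POSIX sh sketch, assuming the dots in the example lines are U+00B7 MIDDLE DOT encoded in UTF-8 (the exact characters of the original data are not confirmed here); the function names `sample` and `sort_with` are hypothetical, and "foobar" is just a deliberately nonexistent locale name.

```shell
#!/bin/sh
# Hypothetical sample data modelled on the three lines quoted in the
# thread; assumes the dots are U+00B7 MIDDLE DOT, encoded in UTF-8.
sample() {
    printf '%s\n' \
        '····**·······**················< abc1' \
        '···········**······**··········< efg2' \
        '·**·························**·< hij3'
}

# Sort the sample with LC_ALL set to the given locale name.
# LC_ALL overrides LANG and every other LC_* variable, so only this
# value decides which collation sort(1) uses.
sort_with() {
    sample | LC_ALL="$1" sort
}

# "C" compares raw bytes; "foobar" is a deliberately nonexistent locale.
# If an invalid locale makes sort fall back to "C", both runs match.
for loc in C foobar; do
    printf '== LC_ALL=%s\n' "$loc"
    sort_with "$loc"
done
```

If the fallback hypothesis holds, both runs print the same order. In the byte-wise "C" order the hij3 line comes first, then abc1, then efg2, because `*` (0x2A) sorts before 0xC2, the lead byte of U+00B7 in UTF-8. Some shells (bash, for one) may print a setlocale warning for the invalid name on stderr; that does not affect sort's output.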

That was actually also my own first locale-based hypothesis, and setting
LC_ALL=C was the first thing I tried (before I stumbled on the strange
LC_TIME "solution"). But that setting did not change the strange
behavior. (But see below.)

>
> A corollary from this would be that your "sort-defunct" system uses
> a different collation order than your "correctly" sorting system
> for the de_DE.UTF-8 locale.

Right. The point is that I manage the two systems differently. On the
old system I fixed every deficiency I encountered at the system level;
the @isodate locale is one such beast. (It works on that system.) The
newer system got the standard updates and few (or hardly any) "fixes"
from me, so I'd expect it to work better "as designed". (But the
opposite is the case.)

On the old system I've explicitly defined

    LC_TIME=de_DE.UTF-8@isodate
    LC_COLLATE=C.UTF-8

and on the new system the settings are

    LC_TIME=de_DE.UTF-8
    LC_COLLATE=en_US.UTF-8

I'm sure there was a reason why the collation setting is now "en_US"
instead of "de_DE" (which almost all the other LC_* settings use), so
I'm reluctant to change it. (But setting LC_COLLATE to "C.UTF-8" works
as well.) I think I'll have to use a local (not system-wide) LC_*
change to make sort behave as I'd expect without touching the rest.

>
> On the FreeBSD 14-STABLE system I'm typing this on, sorting your
> example data with my typical C.UTF-8 locale produces your expected
> result, sorting with de_DE.UTF-8 (or en_US.UTF-8) produces a different
> order.
>
>>> ····**·······**················< abc1
>>> ···········**······**··········< efg2
>>> ·**·························**·< hij3
>
> Also, I have no idea what could be considered the "correct" sorting
> order for this.

Unless all the punctuation characters used are ignored, or are treated
as having the same sort weight, it should IMO be obvious that the
original unsorted order is not correct.

Thanks for your reply.
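A local change of the kind described here could be a shell function that applies byte/code-point collation only to the sort invocation, leaving system-wide and login settings alone. This is a sketch under assumptions: the name `sort_c_collate` is hypothetical, and it unsets LC_ALL inside a subshell because LC_ALL, when set, overrides LC_COLLATE.

```shell
#!/bin/sh
# Hypothetical helper: run sort(1) with byte/code-point collation
# without changing any system-wide or login locale settings.
sort_c_collate() {
    (
        # LC_ALL, if set, would override LC_COLLATE, so drop it here;
        # the subshell keeps this change away from the calling shell.
        unset LC_ALL
        LC_COLLATE=C.UTF-8
        export LC_COLLATE
        exec sort "$@"
    )
}

# Example: uppercase sorts before lowercase in code-point order.
printf '%s\n' b a B | sort_c_collate   # prints B, a, b (one per line)
```

On a system without a C.UTF-8 locale, the locale setup inside sort would typically fail as a whole (on glibc, for instance) and sort would stay in the "C" locale, which also compares bytes; that matches the fallback behaviour hypothesised in this thread.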
It helped me find another setting that produces the desired result.

Janis