Path: ...!weretis.net!feeder9.news.weretis.net!newsfeed.endofthelinebbs.com!.POSTED.47.186.35.85!not-for-mail
From: Nigel Reed <sysop@endofthelinebbs.com>
Newsgroups: news.admin.peering
Subject: Re: Newsgroups files
Date: Mon, 3 Mar 2025 17:33:59 -0600
Organization: End Of The Line BBS
Sender: nelgin@47.186.35.85
Message-ID: <20250303173359.6c3af31a@wibble.sysadmininc.com>
References: <20250303133017.7b629d4a@wibble.sysadmininc.com> <vq58g3$1nji2$3@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Injection-Info: newsfeed.endofthelinebbs.com; posting-host="47.186.35.85"; logging-data="797777"; mail-complaints-to="abuse@endofthelinebbs.com"
X-Newsreader: Claws Mail 4.3.1git13 (GTK 3.24.41; x86_64-pc-linux-gnu)
Bytes: 2699
Lines: 48

On Mon, 3 Mar 2025 22:55:15 +0100
Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:

> Hi Nigel,
>
> > One sample group from 16 peers. The first thing: so many different
> > encodings. I've got ASCII, UTF-8, ISO-8859-1, WINDOWS-1252, even one
> > identifying as GB18030.
> >
> > Next, 8 servers agree on one description, 3 on another, 2 more on
> > yet another, and finally 3 think the group is moderated.
> >
> > How did things get in such a mixed up state?
>
> Because there originally wasn't any standard for the encoding of
> control articles. Most of them did not declare anything (the usual
> encoding locally used by the sender was assumed - like gb18030 for
> cn.*, koi8-u for ukr.* [my sympathy to them!], big5 for tw.*,
> iso-8859-15 for fr.*, cp1252 for most of the others, etc.).
> Only "recently" a new version of the standard recommended the use of
> UTF-8.
>
> That's why you end up seeing mixed and incoherent encodings in existing
> news servers. Not all of them run a version which implements the new
> interoperable state of the art (UTF-8) to parse control articles. And if
> the descriptions pre-date the receipt of new control articles, not
> all the news administrators have manually homogenized the
> descriptions to UTF-8. (No blame in my sentence, just a fact.)
>
>
> > What is even worse, when trying to automate this, is when the
> > majority of servers have the wrong description or it's half and
> > half.
>
> Just use
> https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.utf8
> :)
>

That's a good start but I still have 36,519 groups in my active file
that aren't in your list.

-- 
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23
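
For anyone wanting to reproduce the comparison mentioned above (groups present in an active file but absent from a newsgroups list), here is a minimal Python sketch. It assumes the usual layouts: active lines of the form "name high low flag" and newsgroups lines of the form "name<TAB>description". The function names and the decoding fallback order (UTF-8, then cp1252, then Latin-1) are illustrative assumptions, not something taken from INN or from the list linked above.

```python
from __future__ import annotations

from typing import Iterable

# Assumed fallback order, chosen for illustration only; it is not a standard.
FALLBACK_ENCODINGS = ("utf-8", "cp1252")


def decode_line(raw: bytes) -> str:
    """Decode one line, trying UTF-8 first, then cp1252, then Latin-1."""
    for encoding in FALLBACK_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this cannot fail.
    return raw.decode("latin-1")


def parse_newsgroups(lines: Iterable[bytes]) -> dict[str, str]:
    """Map group name -> description from 'name<whitespace>description' lines."""
    descriptions: dict[str, str] = {}
    for raw in lines:
        fields = decode_line(raw).strip().split(None, 1)
        if not fields:
            continue  # blank line
        name = fields[0]
        descriptions[name] = fields[1] if len(fields) > 1 else ""
    return descriptions


def parse_active(lines: Iterable[bytes]) -> set[str]:
    """Collect group names from 'name high low flag' lines."""
    groups: set[str] = set()
    for raw in lines:
        fields = decode_line(raw).split()
        if fields:
            groups.add(fields[0])
    return groups


def missing_groups(active: set[str], descriptions: dict[str, str]) -> list[str]:
    """Return the groups in the active file that have no description entry."""
    return sorted(active - descriptions.keys())
```

To run it against real files, open both in binary mode (for example `open("active", "rb")`) and pass the file objects straight to the parsers, so that each line is decoded individually rather than assuming one encoding for the whole file.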