Path: ...!weretis.net!feeder9.news.weretis.net!newsfeed.endofthelinebbs.com!.POSTED.47.186.35.85!not-for-mail
From: Nigel Reed <sysop@endofthelinebbs.com>
Newsgroups: news.admin.peering
Subject: Re: Newsgroups files
Date: Mon, 3 Mar 2025 17:33:59 -0600
Organization: End Of The Line BBS
Sender: nelgin@47.186.35.85
Message-ID: <20250303173359.6c3af31a@wibble.sysadmininc.com>
References: <20250303133017.7b629d4a@wibble.sysadmininc.com>
	<vq58g3$1nji2$3@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Injection-Info: newsfeed.endofthelinebbs.com; posting-host="47.186.35.85";
	logging-data="797777"; mail-complaints-to="abuse@endofthelinebbs.com"
X-Newsreader: Claws Mail 4.3.1git13 (GTK 3.24.41; x86_64-pc-linux-gnu)
Bytes: 2699
Lines: 48

On Mon, 3 Mar 2025 22:55:15 +0100
Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:

> Hi Nigel,
>
> > One sample group from 16 peers. The first thing: so many different
> > encodings. I've got ASCII, UTF-8, ISO-8859-1, WINDOWS-1252, even one
> > identifying as GB18030.
> >
> > Next, 8 servers agree on one description, 3 on another, 2 more on
> > yet another, and finally 3 think the group is moderated.
> >
> > How did things get in such a mixed-up state?
>
> Because there originally wasn't any standard for the encoding of
> control articles.  Most of them did not declare anything (the usual
> encoding locally used by the sender was assumed - like gb18030 for
> cn.*, koi8-u for ukr.* [my sympathy to them!], big5 for tw.*,
> iso-8859-15 for fr.*, cp1252 for most of the others, etc.).
> Only "recently" did a new version of the standard recommend the use of
> UTF-8.
>
> That's why you end up seeing mixed and inconsistent encodings in existing
> news servers.  Not all of them run a version which implements the new
> interoperable state of the art (UTF-8) to parse control articles.  And if
> the descriptions pre-date the receipt of new control articles, not
> all the news administrators have manually homogenized the
> descriptions to UTF-8.  (No blame in my sentence, just a fact.)
>
>
> > What is even worse when trying to automate this, is when the
> > majority of servers have the wrong description or it's half and
> > half.
>
> Just use
> https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.utf8
> :)
>

That's a good start but I still have 36,519 groups in my active file
that aren't in your list.
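
For anyone wanting to reproduce that kind of count, here is a minimal
sketch of the comparison. It assumes both files carry the group name as
the first whitespace-separated field on each line (as in an INN-style
active file and a tab-separated newsgroups file); the function names are
made up for illustration and are not part of any news server tooling.

```python
def read_group_names(lines):
    """Collect the first whitespace-separated field of every non-blank line."""
    names = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        names.add(line.split(None, 1)[0])
    return names


def missing_descriptions(active_lines, newsgroups_lines):
    """Return, sorted, the groups present in active but absent from newsgroups."""
    active = read_group_names(active_lines)
    described = read_group_names(newsgroups_lines)
    return sorted(active - described)
```

In practice the two arguments would be open file objects, for example
the local active file and a downloaded copy of newsgroups.utf8 opened
with encoding="utf-8"; the length of the returned list is the number of
groups still lacking a description.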

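On the mixed encodings in the quoted part: a rough sketch of how the
descriptions collected from several peers could be decoded and tallied.
The fallback to cp1252 is only an assumption taken from the "cp1252 for
most of the others" remark; a hierarchy such as cn.* or tw.* would need
its own fallback, and the function names are hypothetical. As noted in
the quote, the most common description is not necessarily the right one,
so the tally is an aid for review rather than an answer.

```python
from collections import Counter


def decode_description(raw, fallbacks=("cp1252",)):
    """Decode bytes as UTF-8 if valid, else try each fallback encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this cannot fail.
    return raw.decode("latin-1")


def tally_descriptions(raw_descriptions):
    """Count distinct descriptions after decoding and collapsing whitespace."""
    counts = Counter(
        " ".join(decode_description(raw).split()) for raw in raw_descriptions
    )
    return counts.most_common()
```

Descriptions that differ only in encoding or spacing fall into the same
bucket, which leaves the real disagreements (different wording, a
"(Moderated)" suffix) visible in the counts.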

-- 
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23