Path: news.eternal-september.org!eternal-september.org!feeder3.eternal-september.org!tncsrv06.tnetconsulting.net!newsfeed.endofthelinebbs.com!.POSTED.47.186.35.85!not-for-mail
From: Nigel Reed <sysop@endofthelinebbs.com>
Newsgroups: news.admin.peering
Subject: Re: Newsgroups files
Date: Mon, 3 Mar 2025 17:13:34 -0600
Organization: End Of The Line BBS
Sender: nelgin@47.186.35.85
Message-ID: <20250303171334.785ee79e@wibble.sysadmininc.com>
References: <20250303133017.7b629d4a@wibble.sysadmininc.com>
	<8mwmd5x3c6.fsf@raybanana.net>
	<20250303143634.5f78bc54@wibble.sysadmininc.com>
	<vq57l9$1nji2$2@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Injection-Info: newsfeed.endofthelinebbs.com; posting-host="47.186.35.85";
	logging-data="797777"; mail-complaints-to="abuse@endofthelinebbs.com"
X-Newsreader: Claws Mail 4.3.1git13 (GTK 3.24.41; x86_64-pc-linux-gnu)

On Mon, 3 Mar 2025 22:40:57 +0100
Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:

> Hi Nigel,
>
> > I'm probably just going to get a script to pull the most popular of
> > the descriptions for the list and ignore the moderated part unless
> > the group has moderated in its name or a majority think its
> > moderated when do a manual check on those.
>
> I would suggest to instead just use the latest known descriptions
> (from checkgroups when they are sent).
> I maintain the list encoded in UTF-8 (the standard according to RFCs)
> here:
> https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.utf8
>
> Also, FWIW, the same list in pure ASCII:
>
> https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.ascii
>
>
> The usual master file for these descriptions has unfortunately mixed
> charsets (like windows-1252 for some descriptions, UTF-8 for others,
> ISO-8859-xx variants, etc.):
>      https://ftp.isc.org/pub/usenet/CONFIG/newsgroups
>
> That's why I generate the above first two lists :)
> Feel free to use!
>

Yes, we've sort of had this discussion before about encoding. This one
is more about the inconsistency of the labeling of the groups.

In the newsgroups list above, pretty much every entry that contains
letters beyond plain A-Z is garbled.

That's probably because it's ISO-8859 and I'm using UTF-8. The cn.*
groups are definitely garbled.

I'll just do my best to make a valid UTF-8 file for my server.
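
A minimal sketch of one way to do that, assuming the file is mixed
line by line as described above. The function names and the fallback
order (UTF-8, then windows-1252, then ISO-8859-1) are my own
assumptions, not anything the master file specifies. It will not
rescue groups written in other charsets (the cn.* descriptions, for
instance, may need a different decoder entirely); it only guarantees
that the output is valid UTF-8.

```python
def to_utf8_line(raw: bytes) -> str:
    """Decode one line of a newsgroups file, guessing its charset.

    Assumed fallback order: UTF-8 first, then windows-1252, and
    finally ISO-8859-1, which accepts every byte and so never fails.
    """
    for encoding in ("utf-8", "windows-1252"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return raw.decode("iso-8859-1")


def convert(src_path: str, dst_path: str) -> None:
    """Rewrite src_path as UTF-8 into dst_path, one line at a time."""
    # Read bytes so that no decoding happens before we get to guess;
    # newline="" keeps the original line endings untouched on output.
    with open(src_path, "rb") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for raw in src:
            dst.write(to_utf8_line(raw))
```

Usage would be something like convert("newsgroups", "newsgroups.utf8"),
with both file names being placeholders. Lines that were already UTF-8
pass through unchanged, so running it twice does no harm.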

-- 
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23