Path: news.eternal-september.org!eternal-september.org!feeder3.eternal-september.org!border-1.nntp.ord.giganews.com!border-4.nntp.ord.giganews.com!nntp.giganews.com!s1-2.netnews.com!news-out.netnews.com!postmaster.netnews.com!us1.netnews.com!not-for-mail
X-Trace: DXC=S]iAG]LGmWe8@dMS8Q2e`loWTEARJ3gbjge02JMngm^n=aHS]UU?AToF8`VW2jL]UlE^B]dK>X:ekO0>60V7d>]na:;6XjaOfMiJXZ1[d^mUNm
X-Complaints-To: support@usenetnow.net
Newsgroups: news.software.misc
From: Jason Evans <jsevans@sdf.org>
Subject: Archiving Usenet 2003-2025
Reply-To: tis.a.secret@pm.me
User-Agent: slrn/1.0.3 (OpenBSD)
Message-ID: <slrn103olo7.gbt.jsevans@jbsd1.home.local>
Date: 01 Jun 2025 13:34:55 GMT
Lines: 31
NNTP-Posting-Host: 127.0.0.1
X-Trace: 1748784895 reader.netnews.com 4122 127.0.0.1:33135

A few months ago, I posted about my Usenet archiver application. Since then, 
I have completely retooled it and rewritten it in Python, and it is now a very 
capable tool.

In January, I began a project that I had started many times before but never 
finished: archiving Usenet newsgroups from 2003 to the present. To do this, I 
am using a paid Usenet provider, downloading all newsgroups in mbox format, 
and compressing them with gzip. You might be wondering why, having started in 
January, I'm still not done. That's because paid Usenet providers prioritize 
binary groups over text groups. I am not archiving binary groups, but when one 
slips under my radar, I can easily see that far more of it has been downloaded 
than of other newsgroups in the same amount of time.
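The post doesn't show how articles end up in the compressed mbox files. As a rough sketch, assuming each article has already been fetched as raw bytes (the NNTP download step is left out), one newsgroup could be written to a gzip-compressed mbox like this. The mboxrd quoting rule, the placeholder envelope sender `usenet@archive.invalid`, and the function names are all assumptions for illustration, not necessarily what usenet_archiver does.

```python
import gzip
import re
import time
from typing import Iterable, Optional

# mboxrd rule (assumed variant): any line matching ^>*From gets one more ">".
_FROM_LINE = re.compile(rb"^(>*From )")


def mboxrd_escape(message: bytes) -> bytes:
    """Quote lines that would otherwise be read as mbox message separators."""
    return b"\n".join(_FROM_LINE.sub(rb">\1", line) for line in message.split(b"\n"))


def write_gzipped_mbox(
    path: str,
    articles: Iterable[bytes],
    sender: str = "usenet@archive.invalid",  # hypothetical placeholder envelope sender
    stamp: Optional[str] = None,
) -> int:
    """Write raw articles to a gzip-compressed mbox file; return the article count."""
    if stamp is None:
        stamp = time.asctime(time.gmtime())  # asctime format, e.g. "Sun Jun  1 13:34:55 2025"
    count = 0
    with gzip.open(path, "wb") as out:
        for raw in articles:
            msg = raw.replace(b"\r\n", b"\n")  # NNTP sends CRLF line endings
            if not msg.endswith(b"\n"):
                msg += b"\n"
            out.write(f"From {sender} {stamp}\n".encode("ascii"))  # separator line
            out.write(mboxrd_escape(msg))
            out.write(b"\n")  # blank line before the next separator
            count += 1
    return count
```

Decompressed, the output is plain mbox text, so standard mbox readers such as Python's `mailbox.mbox` should be able to open it. Compressing per group keeps each archive file independent, which is convenient for later uploads.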
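Spotting a binary group that slipped into the queue, by noticing that it has downloaded far more than its peers in the same time, can be automated with a simple outlier check. A minimal sketch, assuming a hypothetical mapping of group name to bytes downloaded so far and an arbitrary threshold of 10 times the median:

```python
import statistics
from typing import Dict, List


def flag_suspected_binary_groups(bytes_downloaded: Dict[str, int], factor: float = 10.0) -> List[str]:
    """Return groups whose downloaded volume exceeds `factor` times the median.

    `factor` is a hypothetical threshold, not a value from the original post.
    """
    if not bytes_downloaded:
        return []
    median = statistics.median(bytes_downloaded.values())
    return sorted(g for g, n in bytes_downloaded.items() if n > factor * median)
```

For example, `{"comp.lang.python": 100, "news.software.misc": 120, "rec.arts.sf": 90, "alt.binaries.oops": 5000}` has a median of 110, so only `alt.binaries.oops` is flagged. The median is used rather than the mean because a single huge binary group would drag the mean up and partly hide itself.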

Anyway, since January I have downloaded approximately 2 TB of newsgroups. Which
newsgroups? The list so far is on my GitHub, linked below. If any well-known 
groups are missing, please let me know, and I will add them to my queue.

You might be wondering where I get my list of newsgroups. I began with the 
semi-official list from isc.org 
(https://ftp.isc.org/usenet/CONFIG/newsgroups.gz), omitting only test groups 
(e.g., misc.test), binary groups, and some alt groups that deal with 
pedophilia. Next, I got the list of newsgroups carried by eternal-september 
and started a new queue containing every group that is not in the ISC list. 
There are a lot of them, and I hope to have them done in the coming weeks. I 
am downloading approximately 95 newsgroups in parallel; my Usenet provider 
allows at most 100 downloads at a time.
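Building the two download queues amounts to filtering one list and taking a set difference against another. A sketch, assuming the ISC file has one group per line followed by whitespace and a description (as in INN's `newsgroups` file). The regular expressions for test and binary groups and the `excluded` set are hypothetical stand-ins, since the post doesn't say how those groups were identified.

```python
import gzip
import re
from typing import AbstractSet, Iterable, List

# Hypothetical heuristics; the post does not say how test/binary groups were matched.
TEST_GROUP = re.compile(r"(^|\.)test$")                 # misc.test, alt.test, ...
BINARY_GROUP = re.compile(r"(^|\.)binar(y|ies)(\.|$)")  # alt.binaries.*, ...


def parse_newsgroups(lines: Iterable[str]) -> List[str]:
    """Extract group names from 'name<whitespace>description' lines."""
    names = []
    for line in lines:
        fields = line.split(None, 1)
        if fields:  # skip blank lines
            names.append(fields[0])
    return names


def load_isc_list(path: str = "newsgroups.gz") -> List[str]:
    """Read the gzip-compressed list downloaded from ftp.isc.org."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        return parse_newsgroups(f)


def build_queue(groups: Iterable[str], excluded: AbstractSet[str] = frozenset()) -> List[str]:
    """Drop test groups, binary groups, and any hand-excluded groups."""
    return [
        g for g in groups
        if not TEST_GROUP.search(g) and not BINARY_GROUP.search(g) and g not in excluded
    ]


def second_queue(es_groups: Iterable[str], isc_groups: Iterable[str],
                 excluded: AbstractSet[str] = frozenset()) -> List[str]:
    """Groups carried by the second server but missing from the ISC list, same filters."""
    return build_queue(sorted(set(es_groups) - set(isc_groups)), excluded)
```

Applying the same filters to the second queue is an assumption based on the post saying binary groups are not archived at all.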
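Keeping about 95 downloads in flight under a 100-download cap is a bounded worker pool. A minimal sketch using `concurrent.futures`, where `fetch_group` is a hypothetical callable that downloads one newsgroup (assumed to hold one provider connection while it runs):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable

MAX_PARALLEL = 95  # a little headroom under the provider's 100-download cap


def archive_all(
    groups: Iterable[str],
    fetch_group: Callable[[str], object],  # hypothetical: downloads one newsgroup
    max_parallel: int = MAX_PARALLEL,
) -> Dict[str, object]:
    """Run fetch_group over every group with at most max_parallel running at once.

    Each group maps to fetch_group's return value, or to the exception it raised,
    so one failing group does not stop the rest of the queue.
    """
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(fetch_group, g): g for g in groups}
        for fut in as_completed(futures):
            group = futures[fut]
            try:
                results[group] = fut.result()
            except Exception as exc:
                results[group] = exc
    return results
```

Threads fit here because the work is network-bound; the pool size, not the queue length, is what caps the number of simultaneous downloads.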

I'll update again later when I begin uploading them to the Internet Archive.

https://github.com/tgeek77/usenet_archiver/blob/main/fetch_log.txt