Path: news.eternal-september.org!eternal-september.org!feeder3.eternal-september.org!border-1.nntp.ord.giganews.com!border-4.nntp.ord.giganews.com!nntp.giganews.com!s1-2.netnews.com!news-out.netnews.com!postmaster.netnews.com!us1.netnews.com!not-for-mail
X-Trace: DXC=S]iAG]LGmWe8@dMS8Q2e`loWTEARJ3gbjge02JMngm^n=aHS]UU?AToF8`VW2jL]UlE^B]dK>X:ekO0>60V7d>]na:;6XjaOfMiJXZ1[d^mUNm
X-Complaints-To: support@usenetnow.net
Newsgroups: news.software.misc
From: Jason Evans <jsevans@sdf.org>
Subject: Archiving Usenet 2003-2025
Reply-To: tis.a.secret@pm.me
User-Agent: slrn/1.0.3 (OpenBSD)
Message-ID: <slrn103olo7.gbt.jsevans@jbsd1.home.local>
Date: 01 Jun 2025 13:34:55 GMT
Lines: 31
NNTP-Posting-Host: 127.0.0.1
X-Trace: 1748784895 reader.netnews.com 4122 127.0.0.1:33135

A few months ago, I posted about my Usenet archiver application. Since
then, I have completely retooled it and rewritten it in Python, and it
is now a very capable tool.

In January, I began a project that I had started many times before but
never finished: archiving Usenet newsgroups from 2003 until the current
year. To do this, I am using a paid Usenet provider, downloading all
newsgroups in mbox format, and compressing them with gzip.

You might be wondering why I have been at this since January and am
still not done. That's because paid Usenet providers prioritize binary
groups over text groups. I am not archiving binary groups, but when one
slips under my radar, I can easily see that far more of it has been
downloaded compared to other newsgroups in the same amount of time.
Anyway, since January, I have downloaded approximately 2 TB of
newsgroups.

What newsgroups have I downloaded? The list so far is on my GitHub,
linked below. If there are any well-known groups that are missing,
please let me know, and I will add them to my queue.

You might be wondering where I get my list of newsgroups. I began with
the semi-official list from isc.org.
(https://ftp.isc.org/usenet/CONFIG/newsgroups.gz)
I have omitted only the following: test groups (e.g., misc.test),
binary groups, and some alt groups that deal with pedophilia.

Next, I got a list of the newsgroups carried by eternal-september and
started a new queue based on it, downloading all of the groups that are
not in the isc list. There are a lot of them, and I'm hoping to have
them done in the coming weeks.

I am downloading approximately 95 newsgroups at a time in parallel. The
limit from my Usenet provider is 100 downloads at a time.

I'll update again later when I begin uploading them to the Internet
Archive.

https://github.com/tgeek77/usenet_archiver/blob/main/fetch_log.txt
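
For anyone who wants to experiment with a similar workflow, the steps
described in the post (filter the group list, queue whatever the second
list adds, fetch about 95 groups in parallel, write gzip-compressed
mbox files) can be sketched roughly as below. This is not the code from
usenet_archiver. Every name here (parse_newsgroups, is_excluded,
build_queues, write_mbox_gz, archive_all, fetch_group) is hypothetical,
the exclusion rule is an assumed heuristic, the "From " separator line
is a placeholder, and the actual NNTP download is left to a callable
that you would have to supply.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

# The post mentions about 95 parallel downloads under a provider limit of 100.
MAX_PARALLEL = 95


def parse_newsgroups(text):
    """Return group names from lines of the form 'name<whitespace>description'."""
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            groups.append(line.split()[0])
    return groups


def is_excluded(group):
    """Assumed heuristic: skip test groups and binary hierarchies by name."""
    parts = group.split(".")
    return "test" in parts or "binaries" in parts


def build_queues(first_list_text, second_list_text):
    """First queue: filtered first list. Second queue: groups only in the second list."""
    first_all = parse_newsgroups(first_list_text)
    first = [g for g in first_all if not is_excluded(g)]
    seen = set(first_all)
    # Assumption: the same exclusions are applied to the second list.
    second = [g for g in parse_newsgroups(second_list_text)
              if g not in seen and not is_excluded(g)]
    return first, second


def write_mbox_gz(path, raw_articles):
    """Write an iterable of raw articles (bytes) as a gzip-compressed mbox file."""
    with gzip.open(path, "wb") as out:
        for raw in raw_articles:
            # Placeholder separator line; a real archiver would derive it
            # from the article's sender and date.
            out.write(b"From archiver Thu Jan  1 00:00:00 1970\n")
            for line in raw.splitlines():
                if line.startswith(b"From "):
                    line = b">" + line  # escape body lines that look like separators
                out.write(line + b"\n")
            out.write(b"\n")


def archive_all(groups, fetch_group, out_dir, max_workers=MAX_PARALLEL):
    """Fetch each group with fetch_group(group) and write <group>.mbox.gz in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    def job(group):
        path = out_dir / f"{group}.mbox.gz"
        write_mbox_gz(path, fetch_group(group))
        return path

    results = {}
    # The pool size caps how many groups are being downloaded at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(job, g): g for g in groups}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results
```

To use it, fetch_group would be a function that takes a group name and
yields each article as raw bytes from whatever NNTP client you prefer.
Keeping the worker count a little below the provider's connection limit,
as the post does, leaves headroom for retries or a separate reader
session.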