Subject: Re: Archive Any And All Text Usenet
Newsgroups: news.admin.peering,news.software.nntp
From: Ross Finlayson
Date: Tue, 12 Mar 2024 10:25:42 -0700

On 03/10/2024 10:48 PM, Ross Finlayson wrote:
> On 03/10/2024 09:12 PM, David Chmelik wrote:
>> On Sat, 9 Mar 2024 10:01:52 -0800, Ross Finlayson wrote:
>>
>>> Hello. I'd like to start with saying thanks to Usenet administrators
>>> and originators,
>>> Usenet has a lot of perceived value as a cultural artifact, and also a
>>> great experiment in free speech, association, and press.
>>>
>>> Here I'm mostly interested in text Usenet,
>>> not binaries, that text Usenet is a great artifact and experiment in
>>> speech, association,
>>> and press.
>>>
>>> When I saw this example that may have a lot of old Usenet, then it sort
>>> of aligned with an idea that started as an idea of vanity press, about
>>> an archive of a group.
>>> Now though, I wonder how to define an "archive any and all text usenet",
>>> AAATU,
>>> filesystem convention, as a sort of "Library Filesystem Format", LFF.
>>> [...]
>>
>> Sounds good; I'm interested in full archive of text newsgroups I use
>> (1300+) but don't know free Usenet servers even go back to when I started
>> (1996, though tried Internet in museum before Eternal September). I'm
>> aware I could use commercial ones that may, but don't know which nor
>> cost/space. Is Google Groups the only going back to 1981? I hope other
>> servers managed to save that before Google disconnected from peers or
>> some might turn up back to 1979.
>>
>> Accessing some old binary ones would be nice also, but these days people
>> use commercial servers for those, which probably didn't save even back to
>> '90s... an archive of those (even though I'm uninterested in most rather
>> than a few relating to history of science, some types of art/graphics &
>> music) would presumably be too large except for data centres.
>>
>
>
> Hey, thanks for writing.
>
> Estimates and, you know, reliable estimates,
> would help a lot to estimate the scope of
> the scale of the order of, the things.
>

It seems perhaps the best, or simplest, way to have a group-date file contain its file entries is to take the above and store it in a zip file. The zip format supports random access to the files within it, given random access to the zip file itself: for example, memory-map the file, seek to the end, read back through the central directory's entries to a given path, and then decompress that entry with deflate, the usual zip compression algorithm.
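The random-access lookup just described can be sketched with only the Python standard library. This is a minimal sketch, assuming a single-disk, non-ZIP64 archive whose trailing comment (if any) does not itself contain the end-of-central-directory signature. It follows the record layout in PKWARE's zip APPNOTE: find the end-of-central-directory record at the tail of the file, jump to the central directory it points at, scan for the path, then read that entry's local header and inflate its data. `read_zip_entry` is a hypothetical helper name; Python's own `zipfile` module performs the same lookup internally.

```python
import struct
import zlib

EOCD_SIG = b"PK\x05\x06"   # end of central directory record
CDIR_SIG = b"PK\x01\x02"   # central directory file header
LOCAL_SIG = b"PK\x03\x04"  # local file header


def read_zip_entry(buf, name):
    """Return the uncompressed bytes of entry `name` from a zip held in `buf`.

    `buf` may be bytes or an mmap.mmap of the zip file. ZIP64, encryption and
    multi-disk archives are not handled in this sketch.
    """
    # 1. Seek to the end: the EOCD record is the last 22 bytes plus any comment.
    eocd = buf.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no end-of-central-directory record")
    fields = struct.unpack_from("<4s4H2IH", buf, eocd)
    total_entries, cd_offset = fields[4], fields[6]

    # 2. Walk the central directory entries looking for `name`.
    target = name.encode("utf-8")
    pos = cd_offset
    for _ in range(total_entries):
        hdr = struct.unpack_from("<4s6H3I5H2I", buf, pos)  # 46-byte header
        if hdr[0] != CDIR_SIG:
            raise ValueError("corrupt central directory")
        method, csize = hdr[4], hdr[8]
        name_len, extra_len, comment_len = hdr[10], hdr[11], hdr[12]
        if bytes(buf[pos + 46:pos + 46 + name_len]) == target:
            return _read_local(buf, hdr[16], method, csize)
        pos += 46 + name_len + extra_len + comment_len
    raise KeyError(name)


def _read_local(buf, offset, method, csize):
    # 3. The local header carries its own name/extra lengths, which may differ
    #    from the central directory's, so use them to find the data start.
    hdr = struct.unpack_from("<4s5H3I2H", buf, offset)  # 30-byte header
    if hdr[0] != LOCAL_SIG:
        raise ValueError("corrupt local header")
    start = offset + 30 + hdr[9] + hdr[10]
    raw = bytes(buf[start:start + csize])
    if method == 0:            # stored
        return raw
    if method == 8:            # deflate: raw stream, hence negative wbits
        return zlib.decompress(raw, -15)
    raise NotImplementedError(f"compression method {method}")
```

Opening the archive with `mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)` and passing the map as `buf` gives the memory-mapped access mentioned above, since the sketch only uses `rfind`, slicing, and `struct.unpack_from`.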
The idea, then, is a "group-date coordinate, hour-minute granular message list". Each message either carries a sufficiently granular date or gets a synthesized, estimated Date header, so every message has a coordinate. That layout should fit on any filesystem. Zip files of it then act as a "virtual filesystem" or "synthetic filesystem": a.b.c.yyyymmdd.zip for each group and day, and a.b.c.yyyy.zip as the concatenation of that year's daily files. This assumes group names (or mailbox names) are legal filenames. Such files seem the most fungible result: they don't grow once written, and they can be validated to contain well-formed messages belonging to their group-and-date coordinate. That makes them both an archival format and an interchange format, and what remains is loading and dumping them into and out of the useful and usual backing stores, whether filesystem or DB.

So, to "specify the LFF" means working within the limits of the filesystem and the limits of the packaging file, the zip file, so that "any and all text Usenet" messages can be stored in files this way. It also means providing "reference routines" and "reference algorithms" that give an NNTP server complete instructions: download the LFF files, then generate from them the groups files, overview files, and so on. Those are "write-append-growing" files, whereas the LFF files are "write-once-read-many". The aim is to accommodate both uses: an archival form with an entirely open specification for digital preservation, and reference routines into and out of the backing stores of usual server implementations.

Is this sort of the same thing for any old kind of Internet message, not only Usenet NNTP messages? Yeah, sort of.

Here, though, is what I'd especially hope to find out; my questions are:

1) How many Usenet groups are there? Text-only: Big 8, then national, institutional, corporate.
2) What's the most messages a group ever had in one day?
3) Is there a list of the birth dates of the groups?
4) About the time before the Great Renaming: can you describe that?
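The group-date coordinate and the a.b.c.yyyymmdd.zip / a.b.c.yyyy.zip naming can be sketched as a small function. This is a sketch under stated assumptions, not part of any specification: group names are checked against the ASCII subset of the RFC 5536 newsgroup-name grammar (assumed here to also be filename-safe), dates are normalized to UTC, a missing or unparseable Date header falls back to a caller-supplied synthesized estimate, and the `HH/MM/` prefix inside the zip is a hypothetical layout for the hour-minute granularity. `message_date` and `coordinate` are illustrative names.

```python
import re
from datetime import timezone
from email.utils import parsedate_to_datetime

# Assumption: the ASCII subset of the RFC 5536 newsgroup-name grammar
# (dot-separated components of letters, digits, '+', '-', '_').
GROUP_RE = re.compile(r"[A-Za-z0-9+_-]+(?:\.[A-Za-z0-9+_-]+)*")


def message_date(date_header, fallback):
    """Parse a Date: header to UTC, else use the synthesized `fallback`.

    `fallback` must be timezone-aware; how it is estimated (injection time,
    neighbouring messages, ...) is left open here.
    """
    if date_header:
        try:
            dt = parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            return fallback.astimezone(timezone.utc)
        if dt.tzinfo is None:  # "-0000" means zone unknown; treat as UTC
            dt = dt.replace(tzinfo=timezone.utc)
        return dt.astimezone(timezone.utc)
    return fallback.astimezone(timezone.utc)


def coordinate(group, date_header, fallback):
    """Return (daily zip name, yearly zip name, hypothetical HH/MM/ prefix)."""
    if not GROUP_RE.fullmatch(group):
        raise ValueError(f"group name not usable as a filename: {group!r}")
    dt = message_date(date_header, fallback)
    return (f"{group}.{dt:%Y%m%d}.zip",
            f"{group}.{dt:%Y}.zip",
            f"{dt:%H}/{dt:%M}/")
```

For example, this post's own Date header, `Tue, 12 Mar 2024 10:25:42 -0700`, in news.admin.peering lands in news.admin.peering.20240312.zip under `17/25/`, because the coordinate is taken in UTC.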
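One caveat on a.b.c.yyyy.zip being "the concatenation" of the daily files: byte-for-byte concatenation of zip files does not produce a valid zip, because each daily file ends with its own central directory. A minimal sketch of a logical concatenation, assuming a hypothetical `yyyymmdd/` prefix keeps daily entries apart inside the yearly file, re-writes every entry into one new archive with a single central directory. `concat_daily_zips` is an illustrative name; entries are recompressed, since `zipfile` has no public raw-copy API.

```python
import zipfile


def concat_daily_zips(daily, out):
    """Build one yearly archive (e.g. a.b.c.yyyy.zip) from daily archives.

    `daily` maps a 'yyyymmdd' string to a readable zip (path or binary file
    object). Each entry is re-written under a hypothetical 'yyyymmdd/' prefix,
    keeping its original timestamp, into one new central directory.
    """
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as yearly:
        for day in sorted(daily):
            with zipfile.ZipFile(daily[day]) as src:
                for info in src.infolist():
                    entry = zipfile.ZipInfo(f"{day}/{info.filename}",
                                            date_time=info.date_time)
                    entry.compress_type = zipfile.ZIP_DEFLATED
                    yearly.writestr(entry, src.read(info))
```

Because the output is opened in `"w"` mode and never appended to, the yearly file stays write-once, matching the "write-once-read-many" intent.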
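The claim that a daily file "can be validated to have well-formed messages in the coordinate of the group and date" could be checked roughly as follows. What counts as well-formed is an assumption in this sketch: the six header fields RFC 5536 requires in every article are present, the group appears in Newsgroups, and the Date falls on the file's day in UTC. `validate_daily_zip` is a hypothetical name; it returns a list of problems rather than stopping at the first one.

```python
import zipfile
from datetime import timezone
from email.parser import BytesHeaderParser
from email.utils import parsedate_to_datetime

# The six header fields RFC 5536 (section 3.1) requires in every article.
REQUIRED = ("From", "Date", "Newsgroups", "Subject", "Message-ID", "Path")


def validate_daily_zip(src, group, day):
    """Return (entry, problem) pairs for entries that don't belong in `src`.

    Hypothetical checks: required headers present, `group` listed in
    Newsgroups, and Date falling on `day` (a datetime.date) in UTC.
    """
    problems = []
    parser = BytesHeaderParser()
    with zipfile.ZipFile(src) as zf:
        for name in zf.namelist():
            msg = parser.parsebytes(zf.read(name))
            missing = [h for h in REQUIRED if msg[h] is None]
            if missing:
                problems.append((name, "missing " + ", ".join(missing)))
                continue
            groups = [g.strip() for g in str(msg["Newsgroups"]).split(",")]
            if group not in groups:
                problems.append((name, "not posted to " + group))
            try:
                dt = parsedate_to_datetime(str(msg["Date"]))
            except (TypeError, ValueError):
                problems.append((name, "unparseable Date"))
                continue
            if dt.tzinfo is None:  # "-0000": zone unknown, treat as UTC
                dt = dt.replace(tzinfo=timezone.utc)
            if dt.astimezone(timezone.utc).date() != day:
                problems.append((name, "Date outside " + day.isoformat()))
    return problems
```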
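Loading and dumping between the zip files and a DB backing store might look like the sketch below, using SQLite from the standard library. The single-table schema (group, day, entry path, raw message) is a hypothetical choice for illustration, not a format any server uses; the point is that a daily zip round-trips through the store unchanged.

```python
import sqlite3
import zipfile

# Hypothetical schema: one row per message, keyed by its LFF coordinate.
SCHEMA = """CREATE TABLE IF NOT EXISTS article (
    grp TEXT NOT NULL, day TEXT NOT NULL, entry TEXT NOT NULL,
    raw BLOB NOT NULL, PRIMARY KEY (grp, day, entry))"""


def load_zip(db, grp, day, src):
    """Copy every entry of a daily zip into the table; return the count."""
    db.execute(SCHEMA)
    with zipfile.ZipFile(src) as zf:
        rows = [(grp, day, name, zf.read(name)) for name in zf.namelist()]
    with db:  # one transaction per archive
        db.executemany("INSERT OR REPLACE INTO article VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def dump_zip(db, grp, day, out):
    """Write the (grp, day) rows back out as a write-once daily zip."""
    cur = db.execute(
        "SELECT entry, raw FROM article WHERE grp = ? AND day = ? ORDER BY entry",
        (grp, day))
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for entry, raw in cur:
            zf.writestr(entry, raw)
```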
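Generating overview files from the downloaded LFF files could start from something like this. It emits lines in the shape of the RFC 3977 OVER response: an article number, then Subject, From, Date, Message-ID, References, `:bytes`, and `:lines`, tab-separated. Article numbers here are hypothetical, assigned sequentially in sorted entry order, since a real server assigns its own numbers per group; `:bytes` is the stored size, which may differ from the on-the-wire size if messages are stored with bare LF.

```python
import zipfile
from email.parser import BytesHeaderParser

# Default overview header fields from RFC 3977, after the article number.
OVERVIEW_HEADERS = ("Subject", "From", "Date", "Message-ID", "References")


def _clean(value):
    # Unfold and turn TAB/CR/LF into spaces (also collapses whitespace runs,
    # a simplification).
    return "" if value is None else " ".join(str(value).split())


def _body(raw):
    # The body starts after the first blank line (CRLF or bare-LF storage).
    for sep in (b"\r\n\r\n", b"\n\n"):
        i = raw.find(sep)
        if i >= 0:
            return raw[i + len(sep):]
    return b""


def overview_lines(src, first_number=1):
    """Yield tab-separated OVER-style lines for each entry of a daily zip."""
    parser = BytesHeaderParser()
    with zipfile.ZipFile(src) as zf:
        for n, name in enumerate(sorted(zf.namelist()), start=first_number):
            raw = zf.read(name)
            msg = parser.parsebytes(raw)
            fields = [str(n)] + [_clean(msg[h]) for h in OVERVIEW_HEADERS]
            body = _body(raw)
            fields += [str(len(raw)), str(body.count(b"\n"))]  # :bytes, :lines
            yield "\t".join(fields)
```

The overview file is then one of the "write-append-growing" files: a server would append these lines as it loads each day's archive.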
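The "limits of the filesystem and the limits of the packaging file" can be made concrete, and question 2 (the most messages a group ever had in one day) feeds directly into them. Per PKWARE's APPNOTE, a classic zip has a 16-bit entry count and 32-bit sizes and offsets; past those, ZIP64 extensions are needed (Python's `zipfile` writes them automatically when `allowZip64` is true, the default). The 255-byte filename limit below is an assumption that holds for common Linux filesystems such as ext4 and XFS, not for every filesystem. The size estimate is rough: it counts each entry as if stored uncompressed, plus per-entry header overhead.

```python
# Classic (non-ZIP64) limits from PKWARE's APPNOTE.
MAX_CLASSIC_ENTRIES = 0xFFFF       # 16-bit entry count
MAX_CLASSIC_BYTES = 0xFFFFFFFF     # 32-bit sizes and offsets
NAME_MAX = 255                     # assumption: typical filename limit in bytes


def needs_zip64(entries):
    """Rough check for a daily archive; `entries` is [(entry_name, size)].

    Sizes are counted as if stored uncompressed, an upper bound in practice
    for deflated text.
    """
    if len(entries) > MAX_CLASSIC_ENTRIES:
        return True
    total = 22                      # end-of-central-directory record
    for name, size in entries:
        n = len(name.encode("utf-8"))
        if size > MAX_CLASSIC_BYTES:
            return True
        total += 30 + n + size      # local header + name + data
        total += 46 + n             # central directory header + name
    return total > MAX_CLASSIC_BYTES


def archive_name_fits(group, suffix=".yyyymmdd.zip"):
    """True if group + daily suffix fits in NAME_MAX bytes."""
    return len(group.encode("utf-8")) + len(suffix) <= NAME_MAX
```

So a group that ever exceeded 65,535 messages in one day would need ZIP64 daily files, or a finer split, under this scheme.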
Well, thanks for reading. I've been tapping away at more of this sort of idea on sci.math in "Meta: a usenet server just for sci.math", about "BFF backing file format", "SFF summary/search file formats", and the runtime and protocols; here the idea is about "LFF library/lifetime file formats".