References:
Message-Id: <20240910.163522.c0b2fc24@mixmin.net>
Date: Tue, 10 Sep 2024 16:35:22 +0100
Subject: Re: Post DB
Content-Transfer-Encoding: 7bit
From: D
Newsgroups: news.software.readers
Path: ...!news.mixmin.net!news2.arglkargh.de!alphared!sewer!news.dizum.net!not-for-mail
Organization: dizum.com - The Internet Problem Provider
X-Abuse: abuse@dizum.com
Injection-Info: sewer.dizum.com - 2001::1/128
Bytes: 9028
Lines: 233

On 9 Sep 2024 10:44:36 GMT, ram@zedat.fu-berlin.de (Stefan Ram) wrote:
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,
> but whatever.
> So, yesterday I was chewing the fat about how to whip up a database
> for posts retrieved from newsservers.
> I'm picturing some program that pulls newsgroups from newsservers
> and dumps them into a database.
> In my mind's eye, a post looks something like this, give or take:
>Path: A
>Message-ID: B
>Body: C
> . But if you snag the same post from a different server, it might
> look like this:
>Message-ID: B
>Path: D
>Body: C
> . At first blush, you'd end up with the same body stored multiple
> times in the database. Talk about a waste of space!
> To trim the fat, we could rejigger these posts so all the variable
> stuff is up front:
>Path: A
>Message-ID: B
>Body: C
> and
>Path: D
>Message-ID: B
>Body: C
> Now the tail end of both posts is identical, so we can toss that
> in a separate table at position 0.
> The posts themselves would then just contain the different parts
> and a pointer to the shared bit that's only stored once:
>Path: A
>Rest: 0
>Path: D
>Rest: 0
>0:
>Message-ID: B
>Body: C
> . This way, you could store the same post from multiple newsservers
> without eating up your hard drive space like it's In-N-Out fries.
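
For what it's worth, the scheme quoted above could be sketched roughly like this in Python with the standard sqlite3 module. This is only a minimal sketch under assumptions of mine: the table names (shared_part, sighting), the helper names (open_db, split_article, message_id, store) and the choice of Path and Xref as the per-server headers are all made up for illustration and are not from the quoted post. The shared part is keyed by Message-ID instead of a numeric position.

```python
import sqlite3

# Assumption: only these headers differ between servers; adjust as needed.
VARIABLE_HEADERS = ("path", "xref")


def open_db(path=":memory:"):
    """Create the two hypothetical tables: one shared part, many sightings."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS shared_part ("
        " message_id TEXT PRIMARY KEY,"
        " content TEXT NOT NULL)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS sighting ("
        " server TEXT NOT NULL,"
        " message_id TEXT NOT NULL REFERENCES shared_part(message_id),"
        " variable TEXT NOT NULL,"
        " PRIMARY KEY (server, message_id))"
    )
    return db


def split_article(raw):
    """Return (variable, rest): per-server header lines and the shared remainder."""
    head, _, body = raw.partition("\n\n")
    variable, shared = [], []
    target = shared
    for line in head.split("\n"):
        if line[:1] in (" ", "\t"):
            # Folded continuation line: stays with the preceding header.
            target.append(line)
            continue
        name = line.split(":", 1)[0].strip().lower()
        target = variable if name in VARIABLE_HEADERS else shared
        target.append(line)
    return "\n".join(variable), "\n".join(shared) + "\n\n" + body


def message_id(rest):
    """Find the Message-ID value in the header part of the shared remainder."""
    for line in rest.split("\n"):
        if line == "":
            break  # end of headers
        if line.lower().startswith("message-id:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Message-ID header")


def store(db, server, raw):
    """Store one copy of an article as fetched from one server."""
    variable, rest = split_article(raw)
    mid = message_id(rest)
    # The shared part is written once; later copies only add a sighting row.
    db.execute(
        "INSERT OR IGNORE INTO shared_part (message_id, content) VALUES (?, ?)",
        (mid, rest),
    )
    db.execute(
        "INSERT OR REPLACE INTO sighting (server, message_id, variable)"
        " VALUES (?, ?, ?)",
        (server, mid, variable),
    )


# The two placeholder posts from the quoted text, fetched from two servers.
db = open_db()
store(db, "server1", "Path: A\nMessage-ID: B\n\nC\n")
store(db, "server2", "Message-ID: B\nPath: D\n\nC\n")
n_shared = db.execute("SELECT COUNT(*) FROM shared_part").fetchone()[0]
n_seen = db.execute("SELECT COUNT(*) FROM sighting").fetchone()[0]
print(n_shared, n_seen)  # 1 2
```

One caveat with this sketch: INSERT OR IGNORE keeps whichever shared part arrived first, so if two servers disagree on any header outside VARIABLE_HEADERS, the difference is silently dropped. Comparing the stored content against the new one before ignoring it would catch that.
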
twelve server samples of your article headers show remarkable consistency:
1 path, 2 from, 3 newsgroups, 4 subject, 5 date, 6 organization, 7 lines,
8 expires, 9 message-id, 10 mime-version, 11 content-type,
12 content-transfer-encoding, 13 x-trace, 14 cancel-lock, 15 x-copyright,
16 x-no-archive, 17 archive, 18 x-no-archive-readme, 19 x-no-html,
20 content-language, 21 xref
(first sample full headers, then snipped for brevity):

news:news.alphared.net
>Path: alphared!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID:
>Mime-Version: 1.0
>Content-Type: text/plain; charset=UTF-8
>Content-Transfer-Encoding: 8bit
>X-Trace: news.uni-berlin.de b02EqmO53gQ7jbmmMP85UgkDHjtKodMUvyU6kuS12ifm6t
>Cancel-Lock: sha1:BiPE/2gBrIau46RUtTtIhXqrOSQ= sha256:ciJQo1bvZST9PNWeu73aWJv3mxLLHhWyjI7ehRRUSH4=
>X-Copyright: (C) Copyright 2024 Stefan Ram. All rights reserved.
> Distribution through any means other than regular usenet
> channels is forbidden. It is forbidden to publish this
> article in the Web, to change URIs of this article into links,
> and to transfer the body without this notice, but quotations
> of parts in other Usenet posts are allowed.
>X-No-Archive: Yes
>Archive: no
>X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
> services to mirror the article in the web. But the article may
> be kept on a Usenet archive server with only NNTP access.
>X-No-Html: yes
>Content-Language: en-US
>Xref: alphared news.software.readers:11775
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.alt119.net
>Path: news.alt119.net!peer.alt119.net!news.samoylyk.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID:
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.blueworldhosting.com
>Path: nnrp.usenet.blueworldhosting.com!!spool1.usenet.blueworldhosting.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID:
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.dizum.net
>Path: sewer!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID:
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:freenews.netfront.net
>Path: news.netfront.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID:
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.i2pn2.org
>Path: i2pn2.org!rocksolid2!news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID:
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.neodome.net
>Path: news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID:
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.mixmin.net
>Path: news.mixmin.net!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID:
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.novabbs.org
>Path: rocksolid2!news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63

========== REMAINDER OF ARTICLE TRUNCATED ==========