References: <database-20240909114248@ram.dialup.fu-berlin.de>
Message-Id: <20240910.163522.c0b2fc24@mixmin.net>
Date: Tue, 10 Sep 2024 16:35:22 +0100
Subject: Re: Post DB
Content-Transfer-Encoding: 7bit
From: D <noreply@mixmin.net>
Newsgroups: news.software.readers
Path: ...!news.mixmin.net!news2.arglkargh.de!alphared!sewer!news.dizum.net!not-for-mail
Organization: dizum.com - The Internet Problem Provider
X-Abuse: abuse@dizum.com
Injection-Info: sewer.dizum.com - 2001::1/128
Bytes: 9028
Lines: 233

On 9 Sep 2024 10:44:36 GMT, ram@zedat.fu-berlin.de (Stefan Ram) wrote:
>  I'm not 100% sure I'm barking up the right tree (newsgroup) here,
>  but whatever.
>  So, yesterday I was chewing the fat about how to whip up a database
>  for posts retrieved from newsservers.
>  I'm picturing some program that pulls newsgroups from newsservers
>  and dumps them into a database.
>  In my mind's eye, a post looks something like this, give or take:
>Path: A
>Message-ID: B
>Body: C
>  . But if you snag the same post from a different server, it might
>  look like this:
>Message-ID: B
>Path: D
>Body: C
>  . At first blush, you'd end up with the same body stored multiple
>  times in the database. Talk about a waste of space!
>  To trim the fat, we could rejigger these posts so all the variable
>  stuff is up front:
>Path: A
>Message-ID: B
>Body: C
>  and
>Path: D
>Message-ID: B
>Body: C
>  Now the tail end of both posts is identical, so we can toss that
>  in a separate table at position 0.
>  The posts themselves would then just contain the different parts
>  and a pointer to the shared bit that's only stored once:
>Path: A
>Rest: 0
>Path: D
>Rest: 0
>0:
>Message-ID: B
>Body: C
>  . This way, you could store the same post from multiple newsservers
>  without eating up your hard drive space like it's In-N-Out fries.
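A minimal sketch of the scheme quoted above, in Python with an in-memory SQLite database. My assumption, not part of your proposal: Path and Xref are the only per-server headers, and everything else (remaining headers plus body) is the shared "rest". Instead of a hand-assigned position like "Rest: 0", the rest is keyed by a SHA-256 digest of its text. The table and function names (rest, post, split_article, store) are made up for illustration.

```python
import hashlib
import sqlite3

# Assumption: Path and Xref are the only per-server headers. Path records the
# delivery route and Xref the local article number, so both differ by server.
VARIABLE_HEADERS = {"path", "xref"}


def split_article(raw: str) -> tuple[dict[str, str], str]:
    """Split a raw article into (per-server headers, shared remainder)."""
    head, _, body = raw.partition("\n\n")
    variable: dict[str, str] = {}
    shared_lines: list[str] = []
    last_name = None
    for line in head.split("\n"):
        if line[:1] in (" ", "\t") and last_name is not None:
            # Folded continuation line: keep it with the header it belongs to.
            if last_name in VARIABLE_HEADERS:
                variable[last_name] += "\n" + line
            else:
                shared_lines.append(line)
            continue
        name, _, value = line.partition(":")
        last_name = name.strip().lower()
        if last_name in VARIABLE_HEADERS:
            variable[last_name] = value.strip()
        else:
            shared_lines.append(line)
    # Everything that is not per-server, headers and body, forms the "rest".
    return variable, "\n".join(shared_lines) + "\n\n" + body


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE rest (
            id      INTEGER PRIMARY KEY,
            digest  TEXT UNIQUE NOT NULL,  -- SHA-256 of the shared part
            content TEXT NOT NULL
        );
        CREATE TABLE post (
            server  TEXT NOT NULL,
            path    TEXT,
            xref    TEXT,
            rest_id INTEGER NOT NULL REFERENCES rest(id)
        );
    """)
    return db


def store(db: sqlite3.Connection, server: str, raw: str) -> int:
    """Store one copy of an article; the shared rest is written only once."""
    variable, shared = split_article(raw)
    digest = hashlib.sha256(shared.encode("utf-8")).hexdigest()
    # Ignored if another server's copy already stored this exact rest.
    db.execute("INSERT OR IGNORE INTO rest (digest, content) VALUES (?, ?)",
               (digest, shared))
    (rest_id,) = db.execute("SELECT id FROM rest WHERE digest = ?",
                            (digest,)).fetchone()
    db.execute("INSERT INTO post (server, path, xref, rest_id) VALUES (?, ?, ?, ?)",
               (server, variable.get("path"), variable.get("xref"), rest_id))
    return rest_id


db = open_db()
store(db, "server1", "Path: A\nMessage-ID: B\n\nC\n")
store(db, "server2", "Path: D\nMessage-ID: B\n\nC\n")
print(db.execute("SELECT COUNT(*) FROM rest").fetchone()[0])  # -> 1
print(db.execute("SELECT COUNT(*) FROM post").fetchone()[0])  # -> 2
```

Storing your two example posts yields one rest row and two post rows (SQLite numbers the rest from 1, not 0). The catch is that the rest must be byte-identical: if a relay rewrites or reorders any header other than Path/Xref, that copy simply gets its own rest row, which costs space but loses nothing.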

Twelve server samples of your article's headers show remarkable consistency:
1 path, 2 from, 3 newsgroups, 4 subject, 5 date, 6 organization, 7 lines,
8 expires, 9 message-id, 10 mime-version, 11 content-type,
12 content-transfer-encoding, 13 x-trace, 14 cancel-lock, 15 x-copyright,
16 x-no-archive, 17 archive, 18 x-no-archive-readme, 19 x-no-html,
20 content-language, 21 xref (full headers for the first sample; the rest
are snipped for brevity):

news:news.alphared.net
>Path: alphared!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
>Mime-Version: 1.0
>Content-Type: text/plain; charset=UTF-8
>Content-Transfer-Encoding: 8bit
>X-Trace: news.uni-berlin.de b02EqmO53gQ7jbmmMP85UgkDHjtKodMUvyU6kuS12ifm6t
>Cancel-Lock: sha1:BiPE/2gBrIau46RUtTtIhXqrOSQ= sha256:ciJQo1bvZST9PNWeu73aWJv3mxLLHhWyjI7ehRRUSH4=
>X-Copyright: (C) Copyright 2024 Stefan Ram. All rights reserved.
>	Distribution through any means other than regular usenet
>	channels is forbidden. It is forbidden to publish this
>	article in the Web, to change URIs of this article into links,
>        and to transfer the body without this notice, but quotations
>        of parts in other Usenet posts are allowed.
>X-No-Archive: Yes
>Archive: no
>X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
>	services to mirror the article in the web. But the article may
>	be kept on a Usenet archive server with only NNTP access.
>X-No-Html: yes
>Content-Language: en-US
>Xref: alphared news.software.readers:11775
>
>  I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.alt119.net
>Path: news.alt119.net!peer.alt119.net!news.samoylyk.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
>  I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.blueworldhosting.com
>Path: nnrp.usenet.blueworldhosting.com!!spool1.usenet.blueworldhosting.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
>  I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.dizum.net
>Path: sewer!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
>  I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:freenews.netfront.net
>Path: news.netfront.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
>  I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.i2pn2.org
>Path: i2pn2.org!rocksolid2!news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
>  I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.neodome.net
>Path: news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
>  I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.mixmin.net
>Path: news.mixmin.net!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
>Expires: 1 Jul 2025 11:59:58 GMT
>Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
>  I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.novabbs.org
>Path: rocksolid2!news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail
>From: ram@zedat.fu-berlin.de (Stefan Ram)
>Newsgroups: news.software.readers
>Subject: Post DB
>Date: 9 Sep 2024 10:44:36 GMT
>Organization: Stefan Ram
>Lines: 63
========== REMAINDER OF ARTICLE TRUNCATED ==========
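A minimal Python sketch of the comparison done by hand above: parse each server's header block (unfolding continuation lines, lowercasing names), then list the header names whose values differ or are missing somewhere. The function names are hypothetical, and the two samples are cut down to three headers taken from the news.alphared.net and news.neodome.net copies.

```python
def parse_headers(raw: str) -> list[tuple[str, str]]:
    """Return (lowercased name, unfolded value) pairs in their original order."""
    head = raw.split("\n\n", 1)[0]
    headers: list[tuple[str, str]] = []
    for line in head.split("\n"):
        if line[:1] in (" ", "\t") and headers:
            # Folded continuation line: unfold it onto the previous header.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        elif ":" in line:
            name, _, value = line.partition(":")
            headers.append((name.strip().lower(), value.strip()))
    return headers


def varying_headers(samples: dict[str, str]) -> list[str]:
    """Header names whose value differs, or which are missing, in some sample."""
    parsed = [dict(parse_headers(raw)) for raw in samples.values()]
    names: list[str] = []
    for headers in parsed:
        for name in headers:
            if name not in names:
                names.append(name)
    return [n for n in names if len({h.get(n) for h in parsed}) > 1]


def same_order(samples: dict[str, str]) -> bool:
    """True if every sample lists its header names in the same order."""
    orders = {tuple(name for name, _ in parse_headers(raw)) for raw in samples.values()}
    return len(orders) == 1


MSG_ID = "<database-20240909114248@ram.dialup.fu-berlin.de>"
FROM = "ram@zedat.fu-berlin.de (Stefan Ram)"
samples = {
    "news.alphared.net": (
        "Path: alphared!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!"
        "uni-berlin.de!not-for-mail\n"
        f"From: {FROM}\nMessage-ID: {MSG_ID}\n\nbody\n"),
    "news.neodome.net": (
        "Path: news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail\n"
        f"From: {FROM}\nMessage-ID: {MSG_ID}\n\nbody\n"),
}
print(varying_headers(samples))  # -> ['path']
print(same_order(samples))       # -> True
```

Feeding it the full header blocks from all twelve servers would show whether anything besides Path differs between copies.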