From: Don Y
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Thu, 18 Apr 2024 20:08:17 -0700

On 4/18/2024 6:27 PM, Glen Walpert wrote:
> On Thu, 18 Apr 2024 15:05:07 -0700, Don Y wrote:
>
>> The same applies to secondary storage media.  How will you know if
>> some-rarely-accessed-file is intact and ready to be referenced WHEN
>> NEEDED -- if you aren't doing patrol reads/scrubbing to verify that it
>> is intact, NOW?
>>
>> [One common flaw with RAID implementations and naive reliance on that
>> technology]
>
> RAID, even with backups, is unsuited to high reliability storage of large
> databases.  Distributed storage can be of much higher reliability:
>
> https://telnyx.com/resources/what-is-distributed-storage
>
> storage-2ee03e02a11d>
>
> This requires successful retrieval of any n of m data files, normally
> from different locations, where n can be arbitrarily smaller than m
> depending on your needs.  Overkill for small databases but required for
> high reliability storage of very large databases.

This is effectively how I maintain my archive.  Except that the media
are all "offline", requiring a human operator (me) to fetch the required
volumes in order to locate the desired files.
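The "any n of m" retrieval above can be sketched in its simplest form: split the data into n blocks and add one XOR-parity block, giving m = n+1 shares of which any n suffice to rebuild the original.  (Real distributed stores use Reed-Solomon or similar codes so m - n can be larger; the function names here are made up for illustration.)

```python
def make_shares(data: bytes, n: int) -> list[bytes]:
    """Split data into n equal blocks plus one XOR-parity block (m = n+1).
    Any n of the m shares suffice to rebuild the original."""
    blklen = -(-len(data) // n)                 # ceiling division
    blocks = [data[i * blklen:(i + 1) * blklen].ljust(blklen, b"\0")
              for i in range(n)]
    parity = bytearray(blklen)
    for blk in blocks:                          # parity = XOR of all blocks
        for j, b in enumerate(blk):
            parity[j] ^= b
    return blocks + [bytes(parity)]

def recover(shares: dict[int, bytes], n: int, length: int) -> bytes:
    """Rebuild the original from any n of the n+1 shares.  Keys are share
    indices 0..n; index n is the parity share.  length strips the padding."""
    missing = [i for i in range(n) if i not in shares]
    if missing:
        (lost,) = missing                       # at most one data block can be absent
        blk = bytearray(shares[n])              # start from parity...
        for i in range(n):
            if i != lost:                       # ...XOR out the surviving blocks
                for j, b in enumerate(shares[i]):
                    blk[j] ^= b
        shares[lost] = bytes(blk)
    return b"".join(shares[i] for i in range(n))[:length]
```

Store each of the m shares on a different volume/site; losing any single one leaves the data recoverable, and the scheme degrades gracefully into plain concatenation when all data blocks survive.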
Unlike mirroring (or other RAID technologies), my scheme places no
constraints on the "containers" holding the data.  E.g.,

   DISK43   /somewhere/in/filesystem/   fileofinterest
   DISK21   >some>other>place           anothernameforfile
   CDROM77  \yet\another\place          archive.type  /where/in/archive  foo

can all yield the same "content" (as verified by their prestored
signatures).

Knowing the hash of each object means you can verify its contents from a
single instance instead of looking for confirmation via other
instance(s).  [Hashes take up considerably less space than a duplicate
copy would.]

This makes it easy to create multiple instances of particular "content"
without imposing constraints on how it is named, stored, located, etc.
I.e., pull a disk out of a system, catalog its contents, slap an
adhesive label on it (to be human-readable) and add it to your store.

(If I could mount all of the volumes -- because I wouldn't know which
volume might be needed -- then access wouldn't require a human operator,
regardless of where the volumes were actually mounted or the
peculiarities of the systems on which they are mounted!  But, you can
have a daemon that watches to see WHICH volumes are presently accessible
and have it initiate a patrol read of their contents while the media are
being accessed "for whatever OTHER reason" -- and track the time/date of
last "verification" so you know which volumes haven't been checked
recently.)

The inconvenience of requiring human intervention is offset by the lack
of wear on the media (as well as the BTUs to keep it accessible) and the
ease of creating NEW content/copies.  NOT useful for data that needs to
be accessed frequently, but excellent for "archives"/repositories --
that can be mounted, accessed and DUPLICATED to online/nearline storage
for normal use.