From: Don Y
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Thu, 18 Apr 2024 20:08:17 -0700

On 4/18/2024 6:27 PM, Glen Walpert wrote:
> On Thu, 18 Apr 2024 15:05:07 -0700, Don Y wrote:
>
>> The same applies to secondary storage media.  How will you know if
>> some-rarely-accessed-file is intact and ready to be referenced WHEN
>> NEEDED -- if you aren't doing patrol reads/scrubbing to verify that it
>> is intact, NOW?
>>
>> [One common flaw with RAID implementations and naive reliance on that
>> technology]
>
> RAID, even with backups, is unsuited to high reliability storage of large
> databases.  Distributed storage can be of much higher reliability:
>
> https://telnyx.com/resources/what-is-distributed-storage
>
> storage-2ee03e02a11d>
>
> This requires successful retrieval of any n of m data files, normally
> from different locations, where n can be arbitrarily smaller than m
> depending on your needs.  Overkill for small databases but required for
> high reliability storage of very large databases.

This is effectively how I maintain my archive.  Except that the media
are all "offline", requiring a human operator (me) to fetch the required
volumes in order to locate the desired files.
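The "any n of m" retrieval above can be sketched in its simplest form: split the data into n blocks and add one XOR-parity block, giving m = n+1 shares of which any n suffice to rebuild the original.  (Real distributed stores use Reed-Solomon or similar codes so m - n can be larger; the function names here are made up for illustration.)

```python
def make_shares(data: bytes, n: int) -> list[bytes]:
    """Split data into n equal blocks plus one XOR-parity block (m = n+1).
    Any n of the m shares suffice to rebuild the original."""
    blklen = -(-len(data) // n)                 # ceiling division
    blocks = [data[i * blklen:(i + 1) * blklen].ljust(blklen, b"\0")
              for i in range(n)]
    parity = bytearray(blklen)
    for blk in blocks:                          # parity = XOR of all blocks
        for j, b in enumerate(blk):
            parity[j] ^= b
    return blocks + [bytes(parity)]

def recover(shares: dict[int, bytes], n: int, length: int) -> bytes:
    """Rebuild the original from any n of the n+1 shares.  Keys are share
    indices 0..n; index n is the parity share.  length strips the padding."""
    missing = [i for i in range(n) if i not in shares]
    if missing:
        (lost,) = missing                       # at most one data block can be absent
        blk = bytearray(shares[n])              # start from parity...
        for i in range(n):
            if i != lost:                       # ...XOR out the surviving blocks
                for j, b in enumerate(shares[i]):
                    blk[j] ^= b
        shares[lost] = bytes(blk)
    return b"".join(shares[i] for i in range(n))[:length]
```

Store each of the m shares on a different volume/site; losing any single one leaves the data recoverable, and the scheme degrades gracefully into plain concatenation when all data blocks survive.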
Unlike mirroring (or other RAID technologies), my scheme places no
constraints on the "containers" holding the data.  E.g.,

   DISK43   /somewhere/in/filesystem/   fileofinterest
   DISK21   >some>other>place           anothernameforfile
   CDROM77  \yet\another\place          archive.type  /where/in/archive  foo

can all yield the same "content" (as verified by their prestored
signatures).

Knowing the hash of each object means you can verify its contents from a
single instance instead of looking for confirmation via other
instance(s).  [Hashes take up considerably less space than a duplicate
copy would.]

This makes it easy to create multiple instances of particular "content"
without imposing constraints on how it is named, stored, located, etc.
I.e., pull a disk out of a system, catalog its contents, slap an
adhesive label on it (to be human-readable) and add it to your store.

(If I could mount all of the volumes -- because I wouldn't know which
volume might be needed -- then access wouldn't require a human operator,
regardless of where the volumes were actually mounted or the
peculiarities of the systems on which they are mounted!  But, you can
have a daemon that watches to see WHICH volumes are presently accessible
and have it initiate a patrol read of their contents while the media are
being accessed "for whatever OTHER reason" -- and track the time/date of
last "verification" so you know which volumes haven't been checked
recently.)

The inconvenience of requiring human intervention is offset by the lack
of wear on the media (as well as the BTUs to keep it accessible) and the
ease of creating NEW content/copies.  NOT useful for data that needs to
be accessed frequently, but excellent for "archives"/repositories --
that can be mounted, accessed and DUPLICATED to online/nearline storage
for normal use.