From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: comp.arch.embedded
Subject: Re: Diagnostics
Date: Fri, 18 Oct 2024 15:15:30 -0700
Message-ID: <veummc$3gbqs$1@dont-email.me>
References: <veekcp$9rsj$1@dont-email.me> <veuggc$1l5eo$1@paganini.bofh.team>

On 10/18/2024 1:30 PM, Waldek Hebisch wrote:
> Don Y <blockedofcourse@foo.invalid> wrote:
>> Typically, one performs some limited "confidence tests"
>> at POST to catch gross failures. As this activity is
>> "in series" with normal operation, it tends to be brief
>> and not very thorough.
>>
>> Many products offer a BIST capability that the user can invoke
>> for more thorough testing. This allows the user to decide
>> when he can afford to live without the normal functioning of the
>> device.
>>
>> And, if you are a "robust" designer, you often include invariants
>> that verify hardware operations (esp to I/Os) are actually doing
>> what they should -- e.g., verifying battery voltage increases
>> when you activate the charging circuit, loopbacks on DIOs, etc.
>>
>> But, for 24/7/365 boxes, POST is a "once-in-a-lifetime" activity.
>> And, BIST might not always be convenient (as well as requiring the
>> user's consent and participation).
>>
>> There, runtime diagnostics are the only alternative for hardware
>> revalidation, PFA and diagnostics.
>>
>> How commonly are such mechanisms implemented? And, how thoroughly?
>
> This is a strange question. AFAIK automatically run diagnostics/checks
> are part of safety regulations.

Not all devices are covered by "regulations".

And, the *extent* to which testing is done is the subject addressed;
if I ensure "stuff" *WORKED* when the device was powered on (preventing
it from continuing on to its normal functionality in the event that
some failure was detected), what assurance does that give me that the
device's integrity is still intact 8760 hours (1 yr) later? 720 hours
(1 mo)? 168 hours (1 wk)? 24 hours? *1* hour????

[I.e., how long a device remains "up" is a function of the device,
its application, environment and user]

Do you just *hope* the device "happens" to fail in a noticeable manner
so that the user is left in no doubt that the device is no longer
operational?

> Even if some safety critical software
> does not contain them, nobody is going to admit violating regulations.
> And things like PLCs are "dual use": they may be used in a non-safety
> role, but vendors claim compliance with safety standards.

So, if a bit in a RAM in said device *dies* some time after power on,
is the device going to *know* that has happened? And, signal its
unwillingness to continue operating? What is going to detect that
failure?

What if the bit's failure is inconsequential to the operation of the
device? E.g., if the bit is part of some not-used feature?

*Or*, if it has failed in the state it was *supposed* to be in??!

With a "good" POST design, you can reassure the user that the device
*appears* to be functional. That the data/code stored in it are intact
(since they were last accessed).
That the memory is capable of storing any values it is called on to
preserve. That the hardware I/Os can control and sense as intended,
etc.

/But, you have no guarantee that this condition will persist!/

If it WAS guaranteed to persist, then the simple way to make
high-reliability devices would be just to /never turn them off/ to
take advantage of this "guarantee"!
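For illustration only, here is a minimal sketch, in C, of one way the POST-style checks described above (stored code/data still intact, memory still able to hold any value) might be re-run continuously at runtime, in small slices from an idle loop or low-priority task so they never stall normal operation. All names (`crc32_update`, `integrity_check`, `integrity_step`, `ram_word_test`) are hypothetical. The use of CRC-32, the reference value, and the slice size are assumptions, not anything from the post. On a real target, interrupts and DMA would also have to be held off around the RAM test. The single-word pattern test only catches stuck-at bits, including a bit stuck in the value it was *supposed* to hold, since both polarities are written and read back. It does not detect address-decoder or coupling faults, which need March-style tests.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-32 (reflected, polynomial 0xEDB88320), bitwise form.
 * Stream-based so one pass can be spread over many calls:
 * start with 0xFFFFFFFF and XOR the final value with 0xFFFFFFFF. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t n)
{
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ ((crc & 1u) ? 0xEDB88320u : 0u);
    }
    return crc;
}

/* Hypothetical background integrity checker for a region that POST
 * verified at power-on (e.g. code image or constant tables). */
typedef struct {
    const uint8_t *base;   /* start of region to verify */
    size_t len;            /* region length in bytes */
    uint32_t expected;     /* reference CRC (assumed recorded at build time) */
    size_t pos;            /* progress through the current pass */
    uint32_t crc;          /* running CRC for the current pass */
} integrity_check;

void integrity_init(integrity_check *c, const uint8_t *base, size_t len,
                    uint32_t expected)
{
    c->base = base;
    c->len = len;
    c->expected = expected;
    c->pos = 0;
    c->crc = 0xFFFFFFFFu;
}

/* Checks at most `slice` bytes per call.
 * Returns 0 while a pass is in progress, 1 when a pass completes with a
 * matching CRC, -1 when a pass completes with a mismatch. */
int integrity_step(integrity_check *c, size_t slice)
{
    size_t n = c->len - c->pos;
    if (n > slice)
        n = slice;
    c->crc = crc32_update(c->crc, c->base + c->pos, n);
    c->pos += n;
    if (c->pos < c->len)
        return 0;

    uint32_t result = c->crc ^ 0xFFFFFFFFu;
    c->pos = 0;                 /* begin the next pass */
    c->crc = 0xFFFFFFFFu;
    return (result == c->expected) ? 1 : -1;
}

/* Non-destructive test of one RAM word: save it, write complementary
 * patterns and read each back, then restore the original contents.
 * ASSUMPTION: caller masks interrupts/DMA so nothing else touches the
 * word while it is under test. `volatile` keeps the read-backs real. */
bool ram_word_test(volatile uint32_t *w)
{
    static const uint32_t patterns[] = {
        0x55555555u, 0xAAAAAAAAu, 0x00000000u, 0xFFFFFFFFu
    };
    uint32_t saved = *w;
    bool ok = true;

    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; i++) {
        *w = patterns[i];
        if (*w != patterns[i])
            ok = false;         /* a bit failed to take one polarity */
    }
    *w = saved;
    return ok;
}
```

A scheduler would call `integrity_step()` and `ram_word_test()` (walking through RAM one word per tick) from its idle time, and treat a `-1` or `false` result as the device signalling its unwillingness to continue. The slice size trades detection latency against CPU load: a 64 KiB region checked 64 bytes per tick is fully re-verified every 1024 ticks.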