From: David Brown
Newsgroups: comp.arch.embedded
Subject: Re: Diagnostics
Date: Sat, 19 Oct 2024 13:57:30 +0200

On 18/10/2024 23:42, George Neuner wrote:
> On Fri, 18 Oct 2024 20:30:06 -0000 (UTC), antispam@fricas.org (Waldek
> Hebisch) wrote:
>
>> Don Y wrote:
>>> Typically, one performs some limited "confidence tests"
>>> at POST to catch gross failures.  As this activity is
>>> "in series" with normal operation, it tends to be brief
>>> and not very thorough.
>>>
>>> Many products offer a BIST capability that the user can invoke
>>> for more thorough testing.  This allows the user to decide
>>> when he can afford to live without the normal functioning of the
>>> device.
>>>
>>> And, if you are a "robust" designer, you often include invariants
>>> that verify hardware operations (esp to I/Os) are actually doing
>>> what they should -- e.g., verifying battery voltage increases
>>> when you activate the charging circuit, loopbacks on DIOs, etc.
>>>
>>> But, for 24/7/365 boxes, POST is a "once-in-a-lifetime" activity.
>>> And, BIST might not always be convenient (as well as requiring the
>>> user's consent and participation).
>>>
>>> There, runtime diagnostics are the only alternative for hardware
>>> revalidation, PFA and diagnostics.
>>>
>>> How commonly are such mechanisms implemented?  And, how thoroughly?
>>
>> This is a strange question.  AFAIK automatically run diagnostics/checks
>> are part of safety regulations.  Even if some safety critical software
>> does not contain them, nobody is going to admit violating regulations.
>> And things like PLCs are "dual use": they may be used in a non-safety
>> role, but vendors claim compliance with safety standards.
>
> However, only a minor percentage of all devices must comply with such
> safety regulations.
>
> As I understand it, Don is working on tech for "smart home"
> implementations ... devices that may be expected to run nearly
> constantly (though perhaps not 365/24 with 6 9's reliability), but
> which, for the most part, are /not/ safety critical.
>
> WRT Don's question, I don't know the answer, but I suspect runtime
> diagnostics are /not/ routinely implemented for devices that are not
> safety critical.  Reason: diagnostics interfere with the operation of
> whatever they happen to be testing.  Even if the test is at low(est)
> priority and is interruptible by any other activity, it still might
> cause an unacceptable delay in a real time situation.  To ensure 100%
> functionality at all times effectively requires use of redundant
> hardware, which generally is too expensive for a non safety critical
> device.
>

That brings up one of the critical points about any kind of runtime
diagnostics: what do you do if there is a failure?  Until you can answer
that question, any effort on diagnostics is not just pointless, but worse
than useless, because you are adding more stuff that could go wrong.

I think bad or useless diagnostics are a more common problem than missing
diagnostics.  People feel pressured into having them even when they can't
measure anything useful and can't do anything sensible with the results.
I have seen first-hand how the insistence on having all sorts of
diagnostics added to a product, so that it could be "safety" certified,
actually resulted in a less reliable and less safe product.  The only
"safety" they provided was legal safety: people could claim it wasn't
their fault if it failed, because they had added all the self-tests
required by the so-called safety experts.
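As an illustration (a hypothetical sketch, not code from anyone in this
thread), the battery-charging invariant Don mentions can be written so
that every outcome maps to a defined action, which is the "what do you do
if there is a failure?" question in concrete form.  The names
check_charge_invariant, diag_action_t and MIN_RISE_MV, and the 20 mV
threshold, are all assumptions; real values would come from the battery
and charger datasheets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimum rise (mV) expected once the charger has been
   enabled for some settling time.  Placeholder value, not a real spec. */
#define MIN_RISE_MV 20u

/* Each possible check outcome has a defined response decided up front. */
typedef enum {
    ACTION_NONE,             /* invariant held, or does not apply */
    ACTION_DISABLE_CHARGING  /* invariant violated: stop charging, flag fault */
} diag_action_t;

/* Runtime invariant: while charging a battery that is not yet full, its
   voltage should rise.  Readings are passed in as parameters so the
   check itself can be exercised without hardware. */
static diag_action_t check_charge_invariant(bool charger_on,
                                            bool battery_full,
                                            uint16_t mv_before,
                                            uint16_t mv_after)
{
    if (!charger_on || battery_full) {
        return ACTION_NONE;  /* nothing can be concluded in these states */
    }
    if ((uint32_t)mv_after >= (uint32_t)mv_before + MIN_RISE_MV) {
        return ACTION_NONE;  /* voltage rose as expected */
    }
    return ACTION_DISABLE_CHARGING;
}
```

The battery_full guard matters: without it, a full battery would trip
the check and shut down a healthy charger, which is exactly the kind of
diagnostic that makes a product less reliable rather than more.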