From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.arch.embedded
Subject: Re: Diagnostics
Date: Sat, 19 Oct 2024 13:57:30 +0200
Organization: A noiseless patient Spider
Lines: 73
Message-ID: <vf06ra$3s5is$1@dont-email.me>
References: <veekcp$9rsj$1@dont-email.me> <veuggc$1l5eo$1@paganini.bofh.team>
 <77k5hjprfq0ipjp6pcdd03lnph1i76ssuu@4ax.com>
In-Reply-To: <77k5hjprfq0ipjp6pcdd03lnph1i76ssuu@4ax.com>

On 18/10/2024 23:42, George Neuner wrote:
> On Fri, 18 Oct 2024 20:30:06 -0000 (UTC), antispam@fricas.org (Waldek
> Hebisch) wrote:
> 
>> Don Y <blockedofcourse@foo.invalid> wrote:
>>> Typically, one performs some limited "confidence tests"
>>> at POST to catch gross failures.  As this activity is
>>> "in series" with normal operation, it tends to be brief
>>> and not very thorough.
>>>
>>> Many products offer a BIST capability that the user can invoke
>>> for more thorough testing.  This allows the user to decide
>>> when he can afford to live without the normal functioning of the
>>> device.
>>>
>>> And, if you are a "robust" designer, you often include invariants
>>> that verify hardware operations (esp to I/Os) are actually doing
>>> what they should -- e.g., verifying battery voltage increases
>>> when you activate the charging circuit, loopbacks on DIOs, etc.
>>>
>>> But, for 24/7/365 boxes, POST is a "once-in-a-lifetime" activity.
>>> And, BIST might not always be convenient (as well as requiring the
>>> user's consent and participation).
>>>
>>> There, runtime diagnostics are the only alternative for hardware
>>> revalidation, PFA and diagnostics.
>>>
>>> How commonly are such mechanisms implemented?  And, how thoroughly?
>>
>> This is a strange question.  AFAIK automatically run diagnostics/checks
>> are part of safety regulations.  Even if some safety critical software
>> does not contain them, nobody is going to admit violating regulations.
>> And things like PLC-s are "dual use", they may be used in non-safety
>> role, but vendors claim compliance to safety standards.
> 
> However, only a minor percentage of all devices must comply with such
> safety regulations.
> 
> As I understand it, Don is working on tech for "smart home"
> implementations ... devices that may be expected to run nearly
> constantly (though perhaps not 365/24 with 6 9's reliability), but
> which, for the most part, are /not/ safety critical.
> 
> WRT Don's question, I don't know the answer, but I suspect runtime
> diagnostics are /not/ routinely implemented for devices that are not
> safety critical.  Reason: diagnostics interfere with operation of
> <whatever> they happen to be testing.  Even if the test is at low(est)
> priority and is interruptible by any other activity, it still might
> cause an unacceptable delay in a real time situation.  To ensure 100%
> functionality at all times effectively requires use of redundant
> hardware - which generally is too expensive for a non safety critical
> device.
> 

That brings up one of the critical points about any kind of runtime 
diagnostics - what do you do if there is a failure?  Until you can 
answer that question, any effort on diagnostics is not just pointless, 
but worse than useless because you are adding more stuff that could go 
wrong.
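
As a rough illustration of that point, here is a minimal sketch in C of 
a diagnostics table in which every check has to name its failure 
response, and a check with no response is simply never run.  All the 
names here (diag_entry_t, diag_run, the RESP_* values, the stub checks) 
are made up for the example and do not come from any real product or 
library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical responses a product might define for a failed check,
 * ordered from least to most severe. */
typedef enum {
    RESP_NONE = 0,   /* no defined response: check is not worth running */
    RESP_LOG_ONLY,   /* record the fault, carry on */
    RESP_DEGRADE,    /* drop to reduced functionality */
    RESP_SAFE_STATE  /* stop and enter the defined safe state */
} diag_response_t;

typedef struct {
    const char *name;
    bool (*check)(void);         /* returns true when the check passes */
    diag_response_t on_failure;  /* what to do when it does not */
} diag_entry_t;

/* Stub checks standing in for real hardware tests (assumptions). */
bool stub_check_pass(void) { return true; }
bool stub_check_fail(void) { return false; }

/* Run each entry that has a defined failure response and return the
 * most severe response triggered.  Entries with RESP_NONE or without a
 * check function are skipped rather than executed. */
diag_response_t diag_run(const diag_entry_t *table, size_t count)
{
    diag_response_t worst = RESP_NONE;

    for (size_t i = 0; i < count; i++) {
        if (table[i].on_failure == RESP_NONE || table[i].check == NULL) {
            continue;
        }
        if (!table[i].check() && table[i].on_failure > worst) {
            worst = table[i].on_failure;
        }
    }
    return worst;
}
```

The caller then acts on the single value returned by diag_run().  The 
design choice is only that "what happens on failure" is part of the 
check's definition, so it has to be answered before the check exists.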

I think bad or useless diagnostics are a more common problem than 
missing diagnostics.  People feel pressured into having them even when 
they can't measure anything useful and can't do anything sensible with 
the results.

I have seen first-hand how the insistence on having all sorts of 
diagnostics added to a product so that it could be "safety" certified 
actually resulted in a less reliable and less safe product.  The only 
"safety" they provided was legal safety so that people could claim it 
wasn't their fault if it failed, because they had added all the 
self-tests required by the so-called safety experts.