<uvln9b$trln$2@dont-email.me>


Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Tue, 16 Apr 2024 04:26:28 -0700
Organization: A noiseless patient Spider
Lines: 90
Message-ID: <uvln9b$trln$2@dont-email.me>
References: <uvjn74$d54b$1@dont-email.me> <uvldrf$rpnh$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 16 Apr 2024 13:26:36 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="8f873457a009428ae193cacdeebfb978";
	logging-data="978615"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+kTaxaa2CLK83/FnADKd46"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.2.2
Cancel-Lock: sha1:QVbd4tUGHH4rVGENZvfhArkhcxQ=
Content-Language: en-US
In-Reply-To: <uvldrf$rpnh$1@dont-email.me>
Bytes: 4896

On 4/16/2024 1:45 AM, Martin Brown wrote:
> On 15/04/2024 18:13, Don Y wrote:
>> Is there a general rule of thumb for signalling the likelihood of
>> an "imminent" (for some value of "imminent") hardware failure?
> 
> You have to be very careful that the additional complexity doesn't itself 
> introduce new annoying failure modes.

*Or*, decrease the reliability of the device, in general.

> My previous car had filament bulb failure 
> sensors (new one is LED) of which the one for the parking light had itself 
> failed - the parking light still worked. However, the car would greet me with 
> "parking light failure" every time I started the engine and the main dealer 
> refused to cancel it.

My goal is to provide *advisories*.  You don't want to constrain the
user.

Smoke detectors that nag you with "replace battery" alerts are nags.
A car that refuses to start unless the seat belts are fastened is a nag.

You shouldn't require a third party to enable you to ignore an
advisory.  But, it's OK to require the user to acknowledge that
advisory!
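A minimal sketch of that policy in Python. The `Advisory` class and its three states are hypothetical illustrations, not taken from any real device: the advisory is announced once, the user can acknowledge it without anyone else's help, and it only comes back if the condition clears and later recurs.

```python
from dataclasses import dataclass
from enum import Enum, auto


class AdvisoryState(Enum):
    CLEAR = auto()         # condition not present
    ACTIVE = auto()        # condition present, not yet acknowledged
    ACKNOWLEDGED = auto()  # user has seen it; stay quiet until it changes


@dataclass
class Advisory:
    """Hypothetical advisory: informs the user but never blocks operation."""
    name: str
    state: AdvisoryState = AdvisoryState.CLEAR

    def update(self, condition_present: bool) -> None:
        if not condition_present:
            # Condition went away: forget any earlier acknowledgment.
            self.state = AdvisoryState.CLEAR
        elif self.state is AdvisoryState.CLEAR:
            # New occurrence: announce it.
            self.state = AdvisoryState.ACTIVE
        # Already ACTIVE or ACKNOWLEDGED: leave the state alone.

    def acknowledge(self) -> None:
        # The user alone can silence it; no dealer or third party required.
        if self.state is AdvisoryState.ACTIVE:
            self.state = AdvisoryState.ACKNOWLEDGED

    def should_notify(self) -> bool:
        return self.state is AdvisoryState.ACTIVE
```

Nothing here prevents the device from operating; the only thing the user is "required" to do is acknowledge.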

> Repair of parking light sensor failure required swapping out the *entire* front 
> light assembly since it was built with one-time hot glue. That would be a very 
> expensive "repair" for a trivial fault.
> 
> The parking light is not even a required feature.
> 
>> I suspect most would involve *relative* changes that would be
>> suggestive of changing conditions in the components (and not
>> directly related to environmental influences).
>>
>> So, perhaps, a good strategy is to just "watch" everything and
>> notice the sorts of changes you "typically" encounter in the hope
>> that something of greater magnitude would be a harbinger...
> 
> Monitoring temperature, voltage supply and current consumption isn't a bad 
> idea. If they get unexpectedly out of line something is wrong.

Extremes are easy to detect -- but they usually indicate failures that
have already happened.  E.g., a short, an open.
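The extremes really are the easy part. A minimal sketch in Python, with hypothetical thresholds expressed as fractions of a load's nominal current; anything that is neither an open nor a short comes back as "in-range", which says nothing about slow drift.

```python
def classify_current(measured_ma: float, nominal_ma: float,
                     open_frac: float = 0.1, short_frac: float = 3.0) -> str:
    """Hypothetical gross-fault check on one load's current draw.

    open_frac and short_frac are illustrative, not recommended values.
    """
    if measured_ma < open_frac * nominal_ma:
        return "open"      # almost no current: broken wire, burnt-out element
    if measured_ma > short_frac * nominal_ma:
        return "short"     # far more than expected: shorted load or driver
    return "in-range"      # not an extreme, but could still be drifting
```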

The problem is sorting out what magnitude changes are significant
and which are normal variation.

I think being able to track history gives you a leg up in that
it gives you a better idea of what MIGHT be normal instead of
just looking at an instant in time.
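As a sketch of "tracking history" (one technique among many, not necessarily what is meant here): keep an exponentially weighted running mean and variance of each monitored quantity, and flag a reading only when it lands well outside the spread seen recently. `DriftMonitor` and its `alpha`, `k` and `warmup` defaults are hypothetical and would need tuning per signal.

```python
import math


class DriftMonitor:
    """Slowly adapting baseline (EWMA mean/variance) for one signal."""

    def __init__(self, alpha: float = 0.05, k: float = 4.0, warmup: int = 20):
        self.alpha = alpha    # smoothing factor: smaller = longer memory
        self.k = k            # how many "sigmas" count as significant
        self.warmup = warmup  # samples to collect before judging anything
        self.n = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, x: float) -> bool:
        """Feed one sample; return True if it looks anomalous."""
        self.n += 1
        if self.n == 1:
            self.mean = x
            return False
        deviation = x - self.mean
        sigma = math.sqrt(self.var)
        anomalous = self.n > self.warmup and abs(deviation) > self.k * sigma
        # Incremental exponentially weighted mean and variance.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return anomalous
```

Because flagged samples still update the baseline here, a genuine step change is reported at first and then gradually absorbed as the new normal; freezing the baseline on flagged samples is the alternative if a persistent shift should keep being reported.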

> Likewise with 
> power on self tests you can catch some latent failures before they actually 
> affect normal operation.

POST is seldom executed as devices tend to run 24/7/365.
So, I have to design runtime BIST support that can, hopefully,
coax this information from a *running* system without interfering
with that operation.

This puts constraints on how you operate the hardware
(unless you want to add lots of EXTRA hardware to
extract these observations).

E.g., if you can control N loads, then individually (sequentially)
activating them and noticing the delta power consumption reveals
more than just enabling ALL that need to be enabled and only seeing
the aggregate of those loads.
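A minimal sketch of the sequential approach in Python. `enable` and `read_supply_ma` are hypothetical hooks standing in for whatever actually switches a load and samples the supply current; a real system would also need to let inrush settle before each reading.

```python
from typing import Callable, Dict, Sequence


def measure_load_deltas(loads: Sequence[str],
                        enable: Callable[[str], None],
                        read_supply_ma: Callable[[], float]) -> Dict[str, float]:
    """Enable loads one at a time and attribute each step in supply
    current to the load that was just switched on."""
    deltas: Dict[str, float] = {}
    previous = read_supply_ma()            # baseline before these loads are on
    for name in loads:
        enable(name)
        current = read_supply_ma()
        deltas[name] = current - previous  # this load's individual contribution
        previous = current
    return deltas
```

A load whose step comes back near zero points at that particular load (open element, failed driver), whereas the aggregate reading alone would only say the total is low.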

This can also simplify gross failure detection if it is part of the
normal control strategy.

E.g., I designed a medical instrument many years ago that had an
external "sensor array".  As that could be unplugged at any time,
I had to continually monitor for its disconnection.  At the same
time, individual sensors in the array could be "spoiled" by
spilled reagents.  Yet, the other sensors shouldn't be compromised
or voided just because of the failure of certain ones.
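The instrument's actual detection logic isn't described. Purely as a hedged illustration, here is one plausible triage rule in Python, assuming (hypothetically) that an unplugged array makes *every* channel read as missing or out of range at once, while a spoiled sensor affects only its own channel.

```python
from typing import List, Optional, Tuple


def assess_array(readings: List[Optional[float]],
                 lo: float, hi: float) -> Tuple[str, List[int], List[int]]:
    """Hypothetical triage of a pluggable sensor array.

    Returns (status, usable_channels, suspect_channels).  None means the
    channel returned no signal; a value outside [lo, hi] is treated as
    implausible, e.g. a sensor spoiled by spilled reagent.
    """
    suspect = [i for i, r in enumerate(readings)
               if r is None or not (lo <= r <= hi)]
    if len(suspect) == len(readings):
        # Every channel failed at once: assume the array is unplugged
        # rather than every sensor being spoiled simultaneously.
        return "disconnected", [], []
    bad = set(suspect)
    usable = [i for i in range(len(readings)) if i not in bad]
    # Only some channels failed: keep using the good ones.
    return ("degraded" if suspect else "ok"), usable, suspect
```

A real design might instead use a dedicated presence contact to detect unplugging, so that "array removed" and "all sensors spoiled at once" can't be confused.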

Recognizing that this sort of thing COULD happen in normal use
was the biggest part of the design; the hardware and software
to actually handle these exceptions was then straightforward.

Note that some failures may not be possible to recover from
without adding significant cost (and other failure modes).
So, it's a value decision as to what you support and what
you "tolerate".