Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: comp.arch.embedded
Subject: Re: Diagnostics
Date: Fri, 18 Oct 2024 21:17:21 -0700
Organization: A noiseless patient Spider
Lines: 105
Message-ID: <vevbss$3mr5m$2@dont-email.me>
References: <veekcp$9rsj$1@dont-email.me> <veuggc$1l5eo$1@paganini.bofh.team> <77k5hjprfq0ipjp6pcdd03lnph1i76ssuu@4ax.com> <veunj9$3gbqs$2@dont-email.me> <vev398$1r4v5$2@paganini.bofh.team> <vev635$3mf56$1@dont-email.me> <vevag2$1ricg$1@paganini.bofh.team>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 19 Oct 2024 06:17:33 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="89048b1778d1f63268abb85022497358"; logging-data="3894454"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18YbfBe4X3PkxPeLIeHGVFN"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2
Cancel-Lock: sha1:n9OaHixBFeGrzGt13Pj+sWWSsuc=
Content-Language: en-US
In-Reply-To: <vevag2$1ricg$1@paganini.bofh.team>
Bytes: 6075

On 10/18/2024 8:53 PM, Waldek Hebisch wrote:
>> One of the FETs that controls the shifting of the automatic
>> transmission has failed open.  How do you detect that /and recover
>> from it/?
>
> Detecting such a thing looks easy.  Recovery is tricky, because
> if you have a spare FET and activate it there is a good chance that
> it will fail for the same reason that the first FET failed.
> OTOH, if you have a properly designed circuit around the FET, a
> disturbance strong enough to kill the FET is likely to kill
> the controller too.

The immediate goal is to *detect* that a problem exists.
If you can't detect, then attempting to recover is a moot point.

>> The camera/LIDAR that the self-drive feature uses is providing
>> incorrect data...  etc.
>
> Use 3 (or more) and voting.
> Of course, this increases cost and one
> has to judge if the increase in cost is worth the increase in safety

As well as the reliability of the additional "voting logic".
If not a set of binary signals, determining what the *correct*
signal may be can be problematic.

> (in a self-driving car using multiple sensors looks like a no-brainer,
> but if this is just an assist to increase driver comfort then
> the result may be different).

It is different only in the sense of liability and exposure to
loss.  I am not assigning values to those consequences but,
rather, looking to address the issue of run-time testing, in
general.

Even if NONE of the failures can result in injury or loss,
it is unlikely that a user WANTS to have a defective product.
If the user is technically unable to determine when the
product is "at fault" (vs. his own misunderstanding of how it
is *supposed* to work), then those failures contribute to
the user's frustrations with the product.

>> There are innumerable failures that can occur to compromise
>> the "system" and no *easy*/inexpensive/reliable way to detect
>> and recover from *all* of them.
>
> Sure.  But for common failures or serious failures having non-negligible
> probability redundancy may offer a cheap way to increase reliability.
>
>>> For critical functions a car could have 3 processors with
>>> voting circuitry.  With separate chips this would be more expensive
>>> than a single processor, but the increase in cost probably would be
>>> negligible compared to the cost of the whole car.  And when integrated
>>> on a single chip the cost difference would be tiny.
>>>
>>> IIUC a car controller may "reboot" during a ride.  Instead of
>>> rebooting it could hand work over to a backup controller.
>>
>> How do you know the circuitry (and other mechanisms) that
>> implement this hand-over are operational?
>
> It does not matter if handover _always_ works.  What matters is
> if a system with handover has a lower chance of failure than
> a system without handover.
> Having statistics of actual failures
> (which I do not have but manufacturers should have) and
> after some testing one can estimate the failure probability of
> different designs and possibly decide to use handover.

Again, I am not interested in "recovery" as that varies with the
application and risk assessment.  What I want to concentrate on is
reliably *detecting* faults before they lead to product failures.

I contend that the hardware in many devices has that capability
(to some extent) but that it is underutilized; that the issue
of detecting faults *after* POST is one that doesn't see much
attention.  The likely thinking being that POST will flag it the
next time the device is restarted.

And, that's not acceptable in long-running devices.

>> It is VERY difficult to design reliable systems.  I am not
>> attempting that.  Rather, I am trying to address the fact that
>> the reassurances of POST (and, at the user's prerogative, BIST)
>> are not guaranteed when a device runs "for long periods of time".
>
> You may have tests essentially as part of normal operation.

I suspect most folks have designed devices with UARTs.
And, having written a driver for one, have noted that framing,
parity and overrun errors are possible.

Ask yourself how many of those systems ever *use* that information!
Is there even a means of propagating it up out of the driver?

> Of course, if you have a single-tasked design with a task which
> must be "always" ready to respond, then running tests becomes
> more complicated.  But in most designs you can spare enough
> time slots to run tests during normal operation.  Tests may
> interfere with normal operation, but here we are in domain
> specific territory: sometimes the result of an operation gives enough
> assurance that the device is operating correctly.  And if testing
> for correct operation is impossible, then there is nothing to
> do, I certainly do not promise to deliver the impossible.
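
To make the UART question above concrete, here is a minimal sketch of
one way a driver might count receive errors and expose them to upper
layers.  It is only an illustration under stated assumptions: the
status bit layout, the uart_stats_t structure and every function name
below are hypothetical, not taken from any particular UART or from
anything posted in this thread.

```c
#include <stdint.h>

/* Hypothetical status bits; a real UART defines its own layout. */
#define UART_ERR_FRAMING  (1u << 0)
#define UART_ERR_PARITY   (1u << 1)
#define UART_ERR_OVERRUN  (1u << 2)

/* Running totals kept by the (hypothetical) driver. */
typedef struct {
    uint32_t framing;
    uint32_t parity;
    uint32_t overrun;
    uint32_t received;
} uart_stats_t;

static uart_stats_t stats;

/* Called from the receive path with the status that accompanied
 * each character, instead of discarding that status. */
void uart_note_status(uint8_t status)
{
    stats.received++;
    if (status & UART_ERR_FRAMING) stats.framing++;
    if (status & UART_ERR_PARITY)  stats.parity++;
    if (status & UART_ERR_OVERRUN) stats.overrun++;
}

/* Snapshot for upper layers: the "means of propagating it up". */
uart_stats_t uart_get_stats(void)
{
    return stats;
}

/* Returns 1 when errors exceed max_per_1000 per thousand characters
 * received, 0 otherwise.  The threshold is an application choice. */
int uart_link_suspect(const uart_stats_t *s, uint32_t max_per_1000)
{
    uint32_t errors = s->framing + s->parity + s->overrun;

    if (s->received == 0)
        return 0;
    return (uint64_t)errors * 1000u > (uint64_t)s->received * max_per_1000;
}
```

A monitoring task could call uart_get_stats() periodically and raise
a fault report when uart_link_suspect() returns 1.  A real driver
would also have to protect the counters against concurrent access
from the receive interrupt, which this sketch omits.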