From: Joe Gwinn
Newsgroups: sci.electronics.design
Subject: Re: Predictive failures
Date: Wed, 17 Apr 2024 11:47:53 -0400
Message-ID: <0drv1jht1ruo1k8n3p52l45iuj9b5m7i76@4ax.com>

On Tue, 16 Apr 2024 17:48:19 -0700, John Larkin wrote:

>On Tue, 16 Apr 2024 13:20:34 -0400, Joe Gwinn
>wrote:
>
>>On Tue, 16 Apr 2024 08:16:04 -0700, John Larkin
>>wrote:
>>
>>>On Tue, 16 Apr 2024 10:19:00 -0400, Joe Gwinn
>>>wrote:
>>>
>>>>On Mon, 15 Apr 2024 16:26:35 -0700, john larkin wrote:
>>>>
>>>>>On Mon, 15 Apr 2024 18:03:23 -0400, Joe Gwinn
>>>>>wrote:
>>>>>
>>>>>>On Mon, 15 Apr 2024 13:05:40 -0700, john larkin wrote:
>>>>>>
>>>>>>>On Mon, 15 Apr 2024 15:41:57 -0400, Joe Gwinn
>>>>>>>wrote:
>>>>>>>
>>>>>>>>On Mon, 15 Apr 2024 10:13:02 -0700, Don Y
>>>>>>>>wrote:
>>>>>>>>
>>>>>>>>>Is there a general rule of thumb for signalling the likelihood of
>>>>>>>>>an "imminent" (for some value of "imminent") hardware failure?
>>>>>>>>>
>>>>>>>>>I suspect most would involve *relative* changes that would be
>>>>>>>>>suggestive of changing conditions in the components (and not
>>>>>>>>>directly related to environmental influences).
>>>>>>>>>
>>>>>>>>>So, perhaps, a good strategy is to just "watch" everything and
>>>>>>>>>notice the sorts of changes you "typically" encounter in the hope
>>>>>>>>>that something of greater magnitude would be a harbinger...
>>>>>>>>
>>>>>>>>There is a standard approach that may work: Measure the level and
>>>>>>>>trend of very low frequency (around a tenth of a Hertz) flicker noise.
>>>>>>>>When connections (perhaps within a package) start to fail, the flicker
>>>>>>>>level rises. The actual frequency monitored isn't all that critical.
>>>>>>>>
>>>>>>>>Joe Gwinn
>>>>>>>
>>>>>>>Do connections "start to fail"?
>>>>>>
>>>>>>Yes, they do, in things like vias. I went through a big drama where a
>>>>>>critical bit of radar logic circuitry would slowly go nuts.
>>>>>>
>>>>>>It turned out that the copper plating on the walls of the vias was
>>>>>>suffering from low-cycle fatigue during temperature cycling and slowly
>>>>>>breaking, one little crack at a time, until it went open. If you
>>>>>>measured the resistance to parts per million (6.5-digit DMM), sampling
>>>>>>at 1 Hz, you could see the 1/f noise at 0.1 Hz rising. It's useful to
>>>>>>also measure a copper line, and divide the via-chain resistance by the
>>>>>>no-via resistance, to correct for temperature changes.
>>>>>
>>>>>But nobody is going to monitor every via on a PCB, even if it were
>>>>>possible.
>>>>
>>>>It was not possible to test the vias on the failing logic board, but
>>>>we knew from metallurgical cut, polish, and inspect studies of failed
>>>>boards that it was the vias that were failing.
>>>>
>>>>
>>>>>One could instrument a PCB fab test board, I guess. But DC tests would
>>>>>be fine.
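
For anyone wanting to try the ratio-and-flicker idea quoted above, here is
a rough Python sketch. It is not the procedure used on the radar boards.
It assumes 1 Hz samples of the via-chain and no-via resistances, takes
their ratio, and sums DFT power in a band around 0.1 Hz. The function
names, the 0.05 to 0.2 Hz band, and the synthetic data are all made up
for illustration.

```python
import math

def ratio_series(via_ohms, ref_ohms):
    """Via-chain reading divided by no-via reading, sample by sample.
    Temperature drift common to both copper paths cancels in the ratio."""
    return [v / r for v, r in zip(via_ohms, ref_ohms)]

def band_power(samples, fs_hz, f_lo_hz, f_hi_hz):
    """Sum of mean-removed DFT power over bins within [f_lo_hz, f_hi_hz]."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]
    total = 0.0
    for k in range(1, n // 2 + 1):
        f = k * fs_hz / n  # frequency of DFT bin k
        if f_lo_hz <= f <= f_hi_hz:
            w = 2.0 * math.pi * k / n
            re = sum(xi * math.cos(w * i) for i, xi in enumerate(x))
            im = sum(xi * math.sin(w * i) for i, xi in enumerate(x))
            total += (re * re + im * im) / (n * n)
    return total

def synthetic_run(n, wobble):
    """Hypothetical data: 1 Hz samples with a slow drift shared by both
    paths, plus a 0.1 Hz wobble of relative size `wobble` on the via
    chain only. Real flicker noise is broadband, not a single tone."""
    via, ref = [], []
    for i in range(n):
        drift = 1.0 + 0.002 * math.sin(2.0 * math.pi * i / n)  # shared drift
        ref.append(1.0 * drift)
        via.append(1.0 * drift * (1.0 + wobble * math.sin(2.0 * math.pi * 0.1 * i)))
    return via, ref

if __name__ == "__main__":
    for label, wobble in (("healthy", 0.0), ("degrading", 1e-5)):
        via, ref = synthetic_run(600, wobble)
        print(label, band_power(ratio_series(via, ref), 1.0, 0.05, 0.2))
```

In practice one would log the band power for successive windows and watch
the trend, since the level on any one day means little by itself.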
>>>>
>>>>What was being tested was a fab test board that had both the series
>>>>via chain path and the no-via path of roughly the same DC resistance,
>>>>set up so we could do 4-wire Kelvin resistance measurements of each
>>>>path independent of the other path.
>>>
>>>
>>>Yes, but the question was whether one could predict the failure of an
>>>operating electronic gadget. The answer is mostly NO.
>>
>>Agree.
>>
>>
>>>We had a visit from the quality team from a giant company that you
>>>have heard of. They wanted us to trend analyze all the power supplies
>>>on our boards and apply a complex algorithm to predict failures. It
>>>was total nonsense, basically predicting the future by zooming in on
>>>random noise with a big 1/f component, just like climate prediction.
>>
>>Hmm. My first instinct was that they were using MIL-HDBK-317 (?) or
>>the like, but that does not measure noise. Do you recall any more of
>>what they were doing? I might know what they were up to. The
>>military were big on prognostics for a while, and still talk of this,
>>but it never worked all that well in the field compared to what it was
>>supposed to improve on.
>>
>>
>>>>>We have one board with over 4000 vias, but they are mostly in
>>>>>parallel.
>>>>
>>>>This can also be tested, but using a 6.5-digit DMM intended for
>>>>measuring very low resistance values. A change of one part in 4,000
>>>>is huge to a 6.5-digit instrument. The conductivity will decline
>>>>linearly as vias fail one by one.
>>>>
>>>>
>>>
>>>Millikelvin temperature changes would make more signal than a failing
>>>via.
>>
>>Not at the currents in that logic card. Too much ambient thermal
>>noise.
>>
>>
>>>>>>The solution was to redesign the vias, mainly to increase the critical
>>>>>>volume of copper. And modern SMD designs have less and less copper
>>>>>>volume.
>>>>>>
>>>>>>I bet precision resistors can also be measured this way.
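
The "one part in 4,000" arithmetic for parallel vias, quoted above, can be
checked with a few lines of Python. This is an idealized sketch: it
assumes N identical vias in pure parallel, ignores plane and spreading
resistance, and the function names are invented here. One open via out of
4000 raises the resistance by about 250 ppm, against a 6.5-digit meter
that resolves on the order of 1 ppm of range.

```python
def conductance_fraction(n_total, n_failed):
    """Remaining conductance relative to all-good, for identical parallel
    vias. Falls linearly: each open via removes 1/n_total of the total."""
    return (n_total - n_failed) / n_total

def resistance_rise_ppm(n_total, n_failed):
    """Rise in parallel resistance, in parts per million, after n_failed
    of n_total identical vias have gone open."""
    n_good = n_total - n_failed
    return (n_total / n_good - 1.0) * 1e6

if __name__ == "__main__":
    for failed in (1, 2, 5, 10):
        print(failed,
              conductance_fraction(4000, failed),
              resistance_rise_ppm(4000, failed))
```

Whether a shift of that size stands out from temperature effects depends
on the setup, as the exchange above makes clear; the sketch only shows
the size of the signal.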
>>>>>>
>>>>>>
>>>>>>>I don't think I've ever owned a piece of electronic equipment that
>>>>>>>warned me of an impending failure.
>>>>>>
>>>>>>Onset of smoke emission is a common sign.
>>>>>>
>>>>>>
>>>>>>>Cars do, for some failure modes, like low oil level.
>>>>>>
>>>>>>The industrial method for big stuff is accelerometers attached near
>>>>>>the bearings, and listen for excessive rotation-correlated (not
>>>>>>necessarily harmonic) noise.
>>>>>
>>>>>Big ships that I've worked on have a long propeller shaft in the shaft
>>>>>alley, a long tunnel where nobody often goes. They have magnetic shaft
>>>>>runout sensors and shaft bearing temperature monitors.
>>>>>
>>>>>They measure shaft torque and SHP too, from the shaft twist.
>>>>
>>>>Yep. And these kinds of things fail slowly. At first.
>>>
>>>They could repair a bearing at sea, given a heads-up about violent
>>>failure. A serious bearing failure on a single-screw machine means
>>>getting a seagoing tug.
>>>
>>>The main engine gearbox had padlocks on the covers.
>>>
>>>There was also a chem lab to analyze oil and water and such, looking
>>>for contaminants that might suggest something going on.
>>>
>>>
>>>>
>>>>
>>>>>I liked hiding out in the shaft alley. It was private and cool, that
>>>>>giant shaft slowly rotating.
>>>>
>>>>Probably had a calming flowing water sound as well.
>>>
>>>Yes, cool and beautiful and serene after the heat and noise and
>>>vibration of the engine room. A quiet 32,000 horsepower.
>>>
>>>It was fun being an electronic guru on sea trials of a ship full of
>>>big hairy Popeye types. I, skinny gawky kid, got my own stateroom when
>>>other tech reps slept in cots in the hold.
>>>
>>>Have you noticed how many lumberjack types are afraid of electricity?
>>>That can be funny.
>>
>>Oh yes. And EEs frightened by a 9-V battery.
>>
>>Joe Gwinn
>
>I had an intern, an EE senior, who was afraid of 3.3 volts.
>
>I told him to touch an FPGA to see how warm it was getting, and he
>refused.

Yeah.

Not quite as dramatic, but in the last year I have been involved in some full-scale vibration tests, where a relay rack packed full of equipment is shaken and resulting phase noise is measured. People are ========== REMAINDER OF ARTICLE TRUNCATED ==========