Article <vf18ic$1jk1$1@dont-email.me>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: comp.arch.embedded
Subject: Re: Diagnostics
Date: Sat, 19 Oct 2024 14:32:48 -0700
Organization: A noiseless patient Spider
Lines: 130
Message-ID: <vf18ic$1jk1$1@dont-email.me>
References: <veekcp$9rsj$1@dont-email.me> <veuggc$1l5eo$1@paganini.bofh.team>
 <77k5hjprfq0ipjp6pcdd03lnph1i76ssuu@4ax.com> <veunj9$3gbqs$2@dont-email.me>
 <gsv7hjtgnsm1edtkvafnr3jqqtjh47ck34@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 19 Oct 2024 23:33:01 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="89048b1778d1f63268abb85022497358";
	logging-data="52865"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+owvMMj8niSld21Ssfk33r"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.2.2
Cancel-Lock: sha1:+P8Nu2wBUeyt7XZjQmywo+UPHRk=
In-Reply-To: <gsv7hjtgnsm1edtkvafnr3jqqtjh47ck34@4ax.com>
Content-Language: en-US
Bytes: 7601

On 10/19/2024 12:25 PM, George Neuner wrote:
> Same ol', same ol'.  Nothing much new to report.

No news is good news!

>> On 10/18/2024 2:42 PM, George Neuner wrote:
>> But, if you *know* when certain aspects of a device will be "called on",
>> you can take advantage of that to schedule diagnostics when the device is
>> not "needed".  And, in the event that some unexpected "need" arises,
>> can terminate or suspend the testing (possibly rendering the effort
>> moot if it hasn't yet run to a conclusion).
> 
> If you "know" a priori when some component will be needed, then you
> can do whatever you want when it is not.  The problem is that many
> uses can't be easily anticipated.

Granted, I can't know when a user *might* want to do some
asynchronous task.  But, the whole point of my system is to
watch and anticipate needs based on observed behaviors.

E.g., if the house is unoccupied, then it's not likely that
anyone will want to watch TV -- unless they have *scheduled*
a recording of a broadcast (in which case, I would know it).

If the occupants are asleep, then it's not likely they will be
going out for a drive.

> Which circles back to testing priority: if the test is interruptible
> and/or resumeable, then it may be done whenever the component is
> available ... as long as it won't tie up the component if and when it
> becomes needed for something else.

Exactly.  I already have to deal with that in my decisions to
power down nodes.  If my actions are incorrect, then it introduces
a delay in getting "back" to whatever state I should have been in.
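
One way to get an interruptible and resumable test, sketched under the
assumption that the test can be broken into small steps and that
something can report whether the component is needed right now.  The
names stepwise_test(), run_while_idle() and the needed() callback are
hypothetical.

```python
from typing import Callable, Iterator, List


def stepwise_test(steps: List[Callable[[], bool]]) -> Iterator[bool]:
    """Run one check per step, yielding its result so the caller can pause between steps."""
    for check in steps:
        yield check()


def run_while_idle(test: Iterator[bool], needed: Callable[[], bool]) -> str:
    """Advance the test only while the component is not needed.

    Returns "passed", "failed", or "suspended".  A suspended test keeps its
    position, so calling run_while_idle() again with the same iterator resumes it.
    """
    while True:
        if needed():
            return "suspended"
        try:
            ok = next(test)
        except StopIteration:
            return "passed"
        if not ok:
            return "failed"
```

The cost of a wrong guess is then bounded by the length of one step,
which is the "delay in getting back" mentioned above.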

>> E.g., I scrub freed memory pages (zero fill) so information doesn't
>> leak across protection domains.  As long as some minimum number
>> of *scrubbed* pages are available for use "on demand", why can't
>> I *test* the pages yet to be scrubbed?
> 
> If you're testing memory pages, most likely you are tying up bandwidth
> in the memory system and slowing progress of the real applications.

But, they wouldn't be scrubbed if there were higher "priority"
tasks demanding resources.  I.e., some other "lower priority"
task would have been accessing memory.

> Also because you can't accurately judge the "minimum" needed.  BSD and
> Linux both have this problem where a sudden burst of allocations
> exhausts the pool of zeroed pages, forcing demand zeroing of new pages
> prior to their re-assignment.  Slows the system to a crawl when it
> happens.

Yes, but you have live users arbitrarily deciding they "need" those
resources.  And, considerably more pages at risk of being used.
I've only got ~1G per node and, theoretically, a usage model of
what resources are needed, when (and where).

*Not* clearing the pages leaves a side channel open for information
leakage so *that* isn't negotiable.  Having some pages "deliberately
dirty" could be an issue but, even "dirty", they are wiped of
their previous contents after a single pass through the test.
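
A rough sketch of the policy, assuming pages are modeled as bytearrays
and the test is a simple two-pattern write/read-back (a stand-in for
whatever memory test is actually used).  PagePool, min_scrubbed,
scrub() and pattern_test() are all hypothetical names.

```python
from collections import deque


def scrub(page):
    """Zero fill so nothing leaks to the next owner."""
    for i in range(len(page)):
        page[i] = 0


def pattern_test(page):
    """Write and read back two complementary patterns; False on any mismatch.

    The first write already destroys the page's previous contents.
    """
    for pattern in (0x55, 0xAA):
        for i in range(len(page)):
            page[i] = pattern
        for i in range(len(page)):
            if page[i] != pattern:
                return False
    return True


class PagePool:
    def __init__(self, min_scrubbed):
        self.min_scrubbed = min_scrubbed
        self.dirty = deque()     # freed, still holding old contents
        self.scrubbed = deque()  # zero filled, ready to hand out
        self.bad = []            # failed the test, retired

    def free(self, page):
        self.dirty.append(page)

    def allocate(self):
        if self.scrubbed:
            return self.scrubbed.popleft()
        if self.dirty:           # reserve exhausted: zero on demand
            page = self.dirty.popleft()
            scrub(page)
            return page
        raise MemoryError("no free pages")

    def background_step(self):
        """One unit of idle work: test only once the scrubbed reserve is met."""
        if not self.dirty:
            return
        page = self.dirty.popleft()
        if len(self.scrubbed) >= self.min_scrubbed and not pattern_test(page):
            self.bad.append(page)
            return
        scrub(page)
        self.scrubbed.append(page)
```

While the reserve is short, pages are only scrubbed; testing is the
extra work taken on once the reserve is satisfied.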

>> If there is no anticipated short term need for irrigation, why
>> can't I momentarily activate individual valves and watch to see that
>> the expected amount of water is flowing?
> 
> Because then you are watering (however briefly) when it is not
> expected.  What if there was a pesticide application that should not
> be wetted?  What if a person is there and gets sprayed by your test?

Irrigation, here, is not airborne.  The ground may be wetted in the
*immediate* vicinity of the emitters activated.  But, they operate at
very low flow rates (liters per HOUR).

The goal is to verify that the master valve(s) operate (I do that by
opening the purge valve(s) and letting water drain into a sump); that the
individual valves are operable; and that water *flows* when commanded.

> Properly, valve testing should be done concurrently with a scheduled
> watering.  Check water is flowing when the valve should be open, and
> not flowing when the valve should be closed.

That happens as part of normal operation.  But, NOT knowing until that
time can lead to plant death.  E.g., if the roses don't get watered twice
a day, they are toast (in this environment).  If the cacti valves don't
*close*, they are toast.  If a line is "failed open", then you've
a geyser in the yard (and *no* irrigation to those plants).
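
A sketch of how two flow readings per valve (one with the valve
commanded closed, one with it commanded open) could be turned into a
first guess at the failure.  The function classify_valve(), the
thresholds, and the wording of the verdicts are assumptions for
illustration; flows are in liters per hour.

```python
def classify_valve(flow_closed, flow_open, expected, tolerance=0.25, leak_floor=0.1):
    """Classify a valve from measured flow (liters/hour) in both commanded states.

    expected   -- nominal flow for this zone when open
    tolerance  -- allowed fractional deviation from expected
    leak_floor -- readings at or below this are treated as "no flow"
    """
    if flow_closed > leak_floor:
        return "failed open or leaking"
    if flow_open <= leak_floor:
        return "failed closed or blocked"
    if flow_open > expected * (1 + tolerance):
        return "excess flow (possible line break)"
    if flow_open < expected * (1 - tolerance):
        return "reduced flow (possible clogged emitters)"
    return "ok"
```

Even this coarse a verdict narrows down what someone has to go looking
for in the yard.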

Repairs of this nature can be time consuming, depending on the nature
of the failure (and cost thousands of dollars in labor).  The more I
can deduce about the nature of the failure, the quicker the service
can be brought back up to par and the less the "diagnostic cost"
of having someone do so, manually (digging up a yard to determine where
a line has been punctured; inspecting individual emitters to determine
which are blocked; visually monitoring for water flow per zone; etc.)

[Amazing how much these "minimum wage jobs" actually end up costing
when you have to hire someone!  E.g., $160/month to have your "yard
cleaned" -- *if* you can find someone to do it at that rate!  Irrigation
work starts at kilobucks and is relatively open-ended (as no one can
assess the nature of the job until they start on it)]

>>>   To ensure 100%
>>> functionality at all times effectively requires use of redundant
>>> hardware - which generally is too expensive for a non safety critical
>>> device.
>>
>> Apparently, there is noise about incorporating such hardware into
>> *automotive* designs (!).  I would have thought the time between
>> POSTs would have rendered that largely ineffective.  OTOH, if
>> you imagine a failure can occur ANY time, then "just after
>> putting the car in gear" is as good (bad!) a time as any!
> 
> Automotive is going the way of aircraft: standby running lockstep with
> the primary and monitoring its data flow - able to reset the system if
> they disagree, or take over if the primary fails.
> 
> The point here is that there is no "one fits all" philosophy you can
> follow ... what is proper to do depends on what the (sub)system does,
> its criticality, and on the components involved that may need to be
> tested.

I am, rather, looking for ideas as to how (others) may have approached
it.  Most of the research I've uncovered deals with servers and their
ilk.  Or, historical information (e.g., MULTICS' "computing as a service"
philosophy).  E.g., *scheduling* testing vs. opportunistic testing.