Article <vfarku$21lks$1@dont-email.me>

Path: ...!2.eu.feeder.erje.net!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Don Y <blockedofcourse@foo.invalid>
Newsgroups: comp.arch.embedded
Subject: Re: Diagnostics
Date: Wed, 23 Oct 2024 05:53:44 -0700
Organization: A noiseless patient Spider
Lines: 41
Message-ID: <vfarku$21lks$1@dont-email.me>
References: <veekcp$9rsj$1@dont-email.me> <veuggc$1l5eo$1@paganini.bofh.team>
 <77k5hjprfq0ipjp6pcdd03lnph1i76ssuu@4ax.com> <veunj9$3gbqs$2@dont-email.me>
 <gsv7hjtgnsm1edtkvafnr3jqqtjh47ck34@4ax.com> <vf18ic$1jk1$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 23 Oct 2024 14:53:50 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="606f6bb7ba3c6786ad892a84df8b9f34";
	logging-data="2152092"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+FXRx8gudsZVn/ZOPMvGo6"
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.2.2
Cancel-Lock: sha1:SSTX/yPwotix7GFInryjjAvpqqE=
In-Reply-To: <vf18ic$1jk1$1@dont-email.me>
Content-Language: en-US
Bytes: 3256

On 10/19/2024 2:32 PM, Don Y wrote:
>> The point here is that there is no "one fits all" philosophy you can
>> follow ... what is proper to do depends on what the (sub)system does,
>> its criticality, and on the components involved that may need to be
>> tested.
> 
> I am, rather, looking for ideas as to how (others) may have approached
> it.  Most of the research I've uncovered deals with servers and their
> ilk.  Or, historical information (e.g., MULTICS' "computing as a service"
> philosophy).  E.g., *scheduling* testing vs. opportunistic testing.

"Opportunistic" seems to work well -- *if* you declare the resources
you will need and wait until you can acquire them.

The downside is that you may NEVER be able to acquire them,
depending on which processes are active on a node.  You wouldn't want
the diagnostic task to have to KNOW those things!
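A minimal sketch of the "declare, then acquire" idea, written as a host-side
model in Python rather than target code.  The ResourcePool class and the
resource names (dram_bank0, dma0, timer2) are hypothetical, not from any real
system.  The point is only that the test states what it needs up front and
gets all of it or none of it, without knowing which other tasks hold what:

```python
import threading


class ResourcePool:
    """Tracks which named resources are currently free (hypothetical model)."""

    def __init__(self, names):
        self._free = set(names)
        self._lock = threading.Lock()

    def try_acquire_all(self, wanted):
        """Take every resource in `wanted`, or none of them.  Never blocks."""
        wanted = set(wanted)
        with self._lock:
            if wanted <= self._free:
                self._free -= wanted
                return True
            return False

    def release(self, held):
        """Hand resources back to the pool."""
        with self._lock:
            self._free |= set(held)


pool = ResourcePool({"dram_bank0", "dma0", "timer2"})
DRAM_TEST_NEEDS = {"dram_bank0", "dma0"}        # declared up front by the test

pool.try_acquire_all({"dma0"})                  # some other task takes the DMA channel
first = pool.try_acquire_all(DRAM_TEST_NEEDS)   # fails: dma0 is busy, nothing is taken
pool.release({"dma0"})
second = pool.try_acquire_all(DRAM_TEST_NEEDS)  # succeeds: both are free now
print(first, second)
```

All-or-nothing acquisition keeps the diagnostic from sitting on half of its
set while other tasks starve, but it does nothing about the case where the
full set is never free at the same time.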

As different tests may require different resources, this
becomes problematic:  do you request the largest set?  A
smaller set?  Or do you design a mechanism that allows arbitrarily
complex combinations to be specified?  <frown>
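One of those options (a separate declared set per test, instead of one
"largest set") can be modelled in a few lines.  This is only a sketch; the
test names and resource names are made up for illustration:

```python
# Hypothetical per-test resource declarations.
TESTS = {
    "dram_stuck_at": {"dram_bank0"},
    "dram_march":    {"dram_bank0", "dma0", "timer2"},
    "flash_crc":     {"flash", "dma0"},
}


def runnable_tests(free, tests=TESTS):
    """Return the names of tests whose declared set is fully available."""
    free = set(free)
    return sorted(name for name, needs in tests.items() if needs <= free)


print(runnable_tests({"dram_bank0", "dma0"}))   # only dram_stuck_at qualifies
```

The cheap tests then get to run whenever their small sets are free, while the
tests with large sets still have the "may NEVER run" problem.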

This became apparent when running the DRAM test using the
DRAM emulator (a non-production board designed to validate the
DRAM test by allowing arbitrary fault injection, on demand).
While it was known that *some* tests could NOT be run out of
DRAM (which limits their efficacy in a running system), there
were other system resources that were "silently" called upon
in ways that would have impacted other co-executing tasks.  <frown>

The good news (wrt DRAM testing) is that checking for "stuck at"
faults -- the most prevalent kind described in published research -- has
no special resource needs beyond access to DRAM!
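To illustrate why a stuck-at check needs nothing but memory access, here is a
small Python model.  FaultyMemory is a hypothetical stand-in for a
fault-injecting emulator, not the actual board, and the test is the textbook
"write all ones, write all zeros, read back" check rather than any particular
production algorithm.  It ignores caches and concurrent access to the cell
under test, which a real implementation would have to deal with:

```python
class FaultyMemory:
    """Byte-wide memory model with optional injected stuck-at bits."""

    def __init__(self, size):
        self._cells = bytearray(size)
        self._stuck0 = {}   # addr -> mask of bits forced to 0
        self._stuck1 = {}   # addr -> mask of bits forced to 1

    def inject(self, addr, bit, value):
        """Force one bit of one cell to read back as `value` (0 or 1)."""
        table = self._stuck1 if value else self._stuck0
        table[addr] = table.get(addr, 0) | (1 << bit)

    def write(self, addr, byte):
        self._cells[addr] = byte & 0xFF

    def read(self, addr):
        byte = self._cells[addr]
        byte &= ~self._stuck0.get(addr, 0) & 0xFF
        byte |= self._stuck1.get(addr, 0)
        return byte


def stuck_at_test(mem, size):
    """Return a list of (addr, stuck0_mask, stuck1_mask) for faulty cells."""
    faults = []
    for addr in range(size):
        saved = mem.read(addr)            # preserve contents, as a live system must
        mem.write(addr, 0xFF)
        stuck0 = ~mem.read(addr) & 0xFF   # bits that refused to become 1
        mem.write(addr, 0x00)
        stuck1 = mem.read(addr)           # bits that refused to become 0
        mem.write(addr, saved)
        if stuck0 or stuck1:
            faults.append((addr, stuck0, stuck1))
    return faults


mem = FaultyMemory(16)
mem.inject(addr=5, bit=3, value=0)   # bit 3 of cell 5 stuck at 0
mem.inject(addr=9, bit=0, value=1)   # bit 0 of cell 9 stuck at 1
print(stuck_at_test(mem, 16))        # [(5, 8, 0), (9, 0, 1)]
```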

Moral of story:  CAREFULLY enumerate (and declare) ALL such
resources.  And, consider how realistic it is to expect
ALL of them to be available serendipitously in a given node.

Else, resort to *scheduling* the diagnostic (a "maintenance period").
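That fallback policy might look something like the following sketch.  The
function name, its callback parameters and the attempt limit are all
assumptions made for illustration; in a real system the attempts would be
spread out over time or triggered by resource-release events rather than run
back to back:

```python
def run_diagnostic(try_acquire, release, test, max_attempts):
    """Try opportunistically a bounded number of times, else request a slot."""
    for _ in range(max_attempts):
        if try_acquire():
            try:
                return ("ran", test())
            finally:
                release()             # always give the resources back
    return ("scheduled", None)        # caller must arrange a maintenance period


# Resources become free on the third attempt:
availability = iter([False, False, True])
outcome = run_diagnostic(lambda: next(availability), lambda: None,
                         lambda: "pass", max_attempts=5)

# Resources are never free, so the test is pushed to a maintenance period:
starved = run_diagnostic(lambda: False, lambda: None,
                         lambda: "pass", max_attempts=5)
print(outcome, starved)
```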