Path: ...!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: Niklas Holsti <niklas.holsti@tidorum.invalid>
Newsgroups: comp.arch
Subject: Re: WCET analysis
Date: Wed, 12 Jun 2024 10:08:14 +0300
Organization: Tidorum Ltd
Lines: 83
Message-ID: <lcsvquFjr4aU1@mid.individual.net>
References: <jai66jd4ih4ejmek0abnl4gvg5td4obsqg@4ax.com>
 <h0ib6j576v8o37qu1ojrsmeb5o88f29upe@4ax.com>
 <2024Jun9.185245@mips.complang.tuwien.ac.at>
 <38ob6jl9sl3ceb0qugaf26cbv8lk7hmdil@4ax.com>
 <2024Jun10.091648@mips.complang.tuwien.ac.at>
 <o32f6jlq2qpi9s1u8giq521vv40uqrkiod@4ax.com>
 <lcq749F7sotU1@mid.individual.net>
 <jwvsexjfasn.fsf-monnier+comp.arch@gnu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net 14KWfXVIuJzNckM+8jg9MwzYoGj3x8xYOLWOVdmFKaWLrSr5nn
Cancel-Lock: sha1:VmcDjaBLoZzjptDiP5zPMaBY0eI= sha256:1bidcaiYwgwPKSt7qtnUq5TikN/WR2anUCy2POVJnrM=
User-Agent: Mozilla Thunderbird
Content-Language: en-US
In-Reply-To: <jwvsexjfasn.fsf-monnier+comp.arch@gnu.org>
Bytes: 6151

On 2024-06-12 0:14, Stefan Monnier wrote:
>> Not always. If the mistakenly speculated cache-fetch /evicted/ some other
>> data from the (finite-sized) cache, and the evicted data are needed later on
>> the /true/ execution path, the mistakenly speculated fetch has a /negative/
>> effect on performance. (This kind of "timing anomaly" is very bothersome in
>> static WCET analysis.)
> 
> BTW, I heard that nowadays "static WCET analysis" is typically based on
> statistical models of the performance of various parts of the code.
> So they don't give an ironclad worst case any more (contrary to what
> you'd get back in the day where you could count cycles and there were no
> caches), but rather one that is "very unlikely" to be wrong.
> 
> I remember papers doing "real WCET" by analyzing the code to detect
> those memory accesses which were guaranteed to be cache hits.  Do you
> know if this is still done nowadays?


I haven't followed the WCET-analysis field closely for a few years, so I 
may not be up to date. But here is my understanding, with apologies for 
the long text:

Fully static, non-probabilistic WCET analysis is no longer possible, 
practical, or profitable for most "big" processors because of their 
complex microarchitectures, with out-of-order execution and speculation 
being the most bothersome parts. The difficulty of getting accurate 
information about these microarchitectures is a large factor too, as is 
the cost of building a model of such a microarchitecture compared to 
the small market for a WCET tool targeting that particular processor.

For in-order processors, static analysis of cache hits and misses is 
quite feasible and has a good deal of solid theory behind it. It works 
well for I-caches, but for D-caches it is much harder and more 
pessimistic because of the difficulty of predicting the addresses of 
data accesses, which depends a lot on the nature of the application 
being analyzed. Cache analysis also becomes harder in the presence of 
preemption, multiple cores and multiple cache levels. Still, any 
static WCET tool with ambitions to give safe bounds will certainly 
include static cache analysis, and can then give a safe upper bound on 
the WCET, though in some cases that bound may be quite pessimistic 
(over-estimated).
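
To make that concrete, here is a toy sketch (my own, in Python, not 
taken from any particular tool), assuming a direct-mapped I-cache and a 
straight-line instruction sequence; a real analysis works over the 
whole control-flow graph with abstract "must"/"may" cache states rather 
than one concrete cache state:

# Toy illustration: classify instruction fetches as hit or miss for a
# straight-line sequence of addresses and a direct-mapped I-cache.
# Cache geometry below is an arbitrary assumption for the example.

LINE_SIZE = 32        # bytes per cache line (assumed)
NUM_SETS  = 256       # number of cache sets (assumed)

def classify_fetches(addresses):
    """Return (address, 'hit'/'miss') pairs for a straight-line trace."""
    cache = {}                        # set index -> tag currently cached
    result = []
    for addr in addresses:
        line = addr // LINE_SIZE
        index = line % NUM_SETS
        tag = line // NUM_SETS
        if cache.get(index) == tag:
            result.append((addr, "hit"))
        else:
            result.append((addr, "miss"))
            cache[index] = tag        # fetched line replaces old occupant
    return result

# Example: sequential 4-byte instructions starting at 0x1000.
trace = [0x1000 + 4*i for i in range(20)]
for addr, outcome in classify_fetches(trace):
    print(hex(addr), outcome)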

Some time ago, even before the complex microarchitectures became an 
acute problem, a class of "WCET" tools was developed that relies on a 
combination of execution-time measurements of the basic code blocks, 
static analysis of the control-flow graph and possible execution paths, 
and a statistical method, extreme-value statistics, that aims to 
estimate the probability of outlier values (extreme and unlikely 
values) for the sum of the observed execution-time distributions of the 
basic blocks. These methods do not need a microarchitecture model of 
the processor, but they do need a way to measure basic-block execution 
times (which is problematic on an OoO, speculative processor) and a 
"good enough" test suite. AIUI, these tools are successful and much 
used. However, it is not clear that the probability they compute for 
exceeding their WCET "bound" is accurate, because the statistical 
theory assumes that the execution-time distributions of the individual 
basic blocks are independent and uncorrelated, which is false for 
ordinary microarchitectures.
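
Roughly, the extreme-value step looks like this (my own sketch, using 
SciPy's generalized extreme-value fit on end-to-end measurements with 
made-up data; real tools of this kind are far more careful about block 
maxima, independence tests and so on):

# Sketch of the extreme-value step: fit a generalized extreme-value
# (GEV) distribution to block maxima of measured execution times and
# read off a time with a chosen exceedance probability ("probabilistic
# WCET").  Real tools first test the samples for independence and
# identical distribution, which is exactly the assumption that ordinary
# microarchitectures tend to violate.
import numpy as np
from scipy.stats import genextreme

def pwcet_estimate(samples_ns, block_size=50, exceedance=1e-9):
    """Estimate a time exceeded with roughly `exceedance` probability."""
    samples = np.asarray(samples_ns, dtype=float)
    n_blocks = len(samples) // block_size
    trimmed = samples[:n_blocks * block_size]
    maxima = trimmed.reshape(n_blocks, block_size).max(axis=1)
    shape, loc, scale = genextreme.fit(maxima)    # fit GEV to block maxima
    return genextreme.isf(exceedance, shape, loc=loc, scale=scale)

# Example with synthetic measurements (placeholder data, not real).
rng = np.random.default_rng(0)
measured = 1000 + rng.gamma(shape=2.0, scale=20.0, size=5000)
print("pWCET estimate (ns):", round(pwcet_estimate(measured), 1))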

To overcome that problem, there has been work to create "randomized" HW 
components, for example randomized caches, where it is possible to make 
statistical models of each component and the statistics of different 
components are known, by design and construction, to be independent and 
uncorrelated. However, I don't know how far this work has progressed, 
and how completely a processor can be randomized in this sense. A couple 
of years ago I saw that one practical application of this work simply 
used ordinary hardware but proposed to pick a new random memory lay-out 
(link map) of the application every time the application was loaded and 
started, which seems a cop-out.
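
As a toy illustration of the randomized-cache idea (my own sketch, not 
any real design): with random replacement, whether a given access hits 
or misses becomes a random variable rather than a fixed consequence of 
the memory layout, so repeated runs of the same access sequence give a 
distribution of miss counts, which is what the statistical methods 
need:

# Toy sketch of a "time-randomized" cache: a set-associative cache
# whose victim on a miss is chosen at random.
import random

class RandomReplacementCache:
    def __init__(self, num_sets=64, ways=4, line_size=32):
        self.num_sets, self.ways, self.line_size = num_sets, ways, line_size
        self.sets = [[] for _ in range(num_sets)]   # each set: list of tags

    def access(self, addr):
        """Return True on hit, False on miss (and fill the line)."""
        line = addr // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        s = self.sets[index]
        if tag in s:
            return True
        if len(s) >= self.ways:                     # set full: evict a random way
            s.pop(random.randrange(len(s)))
        s.append(tag)
        return False

# The same access sequence now gives varying miss counts per run.
trace = [32 * (i % 300) for i in range(3000)]       # deliberately overflows the cache
for run in range(3):
    cache = RandomReplacementCache()
    misses = sum(0 if cache.access(a) else 1 for a in trace)
    print("run", run, "misses:", misses)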

Recently a new type of "WCET" tool has appeared that does not claim to 
do more than "estimate" or "explore" the execution time of an 
application on a processor that is described only by its visible 
architecture (instruction set) and some capacity and performance 
numbers. I don't know exactly how these tools work, but it seems likely 
that they use a simplified, in-order, non-speculative model of the 
processor and can then apply the classical "static WCET" analysis 
methods. They don't claim to provide an accurate or even probabilistic 
WCET bound, but rather help the user understand which parts of the 
application are likely to use most of the time, and give some confidence 
that the application will be fast enough on the real processor. This may 
be what most developers want and are content with.
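
The classical path-analysis step that such a simplified processor model 
would feed can be sketched very simply (my own toy example, with 
made-up block costs): with loops already expanded to their bounded 
iteration counts, the WCET bound is just the longest path through the 
control-flow graph, weighted by the per-block cycle counts:

# Small sketch of the classical "static WCET" path step: given
# worst-case cycle counts per basic block (e.g. from a simple in-order
# model with cache analysis folded in) and an acyclic control-flow
# graph, the WCET bound is the longest path from entry to exit.
from functools import lru_cache

# Hypothetical CFG: block -> list of successor blocks.
cfg = {
    "entry": ["A"],
    "A": ["B", "C"],     # a two-way branch
    "B": ["D"],
    "C": ["D"],
    "D": ["exit"],
    "exit": [],
}
# Hypothetical worst-case cycle count of each block.
cost = {"entry": 0, "A": 12, "B": 40, "C": 25, "D": 18, "exit": 0}

@lru_cache(maxsize=None)
def wcet_from(block):
    """Longest (worst-case) path cost from `block` to the exit."""
    succs = cfg[block]
    if not succs:
        return cost[block]
    return cost[block] + max(wcet_from(s) for s in succs)

print("WCET bound (cycles):", wcet_from("entry"))   # 12 + 40 + 18 = 70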