Article <2024Dec26.104621@mips.complang.tuwien.ac.at>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Dealing with mispredictions (was: Microarchitectural support ...)
Date: Thu, 26 Dec 2024 09:46:21 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 66
Message-ID: <2024Dec26.104621@mips.complang.tuwien.ac.at>
References: <2024Oct3.160055@mips.complang.tuwien.ac.at> <vdmrk6$3rksr$1@dont-email.me> <LyELO.69485$2nv5.62232@fx39.iad> <TdWLO.282116$FzW1.158190@fx14.iad> <963a276fd8d43e4212477cefae7f6e46@www.novabbs.org> <8IcMO.249144$v8v2.147178@fx18.iad> <37dc69bc0327e6d56e452090424c80c9@www.novabbs.org>
Injection-Date: Thu, 26 Dec 2024 11:09:01 +0100 (CET)
Injection-Info: dont-email.me; posting-host="fe94df56a285672d25a16cc71e6c80f2";
	logging-data="3046381"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/CLceiotDknCETK36/9+07"
Cancel-Lock: sha1:S5eIdqD3bWFy9QLSv/uf6WZYaq0=
X-newsreader: xrn 10.11
Bytes: 3904

mitchalsup@aol.com (MitchAlsup1) writes:
>Sooner or later, the pipeline designer needs to recognize the oft-occurring
>code sequence pictured as::
>
>     INST
>     INST
>     BC-------\
>     INST     |
>     INST     |
>     INST     |
>/----BR       |
>|    INST<----/
>|    INST
>|    INST
>\--->INST
>     INST
>
>So that the branch predictor predicts as usual, but DECODER recognizes
>the join point of this prediction, so if the prediction is wrong, one
>only nullifies the mispredicted instructions and then inserts the
>alternate instructions while holding the join point instructions until
>the alternate instructions complete.

Would this really save much?  The main penalty here would still be
fetching and decoding the alternate instructions.  Sure, the
instructions after the join point would not have to be fetched and
decoded, but they would still have to go through the renamer, which
typically is as narrow or narrower than instruction fetch and decode,
so avoiding fetch and decode only helps for power (ok, that's
something), but probably not performance.

And the kind of insertion you imagine makes things more complicated,
and only helps in the rare case of a misprediction.

What alternatives do we have?  There still are some branches that are
hard to predict and for which it would be helpful to optimize them.

Classically the programmer or compiler was supposed to turn
hard-to-predict branches into conditional execution (e.g., someone
(IIRC ARM) has an ITE instruction for that, and My 66000 has something
similar IIRC).  These kinds of instructions tend to turn the condition
from a control-flow dependency (free when predicted, costly when
mispredicted) into a data-flow dependency (usually some cost, but
usually much lower than a misprediction).
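
To make the control-flow vs. data-flow distinction concrete, here is a
minimal sketch in C.  The function names are made up for illustration,
and whether a given compiler actually emits a branch for the first form
or a conditional move/select for the second depends on the compiler,
the flags, and the target, so take it as an illustration of the idea
rather than a guarantee about generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Control-flow form: the comparison steers a branch.  Free when the
   branch is predicted correctly, costly when it is mispredicted. */
int64_t sum_above_branchy(const int32_t *a, size_t n, int32_t limit)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > limit)
            sum += a[i];
    }
    return sum;
}

/* Data-flow form: the comparison result becomes a mask that is ANDed
   with the value, so the condition is an ordinary data dependency and
   there is nothing to mispredict inside the loop body. */
int64_t sum_above_branchless(const int32_t *a, size_t n, int32_t limit)
{
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t mask = -(int64_t)(a[i] > limit); /* 0 or all ones */
        sum += (int64_t)a[i] & mask;
    }
    return sum;
}
```

The second form always pays for the compare, the mask and the AND, even
when the branch in the first form would have been predicted perfectly;
that is the "usually some cost" part.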

But programmers are not that great at predicting mispredictions (and
programming languages usually don't have ways to express them),
compilers are worse (even with feedback-directed optimization as it
exists, i.e., without prediction accuracy feedback), and
predictability might change between phases or callers.

So it seems to me that this is something for the hardware: it might use
history data to predict whether a branch is hard to predict (and maybe
also take into account how the dependencies affect the cost), and
switch between a branch-predicting implementation and a data-flow
implementation of the condition.
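
A rough software model of what such a mechanism might look like, purely
as a sketch: a table of saturating counters indexed by branch address
that goes up on a misprediction and slowly down on a correct
prediction.  All names, the table size, the counter width, the
increments and the threshold are assumptions of mine for illustration,
not taken from any existing design, and the dependency-cost aspect is
left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

#define HTP_ENTRIES 1024  /* table size: arbitrary assumption */
#define HTP_MAX     15    /* 4-bit saturating counter: assumption */
#define HTP_THRESH  8     /* switch-over point: arbitrary assumption */

/* One "hard to predict" counter per table entry, initially 0. */
static uint8_t htp_table[HTP_ENTRIES];

static unsigned htp_index(uint64_t pc)
{
    return (unsigned)((pc >> 2) % HTP_ENTRIES);
}

/* Called when a branch resolves.  A misprediction adds 4, a correct
   prediction subtracts 1, so the counter drifts upwards only if more
   than about one in five predictions is wrong. */
void htp_update(uint64_t pc, bool mispredicted)
{
    uint8_t *c = &htp_table[htp_index(pc)];
    if (mispredicted)
        *c = (*c + 4 > HTP_MAX) ? HTP_MAX : (uint8_t)(*c + 4);
    else if (*c > 0)
        (*c)--;
}

/* Decide whether to treat the condition as a data-flow dependency
   (true) or to predict the branch as usual (false). */
bool htp_use_dataflow(uint64_t pc)
{
    return htp_table[htp_index(pc)] >= HTP_THRESH;
}
```

One thing the sketch glosses over: while a branch is handled in
data-flow mode, the hardware would presumably still have to make (and
check) a shadow prediction in order to keep calling htp_update, or it
would never learn that the branch has become predictable again.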

I have not followed ISCA and Micro proceedings in recent years, but I
would not be surprised if somebody has already done a paper on such an
idea.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>