Path: ...!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Stefan Monnier <monnier@iro.umontreal.ca>
Newsgroups: comp.arch
Subject: Re: Reverse engineering of Intel branch predictors
Date: Tue, 12 Nov 2024 14:00:02 -0500
Organization: A noiseless patient Spider
Lines: 29
Message-ID: <jwvpln0qpel.fsf-monnier+comp.arch@gnu.org>
References: <vfbfn0$256vo$1@dont-email.me>
	<c517f562a19a0db2f3d945a1c56ee2e6@www.novabbs.org>
	<jwv1q002k2s.fsf-monnier+comp.arch@gnu.org>
	<a3d81b5c64ce058ad21f42a8081162cd@www.novabbs.org>
	<jwvcyj1sefl.fsf-monnier+comp.arch@gnu.org>
	<abef7481ff0dd5d832cef0b9d3ea087a@www.novabbs.org>
	<jwv1pzhsahr.fsf-monnier+comp.arch@gnu.org>
	<8928500a87002966d6282465c037003e@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 12 Nov 2024 20:00:05 +0100 (CET)
Injection-Info: dont-email.me; posting-host="a5e16a477ad67f5d6a5320d5393574c2";
	logging-data="1835924"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/miaGRFQSRbuDHBWO5NYVXt/af/KV9u/s="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:OZKWmhPHoFuYYImKIIUTpw+KH8Y=
	sha1:xUlne0YCHycwxtSOF/VrsMUglBc=
Bytes: 2656

>>>> Hmm... but in order not to have bubbles, your prediction structure still
>>>> needs to give you a predicted target address (rather than a predicted
>>>> index number), right?
>>> Yes, but you use the predicted index number to find the predicted
>>> target IP.
>> Hmm... but that would require fetching that info from memory.
>> Can you do that without introducing bubbles?
>
> In many/most (dynamic) cases, they have already been fetched and all
> that is needed is muxing the indexed field out of Instruction Buffer.

I guess for a small jump table that would work well, indeed, but for
something like a bytecode interpreter, even if you can compact it to
have only 16 bits per entry, that still spans 512B (256 entries of
2 bytes each).  Is your IB large enough for that?
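
To make the 512B figure concrete, here is a minimal sketch of a compacted
jump table. It assumes 256 opcodes (one per byte value) and handlers that
all sit within 64KiB of a common base address; the names `pack_table`,
`lookup`, `BASE` and the 64-byte handler spacing are hypothetical, chosen
only for illustration and not taken from any real interpreter or ISA.

```python
import struct

NUM_OPCODES = 256   # assumption: one handler per possible opcode byte
ENTRY_BITS = 16     # compacted entry width discussed in the post


def table_bytes(num_entries, entry_bits):
    """Size of the jump table in bytes."""
    return num_entries * entry_bits // 8


def pack_table(handler_addrs, base):
    """Store each handler as an unsigned 16-bit offset from `base`."""
    offsets = [addr - base for addr in handler_addrs]
    return struct.pack("<%dH" % len(offsets), *offsets)


def lookup(table, base, opcode):
    """Select the entry, then compute the target (the extra step that
    compaction adds compared to a table of full addresses)."""
    (offset,) = struct.unpack_from("<H", table, 2 * opcode)
    return base + offset


# Hypothetical layout: handlers spaced 64 bytes apart from BASE.
BASE = 0x400000
handlers = [BASE + 64 * i for i in range(NUM_OPCODES)]
table = pack_table(handlers, BASE)
```

Even in this compacted form the table is `table_bytes(NUM_OPCODES,
ENTRY_BITS)` = 512 bytes, and every lookup needs an add on top of the
select.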

>> If you're lucky it's in the L1 Icache, but that still takes a couple
>> cycles to get, doesn't it?
> My 1-wide machine fetches 4-words per cycle.
> My 6-wide machine fetches 3 ½-cache-lines per cycle.

Even with a 256B cache line width, it would take 2 cycles to get a 512B
jump table into your IB, after which you still have to select (and
compute, if the table is compacted) the corresponding target address,
and only after that can you start fetching (which itself will suffer
the L1 latency), so we're up to a 5-6 cycle bubble, no?
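
As a rough sketch of that accounting: the per-stage cycle counts below are
assumptions picked for illustration (1 or 2 cycles for select plus compute,
2 cycles for the "couple cycles" of L1 latency), not measurements of any
real machine, and `bubble_cycles` is a hypothetical helper.

```python
import math


def bubble_cycles(table_bytes, fetch_bytes_per_cycle,
                  select_compute_cycles, l1_latency):
    """Cycles between needing the target and the first fetched
    instruction, under the stated assumptions."""
    # Cycles to bring the whole jump table into the instruction buffer.
    load = math.ceil(table_bytes / fetch_bytes_per_cycle)
    # Then select (and compute, for a compacted table), then fetch.
    return load + select_compute_cycles + l1_latency


# 512B table, 256B fetched per cycle, assumed 2-cycle L1 latency.
best = bubble_cycles(512, 256, select_compute_cycles=1, l1_latency=2)
worst = bubble_cycles(512, 256, select_compute_cycles=2, l1_latency=2)
```

With these assumed numbers `best` is 5 and `worst` is 6, which is where the
5-6 cycle estimate comes from; a narrower fetch path or a longer L1 latency
only makes it worse.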


        Stefan