Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: An execution time puzzle
Date: Tue, 11 Mar 2025 18:09:51 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 44
Message-ID: <2025Mar11.190951@mips.complang.tuwien.ac.at>
References: <2025Mar10.083318@mips.complang.tuwien.ac.at> <2025Mar10.095420@mips.complang.tuwien.ac.at> <2025Mar10.181427@mips.complang.tuwien.ac.at> <2025Mar11.091817@mips.complang.tuwien.ac.at> <20250311132513.00003f2f@yahoo.com>
Injection-Date: Tue, 11 Mar 2025 19:17:17 +0100 (CET)
Injection-Info: dont-email.me; posting-host="c8a9ea49f90bdbf4e8fcda9a2dd6bc1f";
	logging-data="2216077"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/ffN1cnY+V55Sm/mV8hXQm"
Cancel-Lock: sha1:fR1286S8K+6fXnOyR8laZ9usjYw=
X-newsreader: xrn 10.11
Bytes: 3167

Michael S <already5chosen@yahoo.com> writes:
>> Another open issue is that the gcc-12 build of gforth-fast (using r13
>> instead of r14) is 3 cycles slower than the gcc-10 build.  I don't see
>> an extension of my BTB theory that would explain this.  So either my
>> BTB theory is wrong or there is another effect at work.
>> 
>
>I tried to understand the Indirect Target Predictor paragraph in the
>Opt. Manual, but failed.
>Here is the text of this short paragraph for those who don't like to
>look for things themselves, but have a better chance than me
>of understanding what is going on (i.e. primarily for Mitch Alsup)

Thanks.

>2.8.1.4
>Indirect Target Predictor
>The processor implements a 1024-entry indirect target array used to
>predict the target of some non-RET indirect branches. If a branch has
>had multiple different targets, the indirect target predictor chooses
>among them using global history at L2 BTB correction latency.
>Branches that have so far always had the same target are predicted
>using the static target from the branch's BTB entry. This means the
>prediction latency for correctly predicted indirect branches is
>roughly 5-(3/N), where N is the number of different targets of the
>indirect branch. For these reasons, code should attempt to reduce the
>number of different targets per indirect branch.

In the case of this microbenchmark, every indirect branch has only one
target, and the fact that we see cases where this loop with two
indirect branches is executed in 2 cycles indicates that such indirect
branches can be performed in one cycle; that's probably the part about
the "static target".
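
For reference, here is a small sketch that just evaluates the quoted
formula 5-(3/N) for a few values of N.  It is plain arithmetic on the
manual's text, not a measurement and not a model of the predictor; the
function name predicted_latency is made up for this illustration.

```python
def predicted_latency(n_targets: int) -> float:
    """Evaluate the manual's rough formula 5 - (3/N) for N targets.

    Hypothetical helper: it only reproduces the arithmetic of the
    quoted paragraph and says nothing about real hardware behaviour.
    """
    if n_targets < 1:
        raise ValueError("an indirect branch has at least one target")
    return 5 - 3 / n_targets


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 8):
        print(f"N={n}: {predicted_latency(n):.3f}")
```

For N=1 the formula gives 2, and it approaches 5 as N grows.  The N=1
value does not obviously agree with the one cycle per branch inferred
above from the microbenchmark, so the formula may be intended only as a
rough figure for branches with several targets.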

What is written looks pretty clear to me; maybe once you have read the
indirect-branch sections of several chipsandcheese articles, this all
looks normal (although the formula looks curious to me).  If you have
any questions, I can give you my interpretation of what is written
here.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>