Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: An execution time puzzle
Date: Tue, 11 Mar 2025 18:09:51 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 44
Message-ID: <2025Mar11.190951@mips.complang.tuwien.ac.at>
References: <2025Mar10.083318@mips.complang.tuwien.ac.at> <2025Mar10.095420@mips.complang.tuwien.ac.at> <2025Mar10.181427@mips.complang.tuwien.ac.at> <2025Mar11.091817@mips.complang.tuwien.ac.at> <20250311132513.00003f2f@yahoo.com>
Injection-Date: Tue, 11 Mar 2025 19:17:17 +0100 (CET)
Injection-Info: dont-email.me; posting-host="c8a9ea49f90bdbf4e8fcda9a2dd6bc1f"; logging-data="2216077"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/ffN1cnY+V55Sm/mV8hXQm"
Cancel-Lock: sha1:fR1286S8K+6fXnOyR8laZ9usjYw=
X-newsreader: xrn 10.11
Bytes: 3167

Michael S <already5chosen@yahoo.com> writes:
>> Another open issue is that the gcc-12 build of gforth-fast (using r13
>> instead of r14) is 3 cycles slower than the gcc-10 build. I don't see
>> an extension of my BTB theory that would explain this. So either my
>> BTB theory is wrong or there is another effect at work.
>>
>
>I tried to understand Indirect Target Predictor paragraph in Opt.
>Manual, but failed.
>Here is the text of this short paragraph for those who don't like too
>look for things themselves, but have better chance than me
>to understand what is going on (i.e. primarily for Mitch Alsup)

Thanks.

>2.8.1.4
>Indirect Target Predictor
>The processor implements a 1024-entry indirect target array used to
>predict the target of some non-RET indirect branches. If a branch has
>had multiple different targets, the indirect target predictor chooses
>among them using global history at L2 BTB correction latency.
>Branches that have so far always had the same target are predicted
>using the static target from the branch's BTB entry. This means the
>prediction latency for correctly predicted indirect branches is
>roughly 5-(3/N), where N is the number of different targets of the
>indirect branch. For these reasons, code should attempt to reduce the
>number of different targets per indirect branch.

In the case of this microbenchmark, every indirect branch has only one
target, and the fact that we see cases where this loop with two
indirect branches is executed in 2 cycles indicates that such indirect
branches can be performed in one cycle; that's probably the part about
the "static target".

What is written looks pretty clear to me; maybe when you have read the
indirect-branch sections of several chipsandcheese articles, this all
looks normal to you (although the formula looks curious to me). If
you have any questions, I can give you my interpretation of what is
written here.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
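
For anyone who wants to see what the quoted 5-(3/N) formula gives for a few values of N, here is a small sketch. It only does the arithmetic on the formula as quoted from the manual; the function name predicted_latency is made up for illustration, and nothing here is measured on hardware. How the manual's "prediction latency" relates to the per-iteration cycle counts seen in the microbenchmark is not something the formula itself tells us.

```python
# Arithmetic on the formula quoted from the manual:
#   prediction latency ~= 5 - (3 / N) cycles,
# where N is the number of different targets of one indirect branch.
# predicted_latency is a hypothetical helper name, not from any tool or manual.

def predicted_latency(n_targets: int) -> float:
    """Approximate latency in cycles for a correctly predicted indirect branch."""
    if n_targets < 1:
        raise ValueError("an executed indirect branch has at least one target")
    return 5 - 3 / n_targets


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 8, 1024):
        print(f"N={n:4d}  latency ~ {predicted_latency(n):.3f} cycles")
```

By this formula a single-target branch (N=1) comes out at 2 cycles, N=2 at 3.5, and the value approaches 5 as N grows, which matches the manual's advice to keep the number of targets per indirect branch small.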