Path: ...!eternal-september.org!reader02.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Retpoline cost
Date: Sun, 21 Mar 2021 16:00:39 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 46
Message-ID: <2021Mar21.170039@mips.complang.tuwien.ac.at>
References: <2021Mar20.232623@mips.complang.tuwien.ac.at>
Injection-Info: reader02.eternal-september.org; posting-host="fb6eec1f2ee117b2cc0bba2859b93fff";
logging-data="19472"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+NM1/QM18L2Ud3x3u3wiBk"
Cancel-Lock: sha1:rr9OAQiuI1jiCPhZae5afF1vFe0=
X-newsreader: xrn 10.00-beta-3
Bytes: 3043
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>The nice thing is that we can get our indirect branches replaced with
>retpolines with very little effort these days, by using the gcc
>options -mindirect-branch=thunk and -mfunction-return=thunk. There
>are different options instead of "thunk", but the effect is unclear
>from the documentation.
I tried the "thunk-inline" option instead:
../configure CC="gcc -mindirect-branch=thunk-inline -mfunction-return=thunk-inline"
It gives significantly faster results (times in seconds):
sieve bubble matrix   fib   fft
0.095  0.089  0.039 0.063 0.023 gforth-fast no retpolines Ryzen 3900x
0.230  0.210  0.081 0.370 0.175 gforth-fast thunk-inline Ryzen 3900x
0.769  0.674  0.649 0.939 0.423 gforth-fast --no-dynamic thunk-inline Ryzen 3900x
0.780  0.663  0.647 0.923 0.416 gforth-fast thunk Ryzen 3900x
0.092  0.124  0.052 0.080 0.032 gforth-fast no retpolines Pentium G4560
0.384  0.352  0.120 0.624 0.304 gforth-fast thunk-inline Pentium G4560
1.376  1.288  1.272 1.736 0.784 gforth-fast thunk Pentium G4560
0.492  0.556  0.424 0.700 0.396 gforth-fast no retpolines Intel Atom 330
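To make the rows easier to compare, here is a small Python sketch that turns the Ryzen 3900x times into per-benchmark ratios. The numbers are copied from the table; the names BENCHMARKS, TIMES, ratios and show are made up for this illustration.

```python
# Times in seconds, copied from the Ryzen 3900x rows of the table.
BENCHMARKS = ["sieve", "bubble", "matrix", "fib", "fft"]
TIMES = {
    "no retpolines":             [0.095, 0.089, 0.039, 0.063, 0.023],
    "thunk-inline":              [0.230, 0.210, 0.081, 0.370, 0.175],
    "--no-dynamic thunk-inline": [0.769, 0.674, 0.649, 0.939, 0.423],
    "thunk":                     [0.780, 0.663, 0.647, 0.923, 0.416],
}


def ratios(config, reference):
    """Per-benchmark time of `config` divided by the time of `reference`."""
    return [a / b for a, b in zip(TIMES[config], TIMES[reference])]


def show(config, reference):
    """Print one line of ratios, labelled with the benchmark names."""
    cells = "  ".join(
        f"{name} {r:.2f}" for name, r in zip(BENCHMARKS, ratios(config, reference))
    )
    print(f"{config} / {reference}: {cells}")


show("thunk-inline", "no retpolines")
show("thunk", "no retpolines")
show("--no-dynamic thunk-inline", "thunk")
```

On these numbers thunk-inline is slower than the build without retpolines by a factor of roughly 2 to 8, thunk by a factor of roughly 7 to 18, and the --no-dynamic thunk-inline times are within about 2% of the thunk times.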
The reason for the performance difference between thunk-inline and
thunk is that thunk disables the dynamic superinstruction optimization
of Gforth, while thunk-inline does not; dynamic superinstructions
reduce the number of indirect branches performed by Gforth, typically
by a factor of 3, but in the case of matrix quite a bit more. By
disabling dynamic superinstructions with the Gforth command-line
option --no-dynamic, we see that thunk-inline has a per-indirect-branch
cost that's similar to thunk.
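The effect of dynamic superinstructions on the number of indirect branches can be illustrated with a toy Python model. It is only a sketch under stated assumptions: without the optimization every executed primitive ends in one indirect branch, and with it only the primitive that ends a straight-line sequence does. The trace and the function count_dispatches are hypothetical and not taken from Gforth.

```python
def count_dispatches(trace, dynamic_superinstructions):
    """Count indirect branches for a trace of (name, ends_block) pairs.

    Assumption of this model: plain threaded code dispatches after every
    primitive; with dynamic superinstructions only the last primitive of a
    straight-line sequence (ends_block=True) dispatches.
    """
    if not dynamic_superinstructions:
        return len(trace)
    return sum(1 for _name, ends_block in trace if ends_block)


# Hypothetical loop body of three primitives, executed 1000 times.
loop_body = [("dup", False), ("+", False), ("(loop)", True)]
trace = loop_body * 1000

plain = count_dispatches(trace, dynamic_superinstructions=False)
combined = count_dispatches(trace, dynamic_superinstructions=True)
print(plain, combined, plain / combined)
```

With straight-line sequences of three primitives the model gives a factor of 3; longer sequences give a correspondingly larger reduction.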
A typical example of a retpoline produced by these two options (for a
branch to the address in %rcx) is:
0x000055acfcb19b87: callq 0x55acfcb19b93
0x000055acfcb19b8c: pause
0x000055acfcb19b8e: lfence
0x000055acfcb19b91: jmp 0x55acfcb19b8c
0x000055acfcb19b93: mov %rcx,(%rsp)
0x000055acfcb19b97: retq
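How the sequence above redirects control can be traced in a toy Python model. It assumes the usual explanation of retpolines: the callq records the address of the pause/lfence loop both on the stack and in the CPU's return-address predictor, the mov overwrites only the copy on the stack, so the retq architecturally goes to the target while a speculative return lands in the loop. The names retpoline, stack and predictor are made up for illustration, the target address is hypothetical, and this is not a simulation of any real CPU.

```python
CAPTURE_LOOP = 0x55ACFCB19B8C  # address of the pause/lfence loop in the listing


def retpoline(target, stack, predictor):
    """Return (speculative_destination, architectural_destination)."""
    # callq: the return address goes on the stack and into the predictor.
    stack.append(CAPTURE_LOOP)
    predictor.append(CAPTURE_LOOP)
    # mov %rcx,(%rsp): only the in-memory return address is replaced.
    stack[-1] = target
    # retq: prediction uses the predictor entry, execution uses the stack.
    speculative = predictor.pop()
    architectural = stack.pop()
    return speculative, architectural


speculative, architectural = retpoline(0x1234, [], [])
print(hex(speculative), hex(architectural))
```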
- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup,