Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Tue, 10 Sep 2024 17:16:07 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 76
Message-ID: <2024Sep10.191607@mips.complang.tuwien.ac.at>
References: <vasruo$id3b$1@dont-email.me> <vb8587$3gq7e$1@dont-email.me> <vb91e7$3o797$1@dont-email.me> <vb9eeh$3q993$1@dont-email.me> <vb9l7k$3r2c6$2@dont-email.me> <vba26l$3te44$1@dont-email.me> <vbag2s$3vhih$1@dont-email.me> <vbbnf9$8j04$1@dont-email.me> <vbbsl4$9hdg$1@dont-email.me> <vbcbob$bd22$3@dont-email.me> <vbcob9$dvp4$1@dont-email.me> <vbd174$eulp$1@dont-email.me> <vbm67e$2apse$1@dont-email.me> <vbmkln$2cmfo$1@dont-email.me> <vbni3u$2h7pp$1@dont-email.me> <vbp849$2trde$1@dont-email.me>
Injection-Date: Tue, 10 Sep 2024 19:32:31 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f21f15281580639cd9bc25764f54f587";
logging-data="3236387"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX183b2z351nIsCcJTbPaldsz"
Cancel-Lock: sha1:iTVfLFm6t/JOhvOjksE3h+7Bueo=
X-newsreader: xrn 10.11
Bytes: 4042
David Brown <david.brown@hesbynett.no> writes:
>However, my point was that "hand-optimised" source code can lead to
>poorer results on newer /compilers/ compared to simpler source code. If
>you've googled for "bit twiddling hacks" for cool tricks, or written
>something like "(x << 4) + (x << 2) + x" instead of "x * 21", then the
>results will be slower with a modern compiler and modern cpu, even
>though the "hand-optimised" version might have been faster two decades
>ago. You can expect the modern tool to convert the multiplication into
>shifts and adds if that is more efficient on the target, or a
>multiplication if that is best on the target. But you can't expect the
>compiler to turn the shifts and adds into a multiplication.
Why not? Let's see:
[b3:~/tmp:109062] gcc -Os -c xxx-mul.c && objdump -d xxx-mul.o
xxx-mul.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 48 6b c7 15 imul $0x15,%rdi,%rax
4: c3 ret
[b3:~/tmp:109063] gcc -O3 -c xxx-mul.c && objdump -d xxx-mul.o
xxx-mul.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <foo>:
0: 48 8d 04 bf lea (%rdi,%rdi,4),%rax
4: 48 8d 04 87 lea (%rdi,%rax,4),%rax
8: c3 ret
So gcc-12 obviously understands that your "hand-optimized" version is
equivalent to the multiplication, and with -O3 then decides that the
leas are faster.
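
The source of xxx-mul.c is not shown above. A minimal sketch of what it presumably contains follows; this is an assumption. Only the name foo comes from the disassembly, and the 64-bit parameter type is inferred from the use of %rdi. foo_mul and check_foo are hypothetical helpers added for comparison.

```c
#include <assert.h>

/* Hypothetical reconstruction of xxx-mul.c: the "hand-optimized" form.
   Assumes a 64-bit long, as on x86-64 Linux. */
long foo(long x)
{
    return (x << 4) + (x << 2) + x;   /* 16*x + 4*x + x = 21*x */
}

/* The straightforward form, for comparison (hypothetical helper). */
long foo_mul(long x)
{
    return x * 21;
}

/* Spot check that both forms agree. Only non-negative inputs are used,
   because left-shifting a negative signed value is undefined in ISO C. */
void check_foo(void)
{
    for (long x = 0; x < 1000; x++)
        assert(foo(x) == foo_mul(x));
}
```

Compiling both functions with the same options and comparing the objdump output is a quick way to see whether a given compiler version treats them as equivalent.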
>(Sometimes it can, but you can't expect it to.)
That also works the other way.
But it becomes really annoying when I intend the compiler not to
perform a transformation and it performs it anyway, e.g., when I write
"-(x>0)" and the compiler turns that into a conditional branch. These
days gcc does not do that, but I have just seen another twist:
long bar(long x)
{
  return -(x>0);
}
gcc-12 -O3 turns this into:
10: 31 c0 xor %eax,%eax
12: 48 85 ff test %rdi,%rdi
15: 0f 9f c0 setg %al
18: f7 d8 neg %eax
1a: 48 98 cltq
1c: c3 ret
So sign-extension optimization is apparently still lacking.
Clang-14 handles this fine:
10: 31 c0 xor %eax,%eax
12: 48 85 ff test %rdi,%rdi
15: 0f 9f c0 setg %al
18: 48 f7 d8 neg %rax
1b: c3 ret
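
For context on the idiom: in C, (x>0) has type int, so the negation is done in int and the result is only then converted to long, which corresponds to the neg %eax / cltq pair in the gcc output. The sketch below shows the mask the idiom produces and one typical use. bar_wide, clamp_nonneg and check_bar are hypothetical names, not from the code above, and whether the cast variant changes what gcc-12 emits was not tested here.

```c
#include <assert.h>

/* As above: -1 (all bits set) when x > 0, otherwise 0. */
long bar(long x)
{
    return -(x > 0);
}

/* Hypothetical variant: widen the comparison result before negating,
   so the negation is done in long. Same result as bar(); the effect on
   generated code is an untested assumption. */
long bar_wide(long x)
{
    return -(long)(x > 0);
}

/* Hypothetical use of the mask: clamp non-positive values to 0 without
   writing a branch in the source. */
long clamp_nonneg(long x)
{
    return x & bar(x);
}

/* Spot check of the results for positive, zero and negative inputs. */
void check_bar(void)
{
    assert(bar(5) == -1 && bar(0) == 0 && bar(-7) == 0);
    assert(bar_wide(5) == -1 && bar_wide(-7) == 0);
    assert(clamp_nonneg(5) == 5 && clamp_nonneg(-5) == 0);
}
```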
- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>