Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Tue, 10 Sep 2024 17:16:07 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 76
Message-ID: <2024Sep10.191607@mips.complang.tuwien.ac.at>
References: <vasruo$id3b$1@dont-email.me> <vb8587$3gq7e$1@dont-email.me>
 <vb91e7$3o797$1@dont-email.me> <vb9eeh$3q993$1@dont-email.me>
 <vb9l7k$3r2c6$2@dont-email.me> <vba26l$3te44$1@dont-email.me>
 <vbag2s$3vhih$1@dont-email.me> <vbbnf9$8j04$1@dont-email.me>
 <vbbsl4$9hdg$1@dont-email.me> <vbcbob$bd22$3@dont-email.me>
 <vbcob9$dvp4$1@dont-email.me> <vbd174$eulp$1@dont-email.me>
 <vbm67e$2apse$1@dont-email.me> <vbmkln$2cmfo$1@dont-email.me>
 <vbni3u$2h7pp$1@dont-email.me> <vbp849$2trde$1@dont-email.me>
Injection-Date: Tue, 10 Sep 2024 19:32:31 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f21f15281580639cd9bc25764f54f587";
 logging-data="3236387"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX183b2z351nIsCcJTbPaldsz"
Cancel-Lock: sha1:iTVfLFm6t/JOhvOjksE3h+7Bueo=
X-newsreader: xrn 10.11
Bytes: 4042

David Brown <david.brown@hesbynett.no> writes:
>However, my point was that "hand-optimised" source code can lead to
>poorer results on newer /compilers/ compared to simpler source code.  If
>you've googled for "bit twiddling hacks" for cool tricks, or written
>something like "(x << 4) + (x << 2) + x" instead of "x * 21", then the
>results will be slower with a modern compiler and modern cpu, even
>though the "hand-optimised" version might have been faster two decades
>ago.  You can expect the modern tool to convert the multiplication into
>shifts and adds if that is more efficient on the target, or a
>multiplication if that is best on the target.  But you can't expect the
>compiler to turn the shifts and adds into a multiplication.
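
For readers who want to repeat the experiment, the two source forms
contrasted in the quoted text can be written out as a small C file.
This is only a sketch: the source of the xxx-mul.c used in the reply is
not shown in the post, so the function names here are made up for
illustration, and unsigned long is an assumption chosen so that the
left shifts are well-defined for every input.

```c
#include <assert.h>

/* "Hand-optimised" form from the quoted text: 16*x + 4*x + x == 21*x.
   Hypothetical name; unsigned so the shifts cannot overflow into
   undefined behaviour. */
unsigned long mul21_shifts(unsigned long x)
{
    return (x << 4) + (x << 2) + x;
}

/* Plain form: leaves the choice of instructions to the compiler. */
unsigned long mul21_plain(unsigned long x)
{
    return x * 21;
}

/* Sanity check: both forms agree on a range of small inputs. */
void check_mul21(void)
{
    for (unsigned long x = 0; x < 1000; x++)
        assert(mul21_shifts(x) == mul21_plain(x));
}
```

Compiling such a file with "gcc -Os -c" and "gcc -O3 -c" and inspecting
the object file with "objdump -d" is one way to compare what a given
compiler emits for each function.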

Why not?  Let's see:

[b3:~/tmp:109062] gcc -Os -c xxx-mul.c && objdump -d xxx-mul.o

xxx-mul.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 6b c7 15             imul   $0x15,%rdi,%rax
   4:   c3                      ret
[b3:~/tmp:109063] gcc -O3 -c xxx-mul.c && objdump -d xxx-mul.o

xxx-mul.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 8d 04 bf             lea    (%rdi,%rdi,4),%rax
   4:   48 8d 04 87             lea    (%rdi,%rax,4),%rax
   8:   c3                      ret

So gcc-12 obviously understands that your "hand-optimized" version is
equivalent to the multiplication, and with -O3 then decides that the
leas are faster.

>(Sometimes it can, but you can't expect it to.)

That also works the other way.  But it becomes really annoying when I
intend it not to perform a transformation, and it performs the
transformation anyway, like when writing "-(x>0)" and the compiler
turns that into a conditional branch.  These days gcc does not do
that, but I have just seen another twist:

long bar(long x)
{
  return -(x>0);
}

gcc-12 -O3 turns this into:

  10:   31 c0                   xor    %eax,%eax
  12:   48 85 ff                test   %rdi,%rdi
  15:   0f 9f c0                setg   %al
  18:   f7 d8                   neg    %eax
  1a:   48 98                   cltq
  1c:   c3                      ret

So sign-extension optimization is apparently still lacking.  Clang-14
handles this fine:

  10:   31 c0                   xor    %eax,%eax
  12:   48 85 ff                test   %rdi,%rdi
  15:   0f 9f c0                setg   %al
  18:   48 f7 d8                neg    %rax
  1b:   c3                      ret

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
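
As a note on the "-(x>0)" idiom in the post: in C the comparison has
type int, so the negation is carried out in int and the result is only
converted to long on return, which corresponds to the 32-bit neg
followed by cltq in the gcc-12 listing.  The sketch below spells this
out.  All function names are hypothetical, and the variant with an
explicit cast is shown only for comparison; no claim is made here about
what code any particular compiler generates for it.

```c
#include <assert.h>

/* As in the post: (x > 0) is an int (0 or 1), negated as an int,
   then converted (sign-extended) to long for the return value. */
long mask_int_neg(long x)
{
    return -(x > 0);
}

/* Variant: widen the comparison result to long first, then negate
   in long.  Same value, different intermediate type. */
long mask_long_neg(long x)
{
    return -(long)(x > 0);
}

/* Typical use of such a mask: a branch-free select that yields x
   when x is positive and 0 otherwise. */
long positive_or_zero(long x)
{
    return x & mask_long_neg(x);
}
```

Both mask functions return -1 (all bits set) for positive x and 0
otherwise, which is what makes the result usable as an AND mask.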