From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Tue, 10 Sep 2024 20:45:53 +0200
Message-ID: <vbq452$33h0n$1@dont-email.me>
In-Reply-To: <2024Sep10.191607@mips.complang.tuwien.ac.at>

On 10/09/2024 19:16, Anton Ertl wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> However, my point was that "hand-optimised" source code can lead to
>> poorer results on newer /compilers/ compared to simpler source code.
>> If you've googled for "bit twiddling hacks" for cool tricks, or written
>> something like "(x << 4) + (x << 2) + x" instead of "x * 21", then the
>> results will be slower with a modern compiler and modern cpu, even
>> though the "hand-optimised" version might have been faster two decades
>> ago. You can expect the modern tool to convert the multiplication into
>> shifts and adds if that is more efficient on the target, or a
>> multiplication if that is best on the target. But you can't expect the
>> compiler to turn the shifts and adds into a multiplication.
>
> Why not? Let's see:
>
> [b3:~/tmp:109062] gcc -Os -c xxx-mul.c && objdump -d xxx-mul.o
>
> xxx-mul.o:     file format elf64-x86-64
>
>
> Disassembly of section .text:
>
> 0000000000000000 <foo>:
>    0:   48 6b c7 15             imul   $0x15,%rdi,%rax
>    4:   c3                      ret
> [b3:~/tmp:109063] gcc -O3 -c xxx-mul.c && objdump -d xxx-mul.o
>
> xxx-mul.o:     file format elf64-x86-64
>
>
> Disassembly of section .text:
>
> 0000000000000000 <foo>:
>    0:   48 8d 04 bf             lea    (%rdi,%rdi,4),%rax
>    4:   48 8d 04 87             lea    (%rdi,%rax,4),%rax
>    8:   c3                      ret
>
> So gcc-12 obviously understands that your "hand-optimized" version is
> equivalent to the multiplication, and with -O3 then decides that the
> leas are faster.
>
>> (Sometimes it can, but you can't expect it to.)

Again, sometimes a compiler will recognise a particular hand-optimised
pattern, turn it back into something logically simpler, and then optimise
from there. But you cannot /expect/ that. On the whole, compilers are
more likely to recognise clear and simple patterns than complex ones,
especially ones that use bit manipulation in odd ways. There will always
be exceptions; this is just a general rule. And a related general rule is
that /humans/ are much better at understanding clear code written in a
logical way than something weird and hand-optimised.

>
> That also works the other way.
>
> But it becomes really annoying when I intend it not to perform a
> transformation, and it performs the transformation, like when writing
> "-(x>0)" and the compiler turns that into a conditional branch. These
> days gcc does not do that, but I have just seen another twist:
>
> long bar(long x)
> {
>   return -(x>0);
> }
>
> gcc-12 -O3 turns this into:
>
>   10:   31 c0                   xor    %eax,%eax
>   12:   48 85 ff                test   %rdi,%rdi
>   15:   0f 9f c0                setg   %al
>   18:   f7 d8                   neg    %eax
>   1a:   48 98                   cltq
>   1c:   c3                      ret
>
> So apparently sign-extension optimization is still lacking.
> Clang-14 handles this fine:
>
>   10:   31 c0                   xor    %eax,%eax
>   12:   48 85 ff                test   %rdi,%rdi
>   15:   0f 9f c0                setg   %al
>   18:   48 f7 d8                neg    %rax
>   1b:   c3                      ret
>

One day, perhaps, compilers will be perfect. But not yet :-(
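A quick sanity check of the arithmetic in the "x * 21" exchange: (x << 4) + (x << 2) + x is 16x + 4x + x = 21x (0x15). The two leas in the -O3 listing compute t = x + 4*x (5x), then x + 4*t (21x). The sketch below writes all three forms side by side. The function names are hypothetical, because the post does not show what xxx-mul.c actually contains. It also assumes uint64_t rather than long, so overflow wraps modulo 2^64 instead of being undefined, as signed overflow (and left-shifting a negative value) would be.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: the contents of xxx-mul.c are not shown.
 * uint64_t is an assumption so that overflow wraps modulo 2^64
 * rather than being undefined behaviour. */

/* The plain form, which the compiler may lower however it likes. */
uint64_t mul21_plain(uint64_t x) { return x * 21; }

/* The "hand-optimised" form: 16x + 4x + x. */
uint64_t mul21_shifts(uint64_t x) { return (x << 4) + (x << 2) + x; }

/* Mirrors the two lea instructions in the gcc -O3 listing:
 *   lea (%rdi,%rdi,4),%rax   t = x + 4*x   (5x)
 *   lea (%rdi,%rax,4),%rax   r = x + 4*t   (21x) */
uint64_t mul21_lea(uint64_t x) {
    uint64_t t = x + 4 * x;
    return x + 4 * t;
}

/* True when all three forms produce the same result for x. */
bool mul21_forms_agree(uint64_t x) {
    return mul21_shifts(x) == mul21_plain(x) && mul21_lea(x) == mul21_plain(x);
}
```

All three expressions compute 21x modulo 2^64, so mul21_forms_agree() returns true for every input, including UINT64_MAX. Which form a given compiler emits at -Os or -O3 is a separate question that only the disassembly answers.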
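The point of writing "-(x>0)" is that (x > 0) is an int that is either 0 or 1, so negating it gives 0 or -1. Returned as a long, that is either zero or a value with all bits set, which can serve as a mask. The sketch below repeats bar() under the hypothetical name gt0_mask(). It then shows one assumed use, a mask-based select that chooses between two values with only AND, OR and NOT. That select is an illustration and does not come from the post. It also assumes two's complement representation, which holds on x86-64 and is required since C23. As the listings above show, the source form does not guarantee what the compiler emits. It may still introduce a branch or an extra sign extension.

```c
/* Same expression as bar() in the post: (x > 0) is an int 0 or 1,
 * so -(x > 0) is 0 or -1, converted to long on return.
 * -1 has all bits set (two's complement assumed). */
long gt0_mask(long x) { return -(x > 0); }

/* Hypothetical use of the mask: return a when x > 0, otherwise b,
 * written without a conditional in the source. */
long select_if_positive(long x, long a, long b) {
    long m = gt0_mask(x);      /* all ones if x > 0, else 0 */
    return (a & m) | (b & ~m); /* keeps a's bits or b's bits */
}
```

Writing the select this way only expresses the intent in the source. Whether the generated code is actually branch-free still has to be checked in the disassembly, as was done above for gcc-12 and Clang-14.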