Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Tue, 10 Sep 2024 20:45:53 +0200
Organization: A noiseless patient Spider
Lines: 90
Message-ID: <vbq452$33h0n$1@dont-email.me>
References: <vasruo$id3b$1@dont-email.me> <vb8587$3gq7e$1@dont-email.me>
 <vb91e7$3o797$1@dont-email.me> <vb9eeh$3q993$1@dont-email.me>
 <vb9l7k$3r2c6$2@dont-email.me> <vba26l$3te44$1@dont-email.me>
 <vbag2s$3vhih$1@dont-email.me> <vbbnf9$8j04$1@dont-email.me>
 <vbbsl4$9hdg$1@dont-email.me> <vbcbob$bd22$3@dont-email.me>
 <vbcob9$dvp4$1@dont-email.me> <vbd174$eulp$1@dont-email.me>
 <vbm67e$2apse$1@dont-email.me> <vbmkln$2cmfo$1@dont-email.me>
 <vbni3u$2h7pp$1@dont-email.me> <vbp849$2trde$1@dont-email.me>
 <2024Sep10.191607@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 10 Sep 2024 20:45:54 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="0acb7c652597723591106fbd8bd9fb14";
	logging-data="3261463"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19lUVxx7ukML/Aj7xi7CInFJeOCAH2JAEI="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:1Brl2IN5AHpUQMOKFJw6MW7ml/o=
In-Reply-To: <2024Sep10.191607@mips.complang.tuwien.ac.at>
Content-Language: en-GB
Bytes: 4943

On 10/09/2024 19:16, Anton Ertl wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> However, my point was that "hand-optimised" source code can lead to
>> poorer results on newer /compilers/ compared to simpler source code.  If
>> you've googled for "bit twiddling hacks" for cool tricks, or written
>> something like "(x << 4) + (x << 2) + x" instead of "x * 21", then the
>> results will be slower with a modern compiler and modern cpu, even
>> though the "hand-optimised" version might have been faster two decades
>> ago.  You can expect the modern tool to convert the multiplication into
>> shifts and adds if that is more efficient on the target, or a
>> multiplication if that is best on the target.  But you can't expect the
>> compiler to turn the shifts and adds into a multiplication.
> 
> Why not?  Let's see:
> 
> [b3:~/tmp:109062] gcc -Os -c xxx-mul.c && objdump -d xxx-mul.o
> 
> xxx-mul.o:     file format elf64-x86-64
> 
> 
> Disassembly of section .text:
> 
> 0000000000000000 <foo>:
>     0:   48 6b c7 15             imul   $0x15,%rdi,%rax
>     4:   c3                      ret
> [b3:~/tmp:109063] gcc -O3 -c xxx-mul.c && objdump -d xxx-mul.o
> 
> xxx-mul.o:     file format elf64-x86-64
> 
> 
> Disassembly of section .text:
> 
> 0000000000000000 <foo>:
>     0:   48 8d 04 bf             lea    (%rdi,%rdi,4),%rax
>     4:   48 8d 04 87             lea    (%rdi,%rax,4),%rax
>     8:   c3                      ret
> 
> So gcc-12 obviously understands that your "hand-optimized" version is
> equivalent to the multiplication, and with -O3 then decides that the
> leas are faster.
> 
>> (Sometimes it can, but you can't expect it to.)

Again - sometimes a compiler will recognise a particular hand-optimised 
pattern, turn it back into something logically simpler, then optimise from 
there.  But you cannot /expect/ that.  On the whole, compilers are more 
likely to recognise clear and simple patterns than complex ones, 
especially ones that use bit manipulation in odd ways.

There will always be exceptions; this is just a general rule.

And a related general rule is that /humans/ are much better at 
understanding clear code written in a logical way than code that is 
weird and hand-optimised.
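
For reference, a minimal sketch of the two formulations under discussion. 
The quoted xxx-mul.c is not shown in the thread, so the function names, 
the unsigned type (chosen so that shifts and overflow are well defined) 
and the check routine are assumptions made for illustration only.

```c
#include <assert.h>

/* Plain version: states the intent directly and leaves the choice of
   instructions (imul, lea, shifts and adds) to the compiler. */
unsigned long mul21_plain(unsigned long x)
{
    return x * 21;
}

/* "Hand-optimised" version: 16x + 4x + x == 21x. */
unsigned long mul21_shifts(unsigned long x)
{
    return (x << 4) + (x << 2) + x;
}

/* Hypothetical sanity check: the two forms agree, including when the
   result wraps around (unsigned arithmetic is modular). */
void check_mul21(void)
{
    for (unsigned long x = 0; x < 100000; x++)
        assert(mul21_plain(x) == mul21_shifts(x));
    assert(mul21_plain(~0UL) == mul21_shifts(~0UL));
}
```

Compiling this with "gcc -O3 -c" or "gcc -Os -c" and running objdump -d 
on the object file, as in the quoted text, lets you compare what a given 
compiler does with each form.  Whether the two come out identical depends 
on the compiler, version and target.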

> 
> That also works the other way.
> 
> But it becomes really annoying when I intend it not to perform a
> transformation, and it performs the transformation, like when writing
> "-(x>0)" and the compiler turns that into a conditional branch.  These
> days gcc does not do that, but I have just seen another twist:
> 
> long bar(long x)
> {
>    return -(x>0);
> }
> 
> gcc-12 -O3 turns this into:
> 
>    10:   31 c0                   xor    %eax,%eax
>    12:   48 85 ff                test   %rdi,%rdi
>    15:   0f 9f c0                setg   %al
>    18:   f7 d8                   neg    %eax
>    1a:   48 98                   cltq
>    1c:   c3                      ret
> 
> So sign-extension optimization is apparently still lacking.
> Clang-14 handles this fine:
> 
>    10:   31 c0                   xor    %eax,%eax
>    12:   48 85 ff                test   %rdi,%rdi
>    15:   0f 9f c0                setg   %al
>    18:   48 f7 d8                neg    %rax
>    1b:   c3                      ret
> 

One day, perhaps, compilers will be perfect.  But not yet :-(
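
As a small sketch of the same point applied to the quoted "-(x>0)" 
example: the trick and a plainly written conditional compute the same 
value.  The names bar_trick, bar_plain and check_bar are assumptions for 
illustration, and nothing here says which instructions a particular 
compiler will choose for either form.

```c
#include <assert.h>
#include <limits.h>

/* Form from the quoted text: (x > 0) is an int, 0 or 1; negating it
   gives 0 or -1, which is then converted to long. */
long bar_trick(long x)
{
    return -(x > 0);
}

/* Plainly written equivalent: all-ones for positive x, else zero. */
long bar_plain(long x)
{
    return x > 0 ? -1 : 0;
}

/* Hypothetical sanity check over a few boundary values. */
void check_bar(void)
{
    const long samples[] = { LONG_MIN, -1, 0, 1, LONG_MAX };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(bar_trick(samples[i]) == bar_plain(samples[i]));
}
```

Comparing the disassembly of the two functions on the compilers of 
interest shows whether either form gets the branch-free code that the 
trick was written to obtain.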