Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Tue, 10 Sep 2024 17:16:07 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 76
Message-ID: <2024Sep10.191607@mips.complang.tuwien.ac.at>
References: <vasruo$id3b$1@dont-email.me> <vb8587$3gq7e$1@dont-email.me> <vb91e7$3o797$1@dont-email.me> <vb9eeh$3q993$1@dont-email.me> <vb9l7k$3r2c6$2@dont-email.me> <vba26l$3te44$1@dont-email.me> <vbag2s$3vhih$1@dont-email.me> <vbbnf9$8j04$1@dont-email.me> <vbbsl4$9hdg$1@dont-email.me> <vbcbob$bd22$3@dont-email.me> <vbcob9$dvp4$1@dont-email.me> <vbd174$eulp$1@dont-email.me> <vbm67e$2apse$1@dont-email.me> <vbmkln$2cmfo$1@dont-email.me> <vbni3u$2h7pp$1@dont-email.me> <vbp849$2trde$1@dont-email.me>
Injection-Date: Tue, 10 Sep 2024 19:32:31 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f21f15281580639cd9bc25764f54f587";
	logging-data="3236387"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX183b2z351nIsCcJTbPaldsz"
Cancel-Lock: sha1:iTVfLFm6t/JOhvOjksE3h+7Bueo=
X-newsreader: xrn 10.11
Bytes: 4042

David Brown <david.brown@hesbynett.no> writes:
>However, my point was that "hand-optimised" source code can lead to 
>poorer results on newer /compilers/ compared to simpler source code.  If 
>you've googled for "bit twiddling hacks" for cool tricks, or written 
>something like "(x << 4) + (x << 2) + x" instead of "x * 21", then the 
>results will be slower with a modern compiler and modern cpu, even 
>though the "hand-optimised" version might have been faster two decades 
>ago.  You can expect the modern tool to convert the multiplication into 
>shifts and adds if that is more efficient on the target, or a 
>multiplication if that is best on the target.  But you can't expect the 
>compiler to turn the shifts and adds into a multiplication.

Why not?  Let's see:

[b3:~/tmp:109062] gcc -Os -c xxx-mul.c && objdump -d xxx-mul.o

xxx-mul.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 6b c7 15             imul   $0x15,%rdi,%rax
   4:   c3                      ret
[b3:~/tmp:109063] gcc -O3 -c xxx-mul.c && objdump -d xxx-mul.o

xxx-mul.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <foo>:
   0:   48 8d 04 bf             lea    (%rdi,%rdi,4),%rax
   4:   48 8d 04 87             lea    (%rdi,%rax,4),%rax
   8:   c3                      ret

So gcc-12 obviously understands that your "hand-optimized" version is
equivalent to the multiplication, and with -O3 decides that the two
leas are faster than the imul.
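The post does not show xxx-mul.c itself.  A sketch of what it presumably contains, assuming the body is the quoted "(x << 4) + (x << 2) + x" (the name foo comes from the disassembly; foo_lea and check_mul21 are hypothetical helpers), together with a C model of the two leas that gcc -O3 emits:

```c
#include <assert.h>

/* Assumed contents of xxx-mul.c: the "hand-optimised" form from the
   quoted text.  16x + 4x + x = 21x. */
long foo(long x)
{
  return (x << 4) + (x << 2) + x;
}

/* Hypothetical C model of gcc -O3's output:
   lea (%rdi,%rdi,4),%rax  ->  t = x + 4*x   (5x)
   lea (%rdi,%rax,4),%rax  ->  x + 4*t       (21x) */
long foo_lea(long x)
{
  long t = x + x * 4;
  return x + t * 4;
}

/* Check that shift-and-add, the lea model and the plain
   multiplication agree.  Left-shifting a negative signed value is
   undefined behavior in C, so only 0..limit is tested; limit must be
   small enough that 21*limit does not overflow a long. */
void check_mul21(long limit)
{
  for (long x = 0; x <= limit; x++) {
    assert(foo(x) == x * 21);
    assert(foo_lea(x) == x * 21);
  }
}
```

The restriction to nonnegative x is only a property of this C sketch; at the machine level the shifts, leas and imul all compute 21x modulo 2^64.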

>(Sometimes it can, but you can't expect it to.)

That also works the other way.

But it becomes really annoying when I intend the compiler not to
perform a transformation and it performs it anyway, as when I write
"-(x>0)" and the compiler turns that into a conditional branch.  These
days gcc does not do that, but I have just seen another twist:

long bar(long x)
{
  return -(x>0);
}
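Since (x>0) is 0 or 1, -(x>0) is 0 or -1 (all bits set), i.e., a mask that can be used without a branch.  A minimal sketch of such a use; select_if_positive is a hypothetical example, and nothing in the source forces the compiler to keep it branch-free:

```c
#include <assert.h>

/* Same expression as bar() above: 0 or -1 (all bits set). */
long bar(long x)
{
  return -(x > 0);
}

/* Hypothetical use of the mask: returns a if x > 0, else b, using
   only bitwise operations.  The compiler may still emit a branch or
   a cmov for this. */
long select_if_positive(long x, long a, long b)
{
  long m = bar(x);
  return (a & m) | (b & ~m);
}
```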

gcc-12 -O3 turns this into:

  10:   31 c0                   xor    %eax,%eax
  12:   48 85 ff                test   %rdi,%rdi
  15:   0f 9f c0                setg   %al
  18:   f7 d8                   neg    %eax
  1a:   48 98                   cltq
  1c:   c3                      ret

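So apparently sign-extension optimization is still lacking: gcc negates
the 32-bit %eax and then sign-extends it with cltq, although negating
the full %rax would give the same result, because xor %eax,%eax has
already zeroed the upper half of %rax.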
Clang-14 handles this fine:

  10:   31 c0                   xor    %eax,%eax
  12:   48 85 ff                test   %rdi,%rdi
  15:   0f 9f c0                setg   %al
  18:   48 f7 d8                neg    %rax
  1b:   c3                      ret
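To check that the two sequences compute the same value, one can model them in C.  This is a hypothetical register-level model (bar_gcc_model and bar_clang_model are illustrative names), not what either compiler does internally:

```c
#include <assert.h>
#include <stdint.h>

/* Model of gcc-12's sequence: setg leaves 0 or 1 in a zeroed
   register, neg operates on the 32-bit %eax, and cltq sign-extends
   %eax into %rax. */
int64_t bar_gcc_model(int64_t x)
{
  int32_t eax = (x > 0);   /* xor + test + setg: 0 or 1 */
  eax = -eax;              /* neg %eax: 0 or -1 in 32 bits */
  return (int64_t)eax;     /* cltq: sign-extend to 64 bits */
}

/* Model of clang-14's sequence: xor %eax,%eax also clears the upper
   half of %rax, so negating the whole 64-bit register directly gives
   the same result without a separate sign extension. */
int64_t bar_clang_model(int64_t x)
{
  int64_t rax = (x > 0);   /* xor + test + setg: 0 or 1 */
  return -rax;             /* neg %rax: 0 or -1 in 64 bits */
}
```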

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>