Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Fri, 06 Sep 2024 13:26:42 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 181
Message-ID: <2024Sep6.152642@mips.complang.tuwien.ac.at>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at> <vautmu$vr5r$1@dont-email.me> <2024Aug31.170347@mips.complang.tuwien.ac.at> <vavpnh$13tj0$2@dont-email.me> <vb2hir$1ju7q$1@dont-email.me> <8lcadjhnlcj5se1hrmo232viiccjk5alu4@4ax.com> <vb4amr$2rcbt$1@dont-email.me> <2024Sep5.133102@mips.complang.tuwien.ac.at> <vbchiv$cde4$1@dont-email.me> <2024Sep5.174939@mips.complang.tuwien.ac.at> <ljuc4fF86o3U2@mid.individual.net> <2024Sep6.092535@mips.complang.tuwien.ac.at> <20240906135718.00004f84@yahoo.com>
Injection-Date: Fri, 06 Sep 2024 16:27:06 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2e85e44354343afe68fe387c30e9d02d";
logging-data="908895"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18U+KOCHGRuRScBLLmHCAOY"
Cancel-Lock: sha1:ub351VFsQUJ3LjI3P9M4HYqMBgU=
X-newsreader: xrn 10.11
Bytes: 7419
Michael S <already5chosen@yahoo.com> writes:
>On Fri, 06 Sep 2024 07:25:35 GMT
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>> What gcc produces for both formulations is longer than
>>
>> dec %rdi
>> jno ...
>>
>
>Good trick.
Thanks. It's not from me. I published it in 2015
<https://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf>,
but unfortunately did not cite where I got it from (I read it
elsewhere).
>The same trick in non-destructive form would be 1 byte longer.
> cmp $1, %rdi
> jno ...
>
>But I was not able to force any of the compilers currently installed on my
>home desktop (gcc 13.2, clang 18.1, MSVC 19.30.30706 == VS2022) to
>produce such code.
>
>The closest was MSVC that sometimes (not in all circumstances) produces
>a version that is 2 bytes longer:
> 49 8d 49 ff lea -0x1(%r9),%rcx
> 4c 3b c9 cmp %rcx,%r9
>
>Of course, it's still a good deal shorter than
> 48 ba 00 00 00 00 00 00 00 80 movabs $0x8000000000000000,%rdx
> 4c 3b ca cmp %rdx,%r9
>
>Both gcc and clang [under -fwrapv] insisted on turning x<=x-1 into
>x==LLONG_MIN.
>
>However, even if we were able to force the compiler to produce the desired code,
>the space saving is architecture-specific.
With this gcc-specific code we can force it:
extern long foo1(long);
extern long foo2(long);

long bar(long a, long b)
{
  long c;
  if (__builtin_sub_overflow(b,1,&c))
    return foo1(a);
  else
    return foo2(a);
}
gcc -O3 -c and gcc -Os -c (gcc-12.2) produce, on AMD64:
0: 48 83 c6 ff add $0xffffffffffffffff,%rsi
4: 70 05 jo b <bar+0xb>
6: e9 00 00 00 00 jmp b <bar+0xb>
b: e9 00 00 00 00 jmp 10 <bar+0x10>
So, even though %rsi is dead afterwards, it does not use dec, but it's
certainly better than the other variants.
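
For readers who want to check what the builtin tests for, here is a
minimal sketch. It assumes gcc or clang (the builtin is a compiler
extension); the names dec_overflows and dec_overflows_demo are made up
for this illustration and do not appear in the code above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Returns true iff b-1 is not representable as a long, i.e. iff
   b == LONG_MIN.  __builtin_sub_overflow is a gcc/clang extension and
   does not invoke undefined behaviour, so the check does not depend
   on -fwrapv. */
static bool dec_overflows(long b)
{
  long c;  /* receives the wrapped difference; not used here */
  return __builtin_sub_overflow(b, 1L, &c);
}

/* Hypothetical self-check, added for illustration only. */
static void dec_overflows_demo(void)
{
  assert(dec_overflows(LONG_MIN));
  assert(!dec_overflows(LONG_MIN + 1));
  assert(!dec_overflows(0));
  assert(!dec_overflows(LONG_MAX));
}
```

Calling dec_overflows_demo() from main() should pass silently with
either compiler, with or without -fwrapv.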
On ARM A64 both gcc invocations (gcc-10.2) produce:
0: f1000421 subs x1, x1, #0x1
4: 54000046 b.vs c <bar+0xc>
8: 14000000 b 0 <foo2>
c: 14000000 b 0 <foo1>
On RV64GC both gcc invocations (gcc-10.3) produce:
0000000000000000 <bar>:
0: fff58793 addi a5,a1,-1
4: 00f5c663 blt a1,a5,10 <.L6>
8: 00000317 auipc t1,0x0
c: 00030067 jr t1 # 8 <bar+0x8>
0000000000000010 <.L6>:
10: 00000317 auipc t1,0x0
14: 00030067 jr t1 # 10 <.L6>
So on RISC-V gcc manages to actually translate the if back into "if (b
< b-1)" without pessimizing that (but gcc-10 does not pessimize this
code on AMD64, either).
>E.g. I expect no saving on ARM64 where both variants occupy 8 bytes.
Here we have the three variants:
#include <limits.h>

extern long foo1(long);
extern long foo2(long);

long bar(long a, long b)
{
  long c;
  if (__builtin_sub_overflow(b,1,&c))
    return foo1(a);
  else
    return foo2(a);
}

long bar2(long a, long b)
{
  if (b < b-1)
    return foo1(a);
  else
    return foo2(a);
}

long bar3(long a, long b)
{
  if (b == LONG_MIN)
    return foo1(a);
  else
    return foo2(a);
}
And here is what gcc-10 -Os -fwrapv -Wall -c produces:
bar                   bar2                    bar3

ARM A64:
subs x1, x1, #0x1     sub x2, x1, #0x1        mov x2, #0x8000000000000000
b.vs c <bar+0xc>      cmp x2, x1              cmp x1, x2
                      b.le 20 <bar2+0x10>     b.ne 34 <bar3+0x10>

RV64GC:
addi a5,a1,-1         addi a5,a1,-1           li a5,-1
bge a1,a5,10 <.L4>    bge a1,a5,28 <.L6>      slli a5,a5,0x3f
                                              bne a1,a5,40 <.L8>
8 Bytes               8 Bytes                 8 Bytes

AMD64:
add $-1,%rsi          lea -0x1(%rsi),%rax     mov $0x1,%eax
jo b <bar+0xb>        cmp %rsi,%rax           shl $0x3f,%rax
                      jle 1e <bar2+0xe>       cmp %rax,%rsi
                                              jne 36 <bar3+0x13>
6 Bytes               9 Bytes                 14 Bytes
With gcc-12 on AMD64:
bar                   bar2                    bar3
add $-1,%rsi          mov $0x1,%eax           mov $0x1,%eax
jo b <bar+0xb>        shl $0x3f,%rax          shl $0x3f,%rax
                      cmp %rax,%rsi           cmp %rax,%rsi
                      jne 23 <bar2+0x13>      jne 23 <bar2+0x13>
6 Bytes               14 Bytes                14 Bytes
(Actually in the latter case gcc recognizes that bar2 and bar3 are
equivalent and jumps from bar3 to bar2, but I am sure that without
bar2, bar3 would look the same as bar2 does now).
So when gcc does not pessimize "b < b-1" into "b == LONG_MIN", the
straightforward code for the former has the same or smaller size, and
the same or smaller number of instructions on these architectures.
The "__builtin_sub_overflow(b,1,&c)" has the same or fewer bytes than
"b < b-1" and the same or fewer instructions. So, with
straightforward translations "__builtin_sub_overflow(b,1,&c)"
dominates "b < b-1", which dominates "b == LONG_MIN".
As a new feature, gcc-12 recognizes "b < b-1" and pessimizes it into
the same code as "b == LONG_MIN".
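
That the three conditions really decide the same thing can be checked
without relying on -fwrapv. The following is only a sketch, assuming
gcc or clang: it uses the wrapped result that __builtin_sub_overflow
stores in c as a stand-in for "b-1" under wraparound semantics. The
function names are invented for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Variant 1: the overflow indication of b-1. */
static bool v_builtin(long b)
{
  long c;
  return __builtin_sub_overflow(b, 1L, &c);
}

/* Variant 2: "b < b-1" with wraparound.  The builtin stores the
   wrapped difference in c, so the comparison is well-defined even
   without -fwrapv. */
static bool v_less(long b)
{
  long c;
  (void)__builtin_sub_overflow(b, 1L, &c);
  return b < c;
}

/* Variant 3: the explicit comparison against the minimum. */
static bool v_equal(long b)
{
  return b == LONG_MIN;
}

/* True iff all three variants give the same answer for b. */
static bool variants_agree(long b)
{
  return v_builtin(b) == v_less(b) && v_less(b) == v_equal(b);
}

/* Hypothetical self-check, added for illustration only. */
static void variants_demo(void)
{
  assert(variants_agree(LONG_MIN));
  assert(variants_agree(LONG_MIN + 1));
  assert(variants_agree(0));
  assert(variants_agree(LONG_MAX));
}
```

This says nothing about code size, of course; it only illustrates why
a compiler is entitled to turn one form into another under -fwrapv.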
>> Interestingly, the first idiom is a case where gcc recognizes what the
>> intention of the programmer is, and warns that it is going to
>> miscompile it. The warning is good, the miscompilation not (but it
>> would be worse without the warning).
>>
>
>You had more luck with warnings than I did.
>In all my test cases both gcc and clang [in absence of -fwrapv]
>silently dropped the check and dependent code.
Interesting. I tried both "b < b-1" and "b >= b+1" and got no warning
(with gcc-10 and gcc-12), but I have seen a warning with one of those
idioms in the past. Maybe someone decided that warning about this
idiom is unnecessary, while "optimizing" it is.
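
For the other idiom, "b >= b+1", a check that cannot be dropped in the
absence of -fwrapv could look like the sketch below. It assumes gcc or
clang; inc_overflows and inc_overflows_demo are names chosen for this
illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Returns true iff b+1 is not representable as a long, i.e. iff
   b == LONG_MAX.  This is what "b >= b+1" tests under wraparound
   semantics, expressed without signed overflow in the source. */
static bool inc_overflows(long b)
{
  long c;  /* receives the wrapped sum; not used here */
  return __builtin_add_overflow(b, 1L, &c);
}

/* Hypothetical self-check, added for illustration only. */
static void inc_overflows_demo(void)
{
  assert(inc_overflows(LONG_MAX));
  assert(!inc_overflows(LONG_MAX - 1));
  assert(!inc_overflows(0));
  assert(!inc_overflows(LONG_MIN));
}
```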
- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>