Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Fri, 06 Sep 2024 13:26:42 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 181
Message-ID: <2024Sep6.152642@mips.complang.tuwien.ac.at>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at> <vautmu$vr5r$1@dont-email.me>
 <2024Aug31.170347@mips.complang.tuwien.ac.at> <vavpnh$13tj0$2@dont-email.me>
 <vb2hir$1ju7q$1@dont-email.me> <8lcadjhnlcj5se1hrmo232viiccjk5alu4@4ax.com>
 <vb4amr$2rcbt$1@dont-email.me> <2024Sep5.133102@mips.complang.tuwien.ac.at>
 <vbchiv$cde4$1@dont-email.me> <2024Sep5.174939@mips.complang.tuwien.ac.at>
 <ljuc4fF86o3U2@mid.individual.net> <2024Sep6.092535@mips.complang.tuwien.ac.at>
 <20240906135718.00004f84@yahoo.com>
Injection-Date: Fri, 06 Sep 2024 16:27:06 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2e85e44354343afe68fe387c30e9d02d";
 logging-data="908895"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX18U+KOCHGRuRScBLLmHCAOY"
Cancel-Lock: sha1:ub351VFsQUJ3LjI3P9M4HYqMBgU=
X-newsreader: xrn 10.11
Bytes: 7419

Michael S <already5chosen@yahoo.com> writes:
>On Fri, 06 Sep 2024 07:25:35 GMT
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>> What gcc produces for both formulations is longer than
>>
>>   dec %rdi
>>   jno ...
>>
>
>Good trick. Thanks.

It's not from me.  I published it in 2015
<https://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf>,
but unfortunately did not give a reference to where I got it from (I
read it elsewhere).

>The same trick in non-destructive form would be 1 byte longer.
>  cmp $1, %rdi
>  jno ...
>
>But I was not able to force any of the compilers currently installed on my
>home desktop (gcc 13.2, clang 18.1, MSVC 19.30.30706 == VS2022) to
>produce such code.
>
>The closest was MSVC that sometimes (not in all circumstances) produces
>2 bytes longer version:
>  49 8d 49 ff   lea -0x1(%r9),%rcx
>  4c 3b c9      cmp %rcx,%r9
>
>Of course, it's still good deal shorter than
>  48 ba 00 00 00 00 00 00 00 80   movabs $0x8000000000000000,%rdx
>  4c 3b ca                        cmp %rdx,%r9
>
>Both gcc and clang [under -fwrapv] insisted on turning x<=x-1 into
>x==LLONG_MIN.
>
>However even if we were able to force compiler to produce desired code,
>the space saving is architecture-specific.

With this gcc-specific code we can force it:

extern long foo1(long);
extern long foo2(long);

long bar(long a, long b)
{
  long c;
  if (__builtin_sub_overflow(b,1,&c))
    return foo1(a);
  else
    return foo2(a);
}

gcc -O3 -c and gcc -Os -c (gcc-12.2) produce, on AMD64:

   0:   48 83 c6 ff             add    $0xffffffffffffffff,%rsi
   4:   70 05                   jo     b <bar+0xb>
   6:   e9 00 00 00 00          jmp    b <bar+0xb>
   b:   e9 00 00 00 00          jmp    10 <bar+0x10>

So, even though %rsi is dead afterwards, gcc does not use dec, but the
result is certainly better than the other variants.

On ARM A64 both gcc invocations (gcc-10.2) produce:

   0:   f1000421        subs    x1, x1, #0x1
   4:   54000046        b.vs    c <bar+0xc>
   8:   14000000        b       0 <foo2>
   c:   14000000        b       0 <foo1>

On RV64GC both gcc invocations (gcc-10.3) produce:

0000000000000000 <bar>:
   0:   fff58793                addi    a5,a1,-1
   4:   00f5c663                blt     a1,a5,10 <.L6>
   8:   00000317                auipc   t1,0x0
   c:   00030067                jr      t1 # 8 <bar+0x8>

0000000000000010 <.L6>:
  10:   00000317                auipc   t1,0x0
  14:   00030067                jr      t1 # 10 <.L6>

So on RISC-V gcc manages to actually translate the if back into
"if (b < b-1)" without pessimising that (but gcc-10 does not pessimize
this code on AMD64, either).

>E.g. I expect no saving on ARM64 where both variants occupy 8 bytes.

Here we have the three variants:

#include <limits.h>

extern long foo1(long);
extern long foo2(long);

long bar(long a, long b)
{
  long c;
  if (__builtin_sub_overflow(b,1,&c))
    return foo1(a);
  else
    return foo2(a);
}

long bar2(long a, long b)
{
  if (b < b-1)
    return foo1(a);
  else
    return foo2(a);
}

long bar3(long a, long b)
{
  if (b == LONG_MIN)
    return foo1(a);
  else
    return foo2(a);
}

And here is what gcc-10 -Os -fwrapv -Wall -c produces (columns, left to
right: bar, bar2, bar3):

ARM A64:
subs x1, x1, #0x1       sub  x2, x1, #0x1       mov  x2, #0x8000000000000000
b.vs c <bar+0xc>        cmp  x2, x1             cmp  x1, x2
                        b.le 20 <bar2+0x10>     b.ne 34 <bar3+0x10>

RV64GC:
addi a5,a1,-1           addi a5,a1,-1           li   a5,-1
bge  a1,a5,10 <.L4>     bge  a1,a5,28 <.L6>     slli a5,a5,0x3f
                                                bne  a1,a5,40 <.L8>
8 Bytes                 8 Bytes                 8 Bytes

AMD64:
add $-1,%rsi            lea -0x1(%rsi),%rax     mov $0x1,%eax
jo  b <bar+0xb>         cmp %rsi,%rax           shl $0x3f,%rax
                        jle 1e <bar2+0xe>       cmp %rax,%rsi
                                                jne 36 <bar3+0x13>
6 Bytes                 9 Bytes                 14 Bytes

With gcc-12 on AMD64:
add $-1,%rsi            mov $0x1,%eax           mov $0x1,%eax
jo  b <bar+0xb>         shl $0x3f,%rax          shl $0x3f,%rax
                        cmp %rax,%rsi           cmp %rax,%rsi
                        jne 23 <bar2+0x13>      jne 23 <bar2+0x13>
6 Bytes                 14 Bytes                14 Bytes

(Actually in the latter case gcc recognizes that bar2 and bar3 are
equivalent and jumps from bar3 to bar2, but I am sure that without
bar2, bar3 would look the same as bar2 does now).

So when gcc does not pessimize "b < b-1" into "b == LONG_MIN", the
straightforward code for the former has the same or smaller size, and
the same or smaller number of instructions on these architectures.
The "__builtin_sub_overflow(b,1,&c)" has the same or fewer bytes than
"b < b-1" and the same or fewer instructions.  So, with
straightforward translations "__builtin_sub_overflow(b,1,&c)"
dominates "b < b-1", which dominates "b == LONG_MIN".

As a new feature, gcc-12 recognizes "b < b-1" and pessimizes it into
the same code as "b == LONG_MIN".

>> Interestingly, the first idiom is a case where gcc recognizes what the
>> intention of the programmer is, and warns that it is going to
>> miscompile it.
>> The warning is good, the miscompilation not (but it
>> would be worse without the warning).
>>
>
>You had more luck with warnings than I did.
>In all my test cases both gcc and clang [in absence of -fwrapv]
>silently dropped the check and dependent code.

Interesting.  I tried both "b < b-1" and "b >= b+1" and got no warning
(with gcc-10 and gcc-12), but I have seen a warning with one of those
idioms in the past.  Maybe someone decided that warning about this
idiom is unnecessary, while "optimizing" it is necessary.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
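
Since the thread notes that "b < b-1" is silently dropped when -fwrapv is absent, here is a minimal sketch of how the three formulations of the test might be written so that none of them relies on signed wraparound, plus a helper that checks they agree. The function names (dec_overflows_builtin, dec_overflows_wrap, dec_overflows_min, check_agree) are made up for this illustration and are not from the post. The unsigned-arithmetic variant assumes the wrapping conversion back to long that gcc and clang implement; the C standard leaves that conversion implementation-defined.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Variant 1: the gcc/clang builtin; true if b-1 does not fit in a long. */
static bool dec_overflows_builtin(long b)
{
    long c;
    return __builtin_sub_overflow(b, 1, &c);
}

/* Variant 2: "b < b-1", with the subtraction done in unsigned
   arithmetic, where wraparound is defined.  Converting the result back
   to long is implementation-defined when it is out of range (assumed
   here: modulo 2^N, as gcc and clang do). */
static bool dec_overflows_wrap(long b)
{
    long dec = (long)((unsigned long)b - 1UL);
    return b < dec;
}

/* Variant 3: the explicit comparison with the smallest long. */
static bool dec_overflows_min(long b)
{
    return b == LONG_MIN;
}

/* Hypothetical helper: asserts that all three variants agree for b. */
static void check_agree(long b)
{
    bool expected = dec_overflows_min(b);
    assert(dec_overflows_builtin(b) == expected);
    assert(dec_overflows_wrap(b) == expected);
}
```

Calling check_agree on values such as LONG_MIN, LONG_MIN+1, 0 and LONG_MAX exercises the boundary cases. Which variant produces the shortest machine code is a separate question, and the listings in the post show that it depends on the gcc version and the architecture.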