Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Michael S <already5chosen@yahoo.com>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Mon, 9 Sep 2024 16:08:47 +0300
Organization: A noiseless patient Spider
Lines: 116
Message-ID: <20240909160847.000062a2@yahoo.com>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at>
<86v7zep35n.fsf@linuxsc.com>
<20240902180903.000035ee@yahoo.com>
<vb7ank$3d0c5$1@dont-email.me>
<20240903190928.00002f92@yahoo.com>
<vb7idh$3e2af$1@dont-email.me>
<86seufo11j.fsf@linuxsc.com>
<vba6qa$3u4jc$1@dont-email.me>
<1246395e530759ac79805e45b3830d8f@www.novabbs.org>
<8634m9lga1.fsf@linuxsc.com>
<vbmb3h$2bfqh$1@dont-email.me>
<20240909122219.00007f81@yahoo.com>
<2024Sep9.123034@mips.complang.tuwien.ac.at>
<20240909145854.00001e4e@yahoo.com>
<2024Sep9.142813@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 09 Sep 2024 15:08:25 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="45fff2496b15112b5e4e03cadfa28742";
logging-data="2041887"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/fS8LQ6jm+GADwVcF81w50WlFQMzTZunc="
Cancel-Lock: sha1:3UOnROMuBLgZyFOLpDzFRSUkX34=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
Bytes: 5611
On Mon, 09 Sep 2024 12:28:13 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> Michael S <already5chosen@yahoo.com> writes:
> >On Mon, 09 Sep 2024 10:30:34 GMT
> >anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> >> One would hope so, but here's what happens with gcc-12:
> >>
> >> #include <string.h>
> >>
> >> void foo1(char *p, char* q)
> >> {
> >> memcpy(p,q,32);
> >> }
> >>
> >> void foo2(char *p, char* q)
> >> {
> >> memmove(p,q,32);
> >> }
> >>
> >> gcc -O3 -mavx2 -c -Wall xxx-memmove.c ; objdump -d xxx-memmove.o:
> >>
> >> 0000000000000000 <foo1>:
> >> 0: c5 fa 6f 06 vmovdqu (%rsi),%xmm0
> >> 4: c5 fa 7f 07 vmovdqu %xmm0,(%rdi)
> >> 8: c5 fa 6f 4e 10 vmovdqu 0x10(%rsi),%xmm1
> >> d: c5 fa 7f 4f 10 vmovdqu %xmm1,0x10(%rdi)
> >> 12: c3 ret
> >> 13: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
> >> 1a: 00 00 00 00
> >> 1e: 66 90 xchg %ax,%ax
> >>
> >> 0000000000000020 <foo2>:
> >> 20: ba 20 00 00 00 mov $0x20,%edx
> >> 25: e9 00 00 00 00 jmp 2a <foo2+0xa>
> >>
> >> The jmp in line 25 is probably a tail-call to memmove().
> >>
> >> My guess is that xmm registers and unrolling are used here rather
> >> than ymm registers because waking up the second 128 bits takes
> >> time. But even with that, the code uses two different registers,
> >> and if scheduled differently, could be used for implementing
> >> foo2():
> >>
> >> 0: c5 fa 6f 06 vmovdqu (%rsi),%xmm0
> >> 4: c5 fa 6f 4e 10 vmovdqu 0x10(%rsi),%xmm1
> >> 9: c5 fa 7f 07 vmovdqu %xmm0,(%rdi)
> >> d: c5 fa 7f 4f 10 vmovdqu %xmm1,0x10(%rdi)
> >> 12: c3 ret
> >>
> >> - anton
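The reordered sequence is overlap-safe because both 16-byte halves are loaded before either is stored. That is exactly the property memmove() needs and memcpy() does not. Below is a minimal portable C sketch of the same idea for a fixed 32-byte size. It rests on a few assumptions:

* `move32` is a hypothetical name.
* The local arrays stand in for xmm0/xmm1.
* It is not guaranteed that a particular gcc version will compile it to two loads followed by two stores.

```c
#include <string.h>

/* Hypothetical helper: overlap-safe copy of exactly 32 bytes.
   Both 16-byte halves are read into locals (standing in for
   xmm0/xmm1) before anything is written to dst. */
void move32(void *dst, const void *src)
{
    unsigned char lo[16], hi[16];
    const unsigned char *s = src;
    unsigned char *d = dst;

    memcpy(lo, s, 16);       /* load bytes 0..15 */
    memcpy(hi, s + 16, 16);  /* load bytes 16..31 */
    memcpy(d, lo, 16);       /* store only after both loads */
    memcpy(d + 16, hi, 16);
}
```

On an overlapping buffer, `move32(buf + 8, buf)` gives the same result as `memmove(buf + 8, buf, 32)`. The memcpy ordering (load, store, load, store) would not: the first store clobbers bytes that the second load still has to read.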
> >
> >Try -march instead of -mavx2. E.g. -march=haswell
> >Sometimes gcc is beyond logic.
>
> For gcc -O3 -march=haswell I got the same result (with gcc-12). I
> also tried -march=x86-64-v3 with the same result.
>
> But gcc -O3 -march=x86-64-v4 produced:
>
My gcc was 14.1, at -O2. With -march=haswell it produced the same code
as yours below (for the case of 32).

> 0000000000000000 <foo1>:
> 0: c5 fe 6f 06 vmovdqu (%rsi),%ymm0
> 4: c5 fe 7f 07 vmovdqu %ymm0,(%rdi)
> 8: c5 f8 77 vzeroupper
> b: c3 ret
> c: 0f 1f 40 00 nopl 0x0(%rax)
>
> 0000000000000010 <foo2>:
> 10: c5 fe 6f 06 vmovdqu (%rsi),%ymm0
> 14: c5 fe 7f 07 vmovdqu %ymm0,(%rdi)
> 18: c5 f8 77 vzeroupper
> 1b: c3 ret
>
> And when changing the length to 64:
>
> 0000000000000000 <foo1>:
> 0: 62 f1 fe 48 6f 06 vmovdqu64 (%rsi),%zmm0
> 6: 62 f1 fe 48 7f 07 vmovdqu64 %zmm0,(%rdi)
> c: c5 f8 77 vzeroupper
> f: c3 ret
>
> 0000000000000010 <foo2>:
> 10: 62 f1 fe 48 6f 06 vmovdqu64 (%rsi),%zmm0
> 16: 62 f1 fe 48 7f 07 vmovdqu64 %zmm0,(%rdi)
> 1c: c5 f8 77 vzeroupper
> 1f: c3 ret
>
And here I got different code for -march=tigerlake and
-march=znver4, despite both having approximately the same ISA.
It seems that for Tiger Lake, gcc is over-concerned about the impact of
unaligned 64-byte accesses.

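The concern with unaligned zmm accesses can be made concrete with a little cache-line arithmetic. Assume 64-byte cache lines. A 64-byte access then spans two lines unless it is exactly 64-byte aligned. A 32-byte access spans two lines only when its offset within the line is above 32.

The sketch below counts such splits over all 64 starting offsets. `CACHE_LINE`, `crosses_line` and `split_count` are hypothetical names. It makes no claim about what gcc's Tiger Lake tuning actually models.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed cache-line size in bytes */

/* 1 if an access of `size` bytes (1 <= size <= CACHE_LINE)
   starting at `addr` touches two cache lines, else 0. */
static int crosses_line(uintptr_t addr, size_t size)
{
    return (addr % CACHE_LINE) + size > CACHE_LINE;
}

/* Over every starting offset within a line, count how many
   accesses of `size` bytes would be split across two lines. */
unsigned split_count(size_t size)
{
    unsigned n = 0;
    for (uintptr_t off = 0; off < CACHE_LINE; off++)
        n += (unsigned)crosses_line(off, size);
    return n;
}
```

The counts come out as follows:

* `split_count(64)` is 63, so nearly every misaligned zmm access is a line split.
* `split_count(32)` is 31 for each ymm access.

Whether that matters depends on how expensive a split access is on a given microarchitecture, which this sketch does not model.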
> But when changing the length to 63:
>
> 0000000000000000 <foo1>:
> 0: c5 fe 6f 06 vmovdqu (%rsi),%ymm0
> 4: c5 fe 7f 07 vmovdqu %ymm0,(%rdi)
> 8: c5 fe 6f 4e 1f vmovdqu 0x1f(%rsi),%ymm1
> d: c5 fe 7f 4f 1f vmovdqu %ymm1,0x1f(%rdi)
> 12: c5 f8 77 vzeroupper
> 15: c3 ret
> 16: 66 2e 0f 1f 84 00 00 cs nopw 0x0(%rax,%rax,1)
> 1d: 00 00 00
>
> 0000000000000020 <foo2>:
> 20: ba 3f 00 00 00 mov $0x3f,%edx
> 25: e9 00 00 00 00 jmp 2a <foo2+0xa>
>
> - anton
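For the 63-byte case, gcc's memcpy expansion copies two overlapping 32-byte windows, at offsets 0 and 0x1f. It stores the first window before loading the second. That order is fine for memcpy but not for overlapping buffers, which is presumably why foo2 still calls memmove(). Loading both windows before storing either would make the same two-register sequence overlap-safe.

Below is a portable C sketch of that head/tail technique. It assumes 32 <= n <= 64. `move_32_to_64` is a hypothetical name, and the local buffers stand in for ymm0/ymm1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overlap-safe copy of n bytes, 32 <= n <= 64.
   The head window [0, 32) and the tail window [n-32, n) overlap
   when n < 64, so together they cover every byte; bytes in the
   overlap are written twice with the same value. Both windows are
   loaded before either is stored, so src and dst may overlap. */
void move_32_to_64(void *dst, const void *src, size_t n)
{
    unsigned char head[32], tail[32];
    const unsigned char *s = src;
    unsigned char *d = dst;

    assert(n >= 32 && n <= 64);
    memcpy(head, s, 32);           /* first 32 bytes */
    memcpy(tail, s + n - 32, 32);  /* last 32 bytes */
    memcpy(d, head, 32);
    memcpy(d + n - 32, tail, 32);
}
```

With n = 63 the tail window starts at offset 31 (0x1f), matching the second vmovdqu pair in foo1 above.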