Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Michael S <already5chosen@yahoo.com>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Mon, 9 Sep 2024 14:58:54 +0300
Organization: A noiseless patient Spider
Lines: 82
Message-ID: <20240909145854.00001e4e@yahoo.com>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at>
<vb3k0m$1rth7$1@dont-email.me>
<17d615c6a9e70e9fabe1721c55cfa176@www.novabbs.org>
<86v7zep35n.fsf@linuxsc.com>
<20240902180903.000035ee@yahoo.com>
<vb7ank$3d0c5$1@dont-email.me>
<20240903190928.00002f92@yahoo.com>
<vb7idh$3e2af$1@dont-email.me>
<86seufo11j.fsf@linuxsc.com>
<vba6qa$3u4jc$1@dont-email.me>
<1246395e530759ac79805e45b3830d8f@www.novabbs.org>
<8634m9lga1.fsf@linuxsc.com>
<vbmb3h$2bfqh$1@dont-email.me>
<20240909122219.00007f81@yahoo.com>
<2024Sep9.123034@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 09 Sep 2024 13:58:32 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="45fff2496b15112b5e4e03cadfa28742";
logging-data="2041887"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19oaQfBKEy1NW4RJOZ4c8n8yY7jner73EU="
Cancel-Lock: sha1:1lP2t5+M8Rq8+WNNFPLlixGKNQ4=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
Bytes: 4046

On Mon, 09 Sep 2024 10:30:34 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
> Michael S <already5chosen@yahoo.com> writes:
> >On Mon, 9 Sep 2024 10:20:00 +0200
> >Terje Mathisen <terje.mathisen@tmsw.no> wrote:
> >> float invsqrt(float x)
> >> {
> >> ...
> >> int32_t ix = *(int32_t *) &x;
> [...]
> >> int32_t ix;
> >> memcpy(&ix, &x, sizeof(ix));
> ...
> >I don't know if it is always true in more complex cases, where
> >absence of aliasing is less obvious to compiler.
>
> Something like
>
> memmove(*p, *q, 8)
>
> can be translated to something like
>
> 0: 48 8b 06 mov (%rsi),%rax
> 3: 48 89 07 mov %rax,(%rdi)
>
> without any aliasing worries, and indeed, gcc-9, gcc-10, and gcc-12,
> does that.
>
> >However, I'd expect that as
> >long as a copied item fits in register, the magic will work equally
> >with both memcpy and memmove.
>
> One would hope so, but here's what happens with gcc-12:
>
> #include <string.h>
>
> void foo1(char *p, char* q)
> {
> memcpy(p,q,32);
> }
>
> void foo2(char *p, char* q)
> {
> memmove(p,q,32);
> }
>
> gcc -O3 -mavx2 -c -Wall xxx-memmove.c ; objdump -d xxx-memmove.o:
>
> 0000000000000000 <foo1>:
> 0: c5 fa 6f 06 vmovdqu (%rsi),%xmm0
> 4: c5 fa 7f 07 vmovdqu %xmm0,(%rdi)
> 8: c5 fa 6f 4e 10 vmovdqu 0x10(%rsi),%xmm1
> d: c5 fa 7f 4f 10 vmovdqu %xmm1,0x10(%rdi)
> 12: c3 ret
> 13: 66 66 2e 0f 1f 84 00 data16 cs nopw 0x0(%rax,%rax,1)
> 1a: 00 00 00 00
> 1e: 66 90 xchg %ax,%ax
>
> 0000000000000020 <foo2>:
> 20: ba 20 00 00 00 mov $0x20,%edx
> 25: e9 00 00 00 00 jmp 2a <foo2+0xa>
>
> The jmp in line 25 is probably a tail-call to memmove().
>
> My guess is that xmm registers and unrolling are used here rather than
> ymm registers because waking up the second 128 bits takes time. But
> even with that, the code uses two different registers, and if
> scheduled differently, could be used for implementing foo2():
>
> 0: c5 fa 6f 06 vmovdqu (%rsi),%xmm0
> 8: c5 fa 6f 4e 10 vmovdqu 0x10(%rsi),%xmm1
> 4: c5 fa 7f 07 vmovdqu %xmm0,(%rdi)
> d: c5 fa 7f 4f 10 vmovdqu %xmm1,0x10(%rdi)
> 12: c3 ret
>
> - anton

Try -march=<cpu> instead of -mavx2, e.g. -march=haswell.
Sometimes gcc is beyond logic.
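
For anyone who wants to repeat the experiment, here is a minimal sketch of a test file. foo1 and foo2 are the functions from the quoted post; copy32_load_first and agrees_with_memmove are hypothetical helpers of mine, not something from the thread or from gcc. The helper only illustrates the scheduling argument quoted above: if both 16-byte halves are read before either is written, the copy is safe for overlapping buffers. What gcc actually emits under each flag set is something to check with objdump, not something this sketch asserts.

```c
/* To compare code generation (commands as used in the thread):
 *   gcc -O3 -mavx2         -c -Wall xxx-memmove.c ; objdump -d xxx-memmove.o
 *   gcc -O3 -march=haswell -c -Wall xxx-memmove.c ; objdump -d xxx-memmove.o
 */
#include <assert.h>
#include <string.h>

/* Same bodies as in the quoted test file. */
void foo1(char *p, char *q) { memcpy(p, q, 32); }
void foo2(char *p, char *q) { memmove(p, q, 32); }

/* Hypothetical helper: copy 32 bytes by loading both 16-byte halves
 * into temporaries before storing either one, so overlap between
 * p and q cannot corrupt the result. */
void copy32_load_first(char *p, const char *q)
{
    char lo[16], hi[16];

    memcpy(lo, q, 16);        /* load low half   */
    memcpy(hi, q + 16, 16);   /* load high half  */
    memcpy(p, lo, 16);        /* store low half  */
    memcpy(p + 16, hi, 16);   /* store high half */
}

/* Hypothetical check: returns 1 if copy32_load_first produces the same
 * buffer contents as memmove for the given offsets into a 64-byte
 * buffer. Offsets must be <= 32 so the 32-byte copy stays in bounds. */
int agrees_with_memmove(size_t dst_off, size_t src_off)
{
    char a[64], b[64];

    assert(dst_off <= 32 && src_off <= 32);
    for (int i = 0; i < 64; i++)
        a[i] = b[i] = (char)i;

    copy32_load_first(a + dst_off, a + src_off);
    memmove(b + dst_off, b + src_off, 32);
    return memcmp(a, b, 64) == 0;
}
```

Calling agrees_with_memmove(0, 8) and agrees_with_memmove(8, 0) exercises overlap in both directions. That is a sanity check on the load-first ordering in C terms only; it says nothing about which instructions a given gcc version picks.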