From: Michael S
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Mon, 9 Sep 2024 14:58:54 +0300
Message-ID: <20240909145854.00001e4e@yahoo.com>

On Mon, 09 Sep 2024 10:30:34 GMT
anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:

> Michael S writes:
> >On Mon, 9 Sep 2024 10:20:00 +0200
> >Terje Mathisen wrote:
> >> float invsqrt(float x)
> >> {
> >> ...
> >> int32_t ix = *(int32_t *) &x;
> [...]
> >> int32_t ix;
> >> memcpy(&ix, &x, sizeof(ix));
> ...
> >I don't know if it is always true in more complex cases, where
> >absence of aliasing is less obvious to compiler.
>
> Something like
>
> memmove(*p, *q, 8)
>
> can be translated to something like
>
>    0:   48 8b 06                mov    (%rsi),%rax
>    3:   48 89 07                mov    %rax,(%rdi)
>
> without any aliasing worries, and indeed, gcc-9, gcc-10, and gcc-12,
> does that.
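The two idioms discussed above can be sketched in portable C. The first reads a float's bit pattern with memcpy instead of a pointer cast. The second is a small fixed-size move that is safe for overlapping buffers because the whole source is loaded into a register-sized temporary before anything is stored. The helper names float_bits and move8 are hypothetical, and whether a particular compiler reduces each memcpy to a single register move is an assumption that depends on the compiler version and flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the bits of a float without
   violating strict aliasing (assumes 32-bit IEEE 754 float). */
static int32_t float_bits(float x)
{
    int32_t ix;
    memcpy(&ix, &x, sizeof ix);   /* often compiled to a single register move */
    return ix;
}

/* Hypothetical helper: an 8-byte memmove. All 8 source bytes are
   read into a local temporary before any destination byte is
   written, so overlapping src/dst are handled correctly. */
static void move8(void *dst, const void *src)
{
    uint64_t tmp;
    memcpy(&tmp, src, sizeof tmp);  /* load first ...  */
    memcpy(dst, &tmp, sizeof tmp);  /* ... then store  */
}
```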
>
> >However, I'd expect that as
> >long as a copied item fits in register, the magic will work equally
> >with both memcpy and memmove.
>
> One would hope so, but here's what happens with gcc-12:
>
> #include <string.h>
>
> void foo1(char *p, char* q)
> {
>   memcpy(p,q,32);
> }
>
> void foo2(char *p, char* q)
> {
>   memmove(p,q,32);
> }
>
> gcc -O3 -mavx2 -c -Wall xxx-memmove.c ; objdump -d xxx-memmove.o:
>
> 0000000000000000 <foo1>:
>    0:   c5 fa 6f 06             vmovdqu (%rsi),%xmm0
>    4:   c5 fa 7f 07             vmovdqu %xmm0,(%rdi)
>    8:   c5 fa 6f 4e 10          vmovdqu 0x10(%rsi),%xmm1
>    d:   c5 fa 7f 4f 10          vmovdqu %xmm1,0x10(%rdi)
>   12:   c3                      ret
>   13:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
>   1a:   00 00 00 00
>   1e:   66 90                   xchg   %ax,%ax
>
> 0000000000000020 <foo2>:
>   20:   ba 20 00 00 00          mov    $0x20,%edx
>   25:   e9 00 00 00 00          jmp    2a
>
> The jmp in line 25 is probably a tail-call to memmove().
>
> My guess is that xmm registers and unrolling are used here rather than
> ymm registers because waking up the second 128 bits takes time.  But
> even with that, the code uses two different registers, and if
> scheduled differently, could be used for implementing foo2():
>
>    0:   c5 fa 6f 06             vmovdqu (%rsi),%xmm0
>    8:   c5 fa 6f 4e 10          vmovdqu 0x10(%rsi),%xmm1
>    4:   c5 fa 7f 07             vmovdqu %xmm0,(%rdi)
>    d:   c5 fa 7f 4f 10          vmovdqu %xmm1,0x10(%rdi)
>   12:   c3                      ret
>
> - anton

Try -march instead of -mavx2, e.g. -march=haswell.
Sometimes gcc is beyond logic.
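Anton's observation that foo2() could reuse foo1()'s instructions, if both loads were issued before both stores, can be expressed in portable C. The sketch below (move32 is a hypothetical name) reads the two 16-byte halves of the source into local temporaries before writing either one, so it produces the same result as memmove(p, q, 32) even when the buffers overlap. Whether gcc turns it into the two-loads-then-two-stores vmovdqu sequence, or into a call to memmove, is not guaranteed and may depend on the -m/-march options in use.

```c
#include <string.h>

/* Hypothetical helper: a 32-byte memmove written as
   "load everything first, then store everything".
   Both 16-byte halves of the source are copied into local
   temporaries before any byte of the destination is written,
   so the result matches memmove(p, q, 32) even if p and q overlap. */
void move32(char *p, const char *q)
{
    unsigned char lo[16], hi[16];

    memcpy(lo, q, sizeof lo);        /* load bytes 0..15   */
    memcpy(hi, q + 16, sizeof hi);   /* load bytes 16..31  */
    memcpy(p, lo, sizeof lo);        /* store bytes 0..15  */
    memcpy(p + 16, hi, sizeof hi);   /* store bytes 16..31 */
}
```

Splitting the temporary into two 16-byte halves mirrors the xmm-sized chunks in the listing; at the C level a single 32-byte temporary would be equally correct.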