From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Mon, 09 Sep 2024 07:07:25 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Sep9.090725@mips.complang.tuwien.ac.at>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at>
 <8lcadjhnlcj5se1hrmo232viiccjk5alu4@4ax.com>
 <vb3k0m$1rth7$1@dont-email.me>
 <17d615c6a9e70e9fabe1721c55cfa176@www.novabbs.org>
 <86v7zep35n.fsf@linuxsc.com>
 <20240902180903.000035ee@yahoo.com>
 <vb7ank$3d0c5$1@dont-email.me>
 <20240903190928.00002f92@yahoo.com>
 <vb7idh$3e2af$1@dont-email.me>
 <86seufo11j.fsf@linuxsc.com>
 <vba6qa$3u4jc$1@dont-email.me>
 <1246395e530759ac79805e45b3830d8f@www.novabbs.org>
 <8634m9lga1.fsf@linuxsc.com>

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>mitchalsup@aol.com (MitchAlsup1) writes:
>> So:
>> # define memcpy memmove
>
>Incidentally, if one wants to do this, it's advisable to write
>
>  #undef memcpy
>
>before the #define of memcpy.
>
>> and move forward with life--for the 2 extra cycles memmove costs
>> it saves everyone long term grief.

Is it two extra cycles?  Here are some data points (the numbers are
cycles) from <2017Sep23.174313@mips.complang.tuwien.ac.at>:

Haswell (Core i7-4790K), glibc 2.19
   1    8   32   64  128  256  512   1K   2K   4K   8K  16K block size
  14   14   15   15   17   30   48   85  150  281  570 1370 memmove
  15   16   13   16   19   32   48   86  161  327  631 1420 memcpy

Skylake (Core i5-6600K), glibc 2.19
   1    8   32   64  128  256  512   1K   2K   4K   8K  16K block size
  14   14   14   14   15   27   43   77  147  305  573 1417 memmove
  13   14   10   12   14   27   46   85  165  313  607 1350 memcpy

Zen (Ryzen 5 1600X), glibc 2.24
   1    8   32   64  128  256  512   1K   2K   4K   8K  16K block size
  16   16   16   17   32   43   66  107  177  328  601 1225 memmove
  13   13   14   13   38   49   73  116  188  336  610 1233 memcpy

I don't see a consistent speedup of memcpy over memmove here.

However, when one uses memcpy(&var,ptr,8) or the like to perform an
unaligned access, gcc transforms this into a single load (or store)
without the redefinition of memcpy, but into much slower code with the
redefinition (i.e., when memmove is used instead of memcpy).

>Simply replacing memcpy() by memmove() of course will always
>work, but there might be negative consequences beyond a cost
>of 2 extra cycles -- for example, if a negative stride is
>better performing than a positive stride, but the nature
>of the compaction forces memmove() to always take the slower
>choice.

If the two memory blocks don't overlap, memmove() can use the faster
stride.  If the two memory blocks overlap, memcpy() as implemented in
glibc is a bad idea.  The way to go for memmove() is:

On hardware where the positive stride is faster:

  if ((uintptr_t)(dest - src) >= len)
    return memcpy_posstride(dest, src, len);
  else
    return memcpy_negstride(dest, src, len);

On hardware where the negative stride is faster:

  if ((uintptr_t)(src - dest) >= len)
    return memcpy_negstride(dest, src, len);
  else
    return memcpy_posstride(dest, src, len);

And I expect that my test is undefined behaviour, but most people,
UB advocates excepted, should understand what I mean.
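Spelled out as a minimal compilable sketch, the positive-stride
variant could look like the following.  The copy_posstride() and
copy_negstride() helpers are hypothetical stand-ins for
stride-specific copy routines (they are not glibc internals), and
casting each pointer to uintptr_t before subtracting avoids the
undefined behaviour of subtracting pointers into different objects:

  #include <stddef.h>
  #include <stdint.h>

  /* hypothetical stand-in: copy with ascending addresses */
  static void *copy_posstride(void *dest, const void *src, size_t len)
  {
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (size_t i = 0; i < len; i++)
      d[i] = s[i];
    return dest;
  }

  /* hypothetical stand-in: copy with descending addresses */
  static void *copy_negstride(void *dest, const void *src, size_t len)
  {
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (len-- > 0)
      d[len] = s[len];
    return dest;
  }

  /* memmove() for hardware where the positive stride is faster.
     If dest is below src, the unsigned difference wraps around to a
     huge value (>= len for any realistic len), so the forward copy
     is chosen; the backward copy is used only when dest lands inside
     [src, src+len). */
  void *my_memmove(void *dest, const void *src, size_t len)
  {
    if ((uintptr_t)dest - (uintptr_t)src >= len)
      return copy_posstride(dest, src, len);
    else
      return copy_negstride(dest, src, len);
  }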
The benefit of this comparison over just comparing the addresses is
that the branch will have a much lower miss rate: whenever the blocks
do not overlap, the same (positive-stride) path is taken regardless
of which block sits at the lower address, so the branch is almost
always predicted correctly.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined
behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>