From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Mon, 09 Sep 2024 06:24:35 -0700
Message-ID: <86le01j6y4.fsf@linuxsc.com>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at>
 <8lcadjhnlcj5se1hrmo232viiccjk5alu4@4ax.com> <vb3k0m$1rth7$1@dont-email.me>
 <17d615c6a9e70e9fabe1721c55cfa176@www.novabbs.org> <86v7zep35n.fsf@linuxsc.com>
 <20240902180903.000035ee@yahoo.com> <vb7ank$3d0c5$1@dont-email.me>
 <20240903190928.00002f92@yahoo.com> <vb7idh$3e2af$1@dont-email.me>
 <86seufo11j.fsf@linuxsc.com> <vba6qa$3u4jc$1@dont-email.me>
 <1246395e530759ac79805e45b3830d8f@www.novabbs.org> <8634m9lga1.fsf@linuxsc.com>
 <2024Sep9.090725@mips.complang.tuwien.ac.at>

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>
>> mitchalsup@aol.com (MitchAlsup1) writes:
>>
>>> So:
>>> #define memcpy memmove
>>
>> Incidentally, if one wants to do this, it's advisable to write
>>
>> #undef memcpy
>>
>> before the #define of memcpy.
>>
>>> and move forward with life--for the 2 extra cycles memmove costs
>>> it saves everyone long-term grief.
>
> Is it two extra cycles?  Here are some data points (cycles) from
> <2017Sep23.174313@mips.complang.tuwien.ac.at>:
>
> Haswell (Core i7-4790K), glibc 2.19
>    1    8   32   64  128  256  512   1K   2K   4K   8K  16K  block size
>   14   14   15   15   17   30   48   85  150  281  570 1370  memmove
>   15   16   13   16   19   32   48   86  161  327  631 1420  memcpy
>
> Skylake (Core i5-6600K), glibc 2.19
>    1    8   32   64  128  256  512   1K   2K   4K   8K  16K  block size
>   14   14   14   14   15   27   43   77  147  305  573 1417  memmove
>   13   14   10   12   14   27   46   85  165  313  607 1350  memcpy
>
> Zen (Ryzen 5 1600X), glibc 2.24
>    1    8   32   64  128  256  512   1K   2K   4K   8K  16K  block size
>   16   16   16   17   32   43   66  107  177  328  601 1225  memmove
>   13   13   14   13   38   49   73  116  188  336  610 1233  memcpy
>
> I don't see a consistent speedup of memcpy over memmove here.
>
> However, when one uses memcpy(&var,ptr,8) or the like to perform an
> unaligned access, gcc transforms this into a load (or store) without
> the redefinition of memcpy, but into much slower code with the
> redefinition (i.e., when using memmove instead of memcpy).
>
>> Simply replacing memcpy() by memmove() will of course always
>> work, but there might be negative consequences beyond a cost
>> of 2 extra cycles -- for example, if a negative stride performs
>> better than a positive stride, but the nature of the compaction
>> forces memmove() to always take the slower choice.
>
> If the two memory blocks don't overlap, memmove() can use the
> fastest stride.

It /could/ use the fastest stride.  Whether it /does/ use the
fastest stride is a different question (and one that may have
different answers on different platforms).
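
To make the #undef point concrete, the guarded redefinition would
look like this (a minimal sketch; <string.h> is permitted to define
memcpy as a macro in addition to the function, which is exactly why
the #undef is advisable):

    #include <string.h>

    /* Remove any macro definition of memcpy that <string.h> may
       have provided, then route all memcpy calls to memmove. */
    #undef memcpy
    #define memcpy memmove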
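
For anyone who wants to produce numbers in the same spirit as the
tables above, a rough harness might look like the sketch below.
Assumptions: POSIX clock_gettime and a gcc/clang-style asm barrier
to keep the copies from being optimized away; the cited figures
came from a different harness, so absolute numbers will not match.

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define BUFSIZE 16384
    #define ITERS   100000

    static unsigned char src[BUFSIZE], dst[BUFSIZE];

    /* Time ITERS calls of one copy routine at one block size;
       return nanoseconds per call. */
    static double bench(void *(*copy)(void *, const void *, size_t),
                        size_t size)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < ITERS; i++) {
            copy(dst, src, size);
            /* compiler barrier: forces each copy to happen */
            __asm__ __volatile__("" ::: "memory");
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return ((t1.tv_sec - t0.tv_sec) * 1e9
                + (t1.tv_nsec - t0.tv_nsec)) / ITERS;
    }

    int main(void)
    {
        for (size_t size = 1; size <= BUFSIZE; size *= 2)
            printf("%6zu bytes: %7.1f ns memmove, %7.1f ns memcpy\n",
                   size, bench(memmove, size), bench(memcpy, size));
        return 0;
    }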
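
The unaligned-access idiom Anton mentions is usually written like
this (a sketch; load_u64 is just an illustrative name):

    #include <stdint.h>
    #include <string.h>

    /* Read an 8-byte value from a possibly unaligned address.
       gcc sees the small fixed-size memcpy and emits a single
       load instruction; with memcpy redefined to memmove it
       emits a library call instead, which is much slower. */
    static uint64_t load_u64(const void *p)
    {
        uint64_t v;
        memcpy(&v, p, sizeof v);
        return v;
    }

With plain memcpy, gcc -O2 compiles this to a single 8-byte mov on
x86-64.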
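
On the stride question: whether memmove gets to choose comes down
to how it picks its copy direction.  A textbook byte-at-a-time
sketch follows (real implementations copy word-at-a-time; the
comparison goes through uintptr_t because ordering pointers into
different objects is not defined by portable C, though a libc can
rely on whatever its own platform guarantees):

    #include <stddef.h>
    #include <stdint.h>

    /* Copy forward when dst is below src, backward when dst is
       above src, so overlapping bytes are read before they are
       overwritten.  When the blocks don't overlap, either
       direction is correct, and an implementation is free to
       pick whichever stride the hardware prefers. */
    void *my_memmove(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;

        if ((uintptr_t)d < (uintptr_t)s) {
            while (n--)
                *d++ = *s++;        /* ascending stride */
        } else if ((uintptr_t)d > (uintptr_t)s) {
            d += n;
            s += n;
            while (n--)
                *--d = *--s;        /* descending stride */
        }
        return dst;
    }

A simple implementation like this picks the direction purely from
pointer order, even for non-overlapping blocks, so it may well take
the slower stride -- which is Tim's point.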