From: Tim Rentsch
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Mon, 09 Sep 2024 06:24:35 -0700
Organization: A noiseless patient Spider
Message-ID: <86le01j6y4.fsf@linuxsc.com>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at> <8lcadjhnlcj5se1hrmo232viiccjk5alu4@4ax.com> <17d615c6a9e70e9fabe1721c55cfa176@www.novabbs.org> <86v7zep35n.fsf@linuxsc.com> <20240902180903.000035ee@yahoo.com> <20240903190928.00002f92@yahoo.com> <86seufo11j.fsf@linuxsc.com> <1246395e530759ac79805e45b3830d8f@www.novabbs.org> <8634m9lga1.fsf@linuxsc.com> <2024Sep9.090725@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> Tim Rentsch writes:
>
>> mitchalsup@aol.com (MitchAlsup1) writes:
>>
>>> So:
>>> #define memcpy memmove
>>
>> Incidentally, if one wants to do this, it's advisable to write
>>
>> #undef memcpy
>>
>> before the #define of memcpy.
>>
>>> and move forward with life--for the 2 extra cycles memmove costs
>>> it saves everyone long term grief.
>
> Is it two extra cycles? Here are some data points from
> <2017Sep23.174313@mips.complang.tuwien.ac.at>:
>
> Haswell (Core i7-4790K), glibc 2.19
> 1 8 32 64 128 256 512 1K 2K 4K 8K 16K block size
> 14 14 15 15 17 30 48 85 150 281 570 1370 memmove
> 15 16 13 16 19 32 48 86 161 327 631 1420 memcpy
>
> Skylake (Core i5-6600K), glibc 2.19
> 1 8 32 64 128 256 512 1K 2K 4K 8K 16K block size
> 14 14 14 14 15 27 43 77 147 305 573 1417 memmove
> 13 14 10 12 14 27 46 85 165 313 607 1350 memcpy
>
> Zen (Ryzen 5 1600X), glibc 2.24
> 1 8 32 64 128 256 512 1K 2K 4K 8K 16K block size
> 16 16 16 17 32 43 66 107 177 328 601 1225 memmove
> 13 13 14 13 38 49 73 116 188 336 610 1233 memcpy
>
> I don't see a consistent speedup of memcpy over memmove here.
>
> However, when one uses memcpy(&var,ptr,8) or the like to perform an
> unaligned access, gcc transforms this into a load (or store) without
> the redefinition of memcpy, but into much slower code with the
> redefinition (i.e., when using memmove instead of memcpy).
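
For anyone who hasn't run into that idiom, the pattern in question is
roughly this (a minimal sketch; the function name is mine, not from
Anton's posting):

  #include <stdint.h>
  #include <string.h>

  /* Read a 64-bit value from a possibly unaligned pointer.  With the
     real memcpy, gcc typically turns this into a single load; with
     memcpy redefined to memmove, Anton reports it compiles to much
     slower code. */
  static inline uint64_t load_u64(const void *ptr)
  {
      uint64_t var;
      memcpy(&var, ptr, sizeof var);
      return var;
  }
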
>
>> Simply replacing memcpy() by memmove() of course will always
>> work, but there might be negative consequences beyond a cost
>> of 2 extra cycles -- for example, if a negative stride is
>> better performing than a positive stride, but the nature
>> of the compaction forces memmove() to always take the slower
>> choice.
>
> If the two memory blocks don't overlap, memmove() can use the
> fastest stride.
It /could/ use the fastest stride. Whether it /does/ use the
fastest stride is a different question (and one that may have
different answers on different platforms).
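
For what it's worth, the textbook shape of that choice is roughly the
following (a simplified sketch, not any particular library's code --
real implementations do the overlap test and the copy loops rather
differently):

  #include <stddef.h>

  void *sketch_memmove(void *dst, const void *src, size_t n)
  {
      unsigned char *d = dst;
      const unsigned char *s = src;

      if (d <= s || d >= s + n) {
          /* No harmful overlap: a forward (positive-stride) copy is
             safe, and the implementation is free to use whatever
             stride it considers fastest. */
          while (n--)
              *d++ = *s++;
      } else {
          /* Destination overlaps the tail of the source: copy
             backward so source bytes are not overwritten before
             they are read. */
          d += n;
          s += n;
          while (n--)
              *--d = *--s;
      }
      return dst;
  }

Which stride a given library actually picks in the non-overlapping
case is exactly the platform-dependent question raised above.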