From: Tim Rentsch
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Mon, 09 Sep 2024 06:24:35 -0700
Organization: A noiseless patient Spider
Message-ID: <86le01j6y4.fsf@linuxsc.com>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at> <8lcadjhnlcj5se1hrmo232viiccjk5alu4@4ax.com> <17d615c6a9e70e9fabe1721c55cfa176@www.novabbs.org> <86v7zep35n.fsf@linuxsc.com> <20240902180903.000035ee@yahoo.com> <20240903190928.00002f92@yahoo.com> <86seufo11j.fsf@linuxsc.com> <1246395e530759ac79805e45b3830d8f@www.novabbs.org> <8634m9lga1.fsf@linuxsc.com> <2024Sep9.090725@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> Tim Rentsch writes:
>
>> mitchalsup@aol.com (MitchAlsup1) writes:
>>
>>> So:
>>> #define memcpy memmove
>>
>> Incidentally, if one wants to do this, it's advisable to write
>>
>> #undef memcpy
>>
>> before the #define of memcpy.
>>
>>> and move forward with life--for the 2 extra cycles memmove costs
>>> it saves everyone long term grief.
>
> Is it two extra cycles? Here are some data points from
> <2017Sep23.174313@mips.complang.tuwien.ac.at>:
>
> Haswell (Core i7-4790K), glibc 2.19
> 1 8 32 64 128 256 512 1K 2K 4K 8K 16K block size
> 14 14 15 15 17 30 48 85 150 281 570 1370 memmove
> 15 16 13 16 19 32 48 86 161 327 631 1420 memcpy
>
> Skylake (Core i5-6600K), glibc 2.19
> 1 8 32 64 128 256 512 1K 2K 4K 8K 16K block size
> 14 14 14 14 15 27 43 77 147 305 573 1417 memmove
> 13 14 10 12 14 27 46 85 165 313 607 1350 memcpy
>
> Zen (Ryzen 5 1600X), glibc 2.24
> 1 8 32 64 128 256 512 1K 2K 4K 8K 16K block size
> 16 16 16 17 32 43 66 107 177 328 601 1225 memmove
> 13 13 14 13 38 49 73 116 188 336 610 1233 memcpy
>
> I don't see a consistent speedup of memcpy over memmove here.
>
> However, when one uses memcpy(&var,ptr,8) or the like to perform an
> unaligned access, gcc transforms this into a load (or store) without
> the redefinition of memcpy, but into much slower code with the
> redefinition (i.e., when using memmove instead of memcpy).
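
For anyone who hasn't run into that idiom, the pattern in question is
roughly this (a minimal sketch; the function name is mine, not from
Anton's posting):

  #include <stdint.h>
  #include <string.h>

  /* Read a 64-bit value from a possibly unaligned pointer.  With the
     real memcpy, gcc typically turns this into a single load; with
     memcpy redefined to memmove, Anton reports it compiles to much
     slower code. */
  static inline uint64_t load_u64(const void *ptr)
  {
      uint64_t var;
      memcpy(&var, ptr, sizeof var);
      return var;
  }
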
>
>> Simply replacing memcpy() by memmove() of course will always
>> work, but there might be negative consequences beyond a cost
>> of 2 extra cycles -- for example, if a negative stride is
>> better performing than a positive stride, but the nature
>> of the compaction forces memmove() to always take the slower
>> choice.
>
> If the two memory blocks don't overlap, memmove() can use the
> fastest stride.
It /could/ use the fastest stride. Whether it /does/ use the
fastest stride is a different question (and one that may have
different answers on different platforms).
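
For what it's worth, the textbook shape of that choice is roughly the
following (a simplified sketch, not any particular library's code --
real implementations do the overlap test and the copy loops rather
differently):

  #include <stddef.h>

  void *sketch_memmove(void *dst, const void *src, size_t n)
  {
      unsigned char *d = dst;
      const unsigned char *s = src;

      if (d <= s || d >= s + n) {
          /* No harmful overlap: a forward (positive-stride) copy is
             safe, and the implementation is free to use whatever
             stride it considers fastest. */
          while (n--)
              *d++ = *s++;
      } else {
          /* Destination overlaps the tail of the source: copy
             backward so source bytes are not overwritten before
             they are read. */
          d += n;
          s += n;
          while (n--)
              *--d = *--s;
      }
      return dst;
  }

Which stride a given library actually picks in the non-overlapping
case is exactly the platform-dependent question raised above.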