From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Mon, 09 Sep 2024 07:07:25 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Sep9.090725@mips.complang.tuwien.ac.at>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at>
 <8lcadjhnlcj5se1hrmo232viiccjk5alu4@4ax.com>
 <vb3k0m$1rth7$1@dont-email.me>
 <17d615c6a9e70e9fabe1721c55cfa176@www.novabbs.org>
 <86v7zep35n.fsf@linuxsc.com>
 <20240902180903.000035ee@yahoo.com>
 <vb7ank$3d0c5$1@dont-email.me>
 <20240903190928.00002f92@yahoo.com>
 <vb7idh$3e2af$1@dont-email.me>
 <86seufo11j.fsf@linuxsc.com>
 <vba6qa$3u4jc$1@dont-email.me>
 <1246395e530759ac79805e45b3830d8f@www.novabbs.org>
 <8634m9lga1.fsf@linuxsc.com>

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>mitchalsup@aol.com (MitchAlsup1) writes:
>> So:
>> # define memcpy memmove
>
>Incidentally, if one wants to do this, it's advisable to write
>
>  #undef memcpy
>
>before the #define of memcpy.
>
>> and move forward with life--for the 2 extra cycles memmove costs
>> it saves everyone long term grief.

Is it two extra cycles?  Here are some data points (the numbers are
cycles) from <2017Sep23.174313@mips.complang.tuwien.ac.at>:

Haswell (Core i7-4790K), glibc 2.19
   1    8   32   64  128  256  512   1K   2K   4K   8K  16K block size
  14   14   15   15   17   30   48   85  150  281  570 1370 memmove
  15   16   13   16   19   32   48   86  161  327  631 1420 memcpy

Skylake (Core i5-6600K), glibc 2.19
   1    8   32   64  128  256  512   1K   2K   4K   8K  16K block size
  14   14   14   14   15   27   43   77  147  305  573 1417 memmove
  13   14   10   12   14   27   46   85  165  313  607 1350 memcpy

Zen (Ryzen 5 1600X), glibc 2.24
   1    8   32   64  128  256  512   1K   2K   4K   8K  16K block size
  16   16   16   17   32   43   66  107  177  328  601 1225 memmove
  13   13   14   13   38   49   73  116  188  336  610 1233 memcpy

I don't see a consistent speedup of memcpy over memmove here.

However, when one uses memcpy(&var,ptr,8) or the like to perform an
unaligned access, gcc transforms this into a single load (or store)
without the redefinition of memcpy, but into much slower code with the
redefinition (i.e., when memmove is used instead of memcpy).

>Simply replacing memcpy() by memmove() of course will always
>work, but there might be negative consequences beyond a cost
>of 2 extra cycles -- for example, if a negative stride is
>better performing than a positive stride, but the nature
>of the compaction forces memmove() to always take the slower
>choice.

If the two memory blocks don't overlap, memmove() can use the faster
stride.  If the two memory blocks overlap, memcpy() as implemented in
glibc is a bad idea.  The way to go for memmove() is:

On hardware where the positive stride is faster:

  if ((uintptr_t)(dest - src) >= len)
    return memcpy_posstride(dest, src, len);
  else
    return memcpy_negstride(dest, src, len);

On hardware where the negative stride is faster:

  if ((uintptr_t)(src - dest) >= len)
    return memcpy_negstride(dest, src, len);
  else
    return memcpy_posstride(dest, src, len);

And I expect that my test is undefined behaviour, but most people,
UB advocates excepted, should understand what I mean.
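Spelled out as a minimal compilable sketch, the positive-stride
variant could look like the following.  The copy_posstride() and
copy_negstride() helpers are hypothetical stand-ins for
stride-specific copy routines (they are not glibc internals), and
casting each pointer to uintptr_t before subtracting avoids the
undefined behaviour of subtracting pointers into different objects:

  #include <stddef.h>
  #include <stdint.h>

  /* hypothetical stand-in: copy with ascending addresses */
  static void *copy_posstride(void *dest, const void *src, size_t len)
  {
    unsigned char *d = dest;
    const unsigned char *s = src;
    for (size_t i = 0; i < len; i++)
      d[i] = s[i];
    return dest;
  }

  /* hypothetical stand-in: copy with descending addresses */
  static void *copy_negstride(void *dest, const void *src, size_t len)
  {
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (len-- > 0)
      d[len] = s[len];
    return dest;
  }

  /* memmove() for hardware where the positive stride is faster.
     If dest is below src, the unsigned difference wraps around to a
     huge value (>= len for any realistic len), so the forward copy
     is chosen; the backward copy is used only when dest lands inside
     [src, src+len). */
  void *my_memmove(void *dest, const void *src, size_t len)
  {
    if ((uintptr_t)dest - (uintptr_t)src >= len)
      return copy_posstride(dest, src, len);
    else
      return copy_negstride(dest, src, len);
  }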
The benefit of this comparison over just comparing the addresses is
that the branch will have a much lower miss rate: whenever the blocks
do not overlap, the same (positive-stride) path is taken regardless
of which block sits at the lower address, so the branch is almost
always predicted correctly.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined
behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>