<2025Mar12.174636@mips.complang.tuwien.ac.at>


Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: rep movsb vs. simpler instructions for memcpy/memmove (was: Why VAX ..)
Date: Wed, 12 Mar 2025 16:46:36 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 102
Message-ID: <2025Mar12.174636@mips.complang.tuwien.ac.at>
References: <vpufbv$4qc5$1@dont-email.me> <vq4qav$1dksd$1@dont-email.me> <vq5dm2$1h3mg$5@dont-email.me> <2025Mar4.110420@mips.complang.tuwien.ac.at> <vq829a$232tl$6@dont-email.me> <2025Mar5.083636@mips.complang.tuwien.ac.at> <vqdljd$29f8f$2@paganini.bofh.team> <vqdrh9$3cdrc$1@dont-email.me> <vqqcm0$3l3i5$1@paganini.bofh.team> <2025Mar12.094228@mips.complang.tuwien.ac.at> <20250312114828.00003e99@yahoo.com> <2025Mar12.122836@mips.complang.tuwien.ac.at> <20250312140915.000010a8@yahoo.com>
Injection-Date: Wed, 12 Mar 2025 18:25:40 +0100 (CET)
Injection-Info: dont-email.me; posting-host="e153970a2f02b13d8544fa3a32e66bd4";
	logging-data="2869919"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/IkKiP5SzAnUhZ1Ly7o0Ai"
Cancel-Lock: sha1:FRAWz7rhN9oNDR56sE5fEQPeNuQ=
X-newsreader: xrn 10.11

Michael S <already5chosen@yahoo.com> writes:
>On Wed, 12 Mar 2025 11:28:36 GMT
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>> My experiments were with the code in
>> <https://github.com/AntonErtl/move/>.
>
>None of those are the simple loops that I mentioned above.

They are not.  If you want short code, rep movsb is unbeatable (for
memmove(), you have to do a little more, however: when the destination
overlaps the source at a higher address, you have to copy backwards).
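A minimal sketch of that short rep movsb code in GNU C extended asm for x86-64. The function names are hypothetical, and the plain byte-loop fallback for other targets or compilers is an assumption added only so the sketch compiles anywhere. The extra work memmove() needs is the overlap test plus a backward copy with the direction flag set:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Forward copy: one "rep movsb" on x86-64 with GCC/Clang, a byte loop elsewhere. */
static void copy_forward(unsigned char *d, const unsigned char *s, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory");
#else
    while (n--)
        *d++ = *s++;
#endif
}

/* Backward copy, starting at the last byte and moving down. */
static void copy_backward(unsigned char *d, const unsigned char *s, size_t n)
{
    assert(n > 0); /* only reached for overlapping, non-empty moves */
#if defined(__x86_64__) && defined(__GNUC__)
    d += n - 1;
    s += n - 1;
    __asm__ volatile("std\n\t"       /* DF=1: rep movsb decrements rsi/rdi */
                     "rep movsb\n\t"
                     "cld"           /* the ABI expects DF clear again */
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory", "cc");
#else
    while (n--)
        d[n] = s[n];
#endif
}

/* Hypothetical memcpy: the whole job is one forward copy. */
void *movsb_memcpy(void *dst, const void *src, size_t n)
{
    copy_forward(dst, src, n);
    return dst;
}

/* Hypothetical memmove: pick the direction that cannot clobber unread bytes. */
void *movsb_memmove(void *dst, const void *src, size_t n)
{
    /* Unsigned wrap-around makes this true when dst < src (difference wraps
       to a huge value) or dst >= src + n, i.e. when copying forward is safe. */
    if ((uintptr_t)dst - (uintptr_t)src >= n)
        copy_forward(dst, src, n);
    else
        copy_backward(dst, src, n);
    return dst;
}
```

The single unsigned comparison covers both forward-safe cases; everything else is copied backwards. The direction flag is cleared again because the x86-64 System V ABI requires it to be clear on function return.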

>>  I posted performance results in
>> <2017Sep19.082137@mips.complang.tuwien.ac.at>
>> <2017Sep20.184358@mips.complang.tuwien.ac.at>
>> <2017Sep23.174313@mips.complang.tuwien.ac.at>
>> 
>> My routines were generally faster than rep movsb, except for pretty
>> large blocks (16KB).
>> 
>
>Idiots from corporate IT blocked http://al.howardknight.net/

I feel for you.  In my workplace, Usenet is blocked (probably
unintentionally).  I have to post from home.

>So, link to google groups

Sorry, I cannot provide that service.  Trying to access
groups.google.com tells me:

|Couldn’t sign you in
|
|The browser you’re using doesn’t support JavaScript, or has JavaScript
|turned off.
|
|To keep your Google Account secure, try signing in on a browser that
|has JavaScript turned on.

I certainly won't turn on JavaScript for Google, and apparently Google
wants me to log in to a Google account to access groups.google.com.  I
don't have a Google account and I don't want one.

But all I would do is check whether Google Groups finds the message-ids.
You can do that yourself.

>or, if posts are relatively recent, to
>https://www.novabbs.com/devel/thread.php?group=comp.arch
>would be helpful.

The posts are from 2017; these message-ids are not randomly generated,
they encode the posting date and time.

>I don't know why gnu memcpy is huge. I don't even know if it is
>really *that* huge. But several KB is a number that I had seen
>stated by other people.

I stated in one of these messages that I had seen an 11KB memmove in
glibc.  Let's see:

objdump -t /debian8/usr/lib/x86_64-linux-gnu/libc.a|grep .text|grep 'memmove'
00000000000001a0 g   i   .text  0000000000000047 __libc_memmove
0000000000000000 g     F .text  000000000000019f __memmove_sse2
00000000000001a0 g   i   .text  0000000000000047 memmove
0000000000000000 g     F .text.ssse3    0000000000000009 __memmove_chk_ssse3
0000000000000010 g     F .text.ssse3    0000000000002b67 __memmove_ssse3
0000000000000000 g     F .text.ssse3    0000000000000009 __memmove_chk_ssse3_back
0000000000000010 g     F .text.ssse3    0000000000002b06 __memmove_ssse3_back
....

Yes, 0x2b67 = 11111 bytes for __memmove_ssse3.  Debian 8 is one of the
systems I used at the time.
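The sizes above are the hexadecimal field just before the symbol name in objdump -t output. A small C helper, hypothetical and assuming exactly that column layout (size as the next-to-last field, name as the last), that extracts the field and converts it:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for one line of `objdump -t` output, e.g.
   "0000000000000010 g     F .text.ssse3    0000000000002b67 __memmove_ssse3".
   Returns the size field (hex) as a number, or -1 if the line does not have
   that shape.  On success, *name and *name_len describe the symbol name
   inside `line` (it is not NUL-terminated separately). */
static long objdump_symbol_size(const char *line, const char **name, size_t *name_len)
{
    assert(line != NULL && name != NULL && name_len != NULL);

    const char *end = line + strlen(line);
    while (end > line && isspace((unsigned char)end[-1]))
        end--;                                /* drop trailing newline/blanks */

    const char *name_start = end;
    while (name_start > line && !isspace((unsigned char)name_start[-1]))
        name_start--;                         /* back up over the name field */

    const char *size_end = name_start;
    while (size_end > line && isspace((unsigned char)size_end[-1]))
        size_end--;                           /* skip blanks before the name */

    const char *size_start = size_end;
    while (size_start > line && !isspace((unsigned char)size_start[-1]))
        size_start--;                         /* back up over the size field */

    if (size_start == size_end || name_start == end)
        return -1;                            /* fewer than two fields */

    char *stop;
    long size = strtol(size_start, &stop, 16);
    if (stop != size_end)
        return -1;                            /* size field is not plain hex */

    *name = name_start;
    *name_len = (size_t)(end - name_start);
    return size;
}
```

Reading the objdump pipeline's output line by line with fgets() and printing each name with its decimal size would turn a whole listing like the ones here into byte counts.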

Let's see how it looks in Debian 12:

objdump -t /usr/lib/x86_64-linux-gnu/libc.a|grep .text|grep 'memmove'|grep -v wmemmove
0000000000000000 l     F .text  00000000000000f6 __libc_memmove_ifunc
0000000000000000 g   i   .text  00000000000000f6 __libc_memmove
0000000000000000 g   i   .text  00000000000000f6 memmove
0000000000000010 g     F .text.avx      000000000000002f __memmove_avx_unaligned
0000000000000080 g     F .text.avx      00000000000006de __memmove_avx_unaligned_erms
0000000000000010 g     F .text.avx.rtm  000000000000002d __memmove_avx_unaligned_rtm
0000000000000080 g     F .text.avx.rtm  00000000000006df __memmove_avx_unaligned_erms_rtm
0000000000000020 g     F .text.avx512   0000000000000009 __memmove_chk_avx512_no_vzeroupper
0000000000000030 g     F .text.avx512   000000000000073b __memmove_avx512_no_vzeroupper
0000000000000010 g     F .text.evex512  0000000000000037 __memmove_avx512_unaligned
0000000000000080 g     F .text.evex512  00000000000007a0 __memmove_avx512_unaligned_erms
0000000000000020 g     F .text  0000000000000009 __memmove_chk_erms
0000000000000030 g     F .text  000000000000002d __memmove_erms
0000000000000010 g     F .text.evex     0000000000000034 __memmove_evex_unaligned
0000000000000080 g     F .text.evex     00000000000007bb __memmove_evex_unaligned_erms
0000000000000010 g     F .text  0000000000000028 __memmove_sse2_unaligned
0000000000000080 g     F .text  0000000000000552 __memmove_sse2_unaligned_erms
0000000000000040 g     F .text.ssse3    0000000000000f3d __memmove_ssse3
0000000000000000 g     F .text  000000000000000e __memmove_chk

So __memmove_ssse3 is no longer that big ("only" 0xf3d = 3901 bytes);
it's still the biggest implementation, but many others are quite a bit
bigger than the 0x113=275 bytes of my ssememmove.
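To show how little code a chunked memmove needs, here is a minimal sketch built from 16-byte chunks. It is not ssememmove (that code is in the move repository); chunk_memmove, chunk16, load16 and store16 are hypothetical names, and the assumption is that the compiler turns each fixed 16-byte memcpy into an unaligned SSE load or store, as compilers for x86-64 typically do:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { unsigned char b[16]; } chunk16;   /* hypothetical 16-byte unit */

static chunk16 load16(const unsigned char *p) { chunk16 c; memcpy(&c, p, 16); return c; }
static void store16(unsigned char *p, chunk16 c) { memcpy(p, &c, 16); }

void *chunk_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n < 16) {
        /* Short move: read every source byte before writing any. */
        unsigned char tmp[16];
        assert(n <= sizeof tmp);
        memcpy(tmp, s, n);
        memcpy(d, tmp, n);
        return dst;
    }
    if ((uintptr_t)d - (uintptr_t)s >= n) {
        /* Forward is safe (dst below src, or no overlap).  Load the last
           16 source bytes first, because the loop may overwrite them. */
        chunk16 tail = load16(s + n - 16);
        for (size_t i = 0; i + 16 <= n; i += 16)
            store16(d + i, load16(s + i));
        store16(d + n - 16, tail);   /* overlapping store covers n % 16 leftovers */
    } else {
        /* dst overlaps src from above: copy backward, keeping the first
           16 source bytes for the final overlapping store. */
        chunk16 head = load16(s);
        size_t i = n;
        while (i >= 16) {
            i -= 16;
            store16(d + i, load16(s + i));
        }
        store16(d, head);            /* overlapping store covers n % 16 leftovers */
    }
    return dst;
}
```

Saving the head or tail chunk before the loop lets one overlapping 16-byte store handle the leftover bytes, so no byte-by-byte tail loop is needed.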

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>