Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: rep movsb vs. simpler instructions for memcpy/memmove (was: Why VAX ..)
Date: Wed, 12 Mar 2025 16:46:36 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 102
Message-ID: <2025Mar12.174636@mips.complang.tuwien.ac.at>
References: <2025Mar4.110420@mips.complang.tuwien.ac.at>
 <2025Mar5.083636@mips.complang.tuwien.ac.at>
 <2025Mar12.094228@mips.complang.tuwien.ac.at>
 <20250312114828.00003e99@yahoo.com>
 <2025Mar12.122836@mips.complang.tuwien.ac.at>
 <20250312140915.000010a8@yahoo.com>
Injection-Date: Wed, 12 Mar 2025 18:25:40 +0100 (CET)
Injection-Info: dont-email.me; posting-host="e153970a2f02b13d8544fa3a32e66bd4";
 logging-data="2869919"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX1/IkKiP5SzAnUhZ1Ly7o0Ai"
Cancel-Lock: sha1:FRAWz7rhN9oNDR56sE5fEQPeNuQ=
X-newsreader: xrn 10.11

Michael S writes:
>On Wed, 12 Mar 2025 11:28:36 GMT
>anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
>> My experiments were with the code in
>> .
>
>Non of those are simple loops that I mentioned above.

They are not.  If you want short code, rep movsb is unbeatable (for
memmove(), you have to do a little more, however).

>> I posted performance results in
>> <2017Sep19.082137@mips.complang.tuwien.ac.at>
>> <2017Sep20.184358@mips.complang.tuwien.ac.at>
>> <2017Sep23.174313@mips.complang.tuwien.ac.at>
>>
>> My routines were generally faster than rep movsb, except for pretty
>> large blocks (16KB).
>>
>
>Idiots from corporate IT blocked http://al.howardknight.net/

I feel for you.  In my workplace, Usenet is blocked (probably
unintentionally).  I have to post from home.

>So, link to google groups

Sorry, I cannot provide that service.
Trying to access groups.google.com tells me:

|Couldn’t sign you in
|
|The browser you’re using doesn’t support JavaScript, or has JavaScript
|turned off.
|
|To keep your Google Account secure, try signing in on a browser that
|has JavaScript turned on.

I certainly won't turn on JavaScript for Google, and apparently Google
wants me to log in to a Google account to access groups.google.com.  I
don't have a Google account and I don't want one.

But all I would do is check whether Google Groups finds the
message-ids.  You can do that yourself.

>or, if posts are relatively recent, to
>https://www.novabbs.com/devel/thread.php?group=comp.arch
>would be helpful.

The posts are from 2017; these message-ids are not randomly generated,
they contain the date.

>I don't know why gnu memcpy is huge. I don't even know if it is
>really *that* huge. But several KB is number that I had seen
>stated by other people.

I stated in one of these messages that I have seen an 11KB memmove in
glibc.  Let's see:

objdump -t /debian8/usr/lib/x86_64-linux-gnu/libc.a|grep .text|grep 'memmove'
00000000000001a0 g   i   .text        0000000000000047 __libc_memmove
0000000000000000 g     F .text        000000000000019f __memmove_sse2
00000000000001a0 g   i   .text        0000000000000047 memmove
0000000000000000 g     F .text.ssse3  0000000000000009 __memmove_chk_ssse3
0000000000000010 g     F .text.ssse3  0000000000002b67 __memmove_ssse3
0000000000000000 g     F .text.ssse3  0000000000000009 __memmove_chk_ssse3_back
0000000000000010 g     F .text.ssse3  0000000000002b06 __memmove_ssse3_back
....

Yes, 11111 bytes for __memmove_ssse3.  Debian 8 is one of the systems
I used at the time.

Let's see how it looks in Debian 12:

objdump -t /usr/lib/x86_64-linux-gnu/libc.a|grep .text|grep 'memmove'|grep -v wmemmove
0000000000000000 l     F .text          00000000000000f6 __libc_memmove_ifunc
0000000000000000 g   i   .text          00000000000000f6 __libc_memmove
0000000000000000 g   i   .text          00000000000000f6 memmove
0000000000000010 g     F .text.avx      000000000000002f __memmove_avx_unaligned
0000000000000080 g     F .text.avx      00000000000006de __memmove_avx_unaligned_erms
0000000000000010 g     F .text.avx.rtm  000000000000002d __memmove_avx_unaligned_rtm
0000000000000080 g     F .text.avx.rtm  00000000000006df __memmove_avx_unaligned_erms_rtm
0000000000000020 g     F .text.avx512   0000000000000009 __memmove_chk_avx512_no_vzeroupper
0000000000000030 g     F .text.avx512   000000000000073b __memmove_avx512_no_vzeroupper
0000000000000010 g     F .text.evex512  0000000000000037 __memmove_avx512_unaligned
0000000000000080 g     F .text.evex512  00000000000007a0 __memmove_avx512_unaligned_erms
0000000000000020 g     F .text          0000000000000009 __memmove_chk_erms
0000000000000030 g     F .text          000000000000002d __memmove_erms
0000000000000010 g     F .text.evex     0000000000000034 __memmove_evex_unaligned
0000000000000080 g     F .text.evex     00000000000007bb __memmove_evex_unaligned_erms
0000000000000010 g     F .text          0000000000000028 __memmove_sse2_unaligned
0000000000000080 g     F .text          0000000000000552 __memmove_sse2_unaligned_erms
0000000000000040 g     F .text.ssse3    0000000000000f3d __memmove_ssse3
0000000000000000 g     F .text          000000000000000e __memmove_chk

So __memmove_ssse3 is no longer that big ("only" 3901 bytes); it's
still the biggest implementation, but many others are quite a bit
bigger than the 0x113=275 bytes of my ssememmove.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup,