Path: ...!2.eu.feeder.erje.net!feeder.erje.net!news.swapon.de!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Thu, 12 Sep 2024 14:20:42 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 101
Message-ID: <2024Sep12.162042@mips.complang.tuwien.ac.at>
References: <2024Aug30.161204@mips.complang.tuwien.ac.at> <17d615c6a9e70e9fabe1721c55cfa176@www.novabbs.org> <86v7zep35n.fsf@linuxsc.com> <20240902180903.000035ee@yahoo.com> <vb7ank$3d0c5$1@dont-email.me> <20240903190928.00002f92@yahoo.com> <vb7idh$3e2af$1@dont-email.me> <86seufo11j.fsf@linuxsc.com> <vba6qa$3u4jc$1@dont-email.me> <1246395e530759ac79805e45b3830d8f@www.novabbs.org> <8634m9lga1.fsf@linuxsc.com> <2024Sep9.090725@mips.complang.tuwien.ac.at> <86y13xf2c9.fsf@linuxsc.com>
Injection-Date: Thu, 12 Sep 2024 16:46:38 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="b3a05cc3d9bbc2cf43d4db2bfea46e1b"; logging-data="308519"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+bD3MyJXqYGt7SRFO3YjER"
Cancel-Lock: sha1:poQzPNuKS+gYxrICiKueoCYL6XE=
X-newsreader: xrn 10.11
Bytes: 4773

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>
>[considering which way to copy with memmove()]
>
>> If the two memory blocks don't overlap, memmove() can use the
>> fastest stride. [...]
>>
>> The way to go for memmove() is:
>>
>> On hardware where positive stride is faster:
>>
>> if (((uintptr)(dest-src)) >= len)
>>   return memcpy_posstride(dest,src,len)
>> else
>>   return memcpy_negstride(dest,src,len)
>>
>> On hardware where the negative stride is faster:
>>
>> if (((uintptr)(src-dest)) >= len)
>>   return memcpy_negstride(dest,src,len)
>> else
>>   return memcpy_posstride(dest,src,len)
>>
>> And I expect that my test is undefined behaviour, but most people
>> except the UB advocates should understand what I mean.
....
>Last but not least, having two different code blocks for the
>different preferences is clunky.  The two blocks can be
>combined by fusing the two test expressions into a single
>expression, as for example
>
>#ifndef PREFER_UPWARDS
>#define PREFER_UPWARDS 1
>#endif/*PREFER_UPWARDS*/
>
>extern void* ascending_copy( void*, const void*, size_t );
>extern void* descending_copy( void*, const void*, size_t );
>
>void *
>good_memmove( void *vd, const void *vs, size_t n ){
>    const char *d = vd;
>    const char *s = vs;
>    _Bool upwards = PREFER_UPWARDS ? d-s +0ull >= n : s-d +0ull < n;
>
>    return
>        upwards
>            ? ascending_copy( vd, vs, n )
>            : descending_copy( vd, vs, n );
>}
>
>Using the preprocessor symbol PREFER_UPWARDS to select between
>the two preferences (ascending or descending) allows the choice
>to be made by a -D compiler option, and we can expect the compiler
>to optimize away the part of the test that is never used.

That's clever, but for use in glibc or similar libraries the clunky
version is preferable: memmove() is usually called through the dynamic
linking mechanism, and the implementation that is actually called is
selected at run time based on the hardware that the program runs on,
so a compile-time -D option cannot make the choice.  (What does glibc
do when the program is linked statically?)
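For illustration only, here is a minimal sketch of how the two "clunky" variants fit a run-time selection scheme. It makes several assumptions. The copy loops are byte-at-a-time stand-ins for the extern ascending_copy()/descending_copy() above. All other names (memmove_prefer_up, memmove_prefer_down, resolve_memmove, my_memmove) are hypothetical. cpu_prefers_descending() is a made-up hardware probe, not a glibc function. glibc itself does this selection through GNU indirect functions (IFUNC) resolved by the dynamic linker, whereas the sketch uses a plain cached function pointer. The sketch also converts each pointer to uintptr_t before subtracting, because ISO C leaves subtraction of pointers into different objects undefined, while the pointer-to-integer conversion is only implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-at-a-time stand-ins (assumption: real implementations use wide,
   hardware-specific moves such as the SSE2/AVX/ERMS variants). */
static void *ascending_copy(void *vd, const void *vs, size_t n)
{
    unsigned char *d = vd;
    const unsigned char *s = vs;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return vd;
}

static void *descending_copy(void *vd, const void *vs, size_t n)
{
    unsigned char *d = vd;
    const unsigned char *s = vs;
    for (size_t i = n; i > 0; i--)
        d[i - 1] = s[i - 1];
    return vd;
}

/* Hardware that prefers ascending copies: an ascending copy is only
   unsafe when dest lies in (src, src+n).  The unsigned difference wraps
   to a huge value when dest < src, so one comparison covers both the
   "dest below src" and the "no overlap" cases. */
static void *memmove_prefer_up(void *vd, const void *vs, size_t n)
{
    uintptr_t d = (uintptr_t)vd, s = (uintptr_t)vs;
    return d - s >= n ? ascending_copy(vd, vs, n)
                      : descending_copy(vd, vs, n);
}

/* Hardware that prefers descending copies: the mirror-image test. */
static void *memmove_prefer_down(void *vd, const void *vs, size_t n)
{
    uintptr_t d = (uintptr_t)vd, s = (uintptr_t)vs;
    return s - d >= n ? descending_copy(vd, vs, n)
                      : ascending_copy(vd, vs, n);
}

typedef void *(*memmove_fn)(void *, const void *, size_t);

/* Hypothetical hardware probe; a real library would inspect CPU
   feature bits.  Hard-wired to "prefers ascending" for illustration. */
static int cpu_prefers_descending(void)
{
    return 0;
}

static memmove_fn resolve_memmove(void)
{
    return cpu_prefers_descending() ? memmove_prefer_down
                                    : memmove_prefer_up;
}

/* Hypothetical entry point: resolves the implementation on the first
   call and caches it.  This lazy initialization is not thread-safe;
   with IFUNC the dynamic linker resolves the choice before use. */
void *my_memmove(void *dest, const void *src, size_t n)
{
    static memmove_fn impl;
    if (impl == NULL)
        impl = resolve_memmove();
    return impl(dest, src, n);
}
```

With the probe hard-wired as above, `my_memmove(buf + 2, buf, 6)` on a buffer holding `"abcdefgh"` leaves `"ababcdef"`, which is what memmove() gives. Keeping one function per preference costs only a few duplicated lines. In exchange, every variant exists in the library at once, so the choice can be deferred to run time.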
There seem to be quite a few memmove() (and __memmove_chk())
implementations in glibc-2.36 on AMD64:

__memmove_chk
__memmove_sse2_unaligned_erms
__memmove_chk
__memmove_chk_erms
__memmove_chk_evex_unaligned
__memmove_chk_avx_unaligned
__memmove_chk_ssse3
__memmove_chk_sse2_unaligned
__memmove_erms
__memmove_avx512_unaligned
__memmove_evex_unaligned
__memmove_evex_unaligned_erms
__memmove_avx_unaligned
__memmove_avx_unaligned_erms
__memmove_avx_unaligned_rtm
__memmove_ssse3
__memmove_sse2_unaligned
__memmove_chk_sse2_unaligned_erms
__memmove_chk_avx512_no_vzeroupper
__memmove_chk_avx512_unaligned
__memmove_chk_avx512_unaligned_erms
__memmove_chk_evex_unaligned_erms
__memmove_chk_avx_unaligned_erms
__memmove_chk_avx_unaligned_rtm
__memmove_chk_avx_unaligned_erms_rtm
__memmove_avx512_no_vzeroupper
__memmove_avx512_unaligned_erms
__memmove_avx_unaligned_erms_rtm

From what I read, __memmove_chk() (which has an additional destlen
parameter) is apparently not intended to be called explicitly from
source code, so I guess that compilers generate calls to it (e.g.,
when a program is compiled with _FORTIFY_SOURCE).

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>