From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.arch
Subject: Re: 80286 protected mode
Date: Tue, 15 Oct 2024 10:53:30 +0200
Organization: A noiseless patient Spider
Message-ID: <velaia$1kbdj$1@dont-email.me>
References: <2024Oct6.150415@mips.complang.tuwien.ac.at>
 <memo.20241006163428.19028W@jgd.cix.co.uk>
 <2024Oct7.093314@mips.complang.tuwien.ac.at>
 <7c8e5c75ce0f1e7c95ec3ae4bdbc9249@www.novabbs.org>
 <2024Oct8.092821@mips.complang.tuwien.ac.at>
 <ve5ek3$2jamt$1@dont-email.me> <ve6gv4$2o2cj$1@dont-email.me>
 <ve6olo$2pag3$2@dont-email.me>
 <73e776d6becb377b484c5dcc72b526dc@www.novabbs.org>
 <ve7sco$31tgt$1@dont-email.me>
 <2b31e1343b1f3fadd55ad6b87d879b78@www.novabbs.org>
 <ve99fg$38kta$1@dont-email.me> <veh6j8$q71j$1@dont-email.me>
 <vej5p5$1772o$1@dont-email.me> <vejagr$181vo$1@dont-email.me>
 <vejcqc$1772o$3@dont-email.me> <20241014190856.00003a58@yahoo.com>
In-Reply-To: <20241014190856.00003a58@yahoo.com>

On 14/10/2024 18:08, Michael S wrote:
> On Mon, 14 Oct 2024 17:19:40 +0200
> David Brown <david.brown@hesbynett.no> wrote:
>
>> On 14/10/2024 16:40, Terje Mathisen wrote:
>>> David Brown wrote:
>>>> On 13/10/2024 21:21, Terje Mathisen wrote:
>>>>> David Brown wrote:
>>>>>> On 10/10/2024 20:38, MitchAlsup1 wrote:
>>>>>>> On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
>>>>>>>
>>>>>>>> On 09/10/2024 23:37, MitchAlsup1 wrote:
>>>>>>>>> On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
>>>>>>>>>
>>>>>>>>>> On 09/10/2024 20:10, Thomas Koenig wrote:
>>>>>>>>>>> David Brown <david.brown@hesbynett.no> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> When would you ever /need/ to compare pointers to
>>>>>>>>>>>> different objects?
>>>>>>>>>>>> For almost all C programmers, the answer is "never".
>>>>>>>>>>>
>>>>>>>>>>> Sometimes, it is handy to encode certain conditions in
>>>>>>>>>>> pointers, rather than having only a valid pointer or
>>>>>>>>>>> NULL.  A compiler, for example, might want to store the
>>>>>>>>>>> fact that an error occurred while parsing a subexpression
>>>>>>>>>>> as a special pointer constant.
>>>>>>>>>>>
>>>>>>>>>>> Compilers often have the unfair advantage, though, that
>>>>>>>>>>> they can rely on what application programmers cannot:
>>>>>>>>>>> their implementation details.  (Some do not, such as f2c.)
>>>>>>>>>>
>>>>>>>>>> Standard library authors have the same superpowers, so that
>>>>>>>>>> they can implement an efficient memmove() even though a
>>>>>>>>>> pure standard C programmer cannot (other than by simply
>>>>>>>>>> calling the standard library memmove() function!).
>>>>>>>>>
>>>>>>>>> This is more a symptom of bad ISA design/evolution than of
>>>>>>>>> libc writers needing superpowers.
>>>>>>>>
>>>>>>>> No, it is not.  It has absolutely /nothing/ to do with the
>>>>>>>> ISA.
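To make the "superpowers" point concrete, here is a minimal sketch of
the kind of memmove() a library author can write but a portable
program cannot.  The function name is mine, purely for illustration,
and the uintptr_t comparison of pointers into what may be different
objects is only justified because the libc author knows the
implementation's memory model:

#include <stddef.h>
#include <stdint.h>

/* Illustrative only - not portable standard C.  Comparing the
   addresses of (possibly) unrelated objects via uintptr_t is
   implementation-specific behaviour that a library author may rely
   on, but an application programmer should not. */
void *my_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if ((uintptr_t) d < (uintptr_t) s) {
        for (size_t i = 0; i < n; i++)       /* copy forwards */
            d[i] = s[i];
    } else {
        for (size_t i = n; i > 0; i--)       /* copy backwards */
            d[i - 1] = s[i - 1];
    }
    return dest;
}

A real library version would copy in larger, aligned chunks rather
than single bytes, but the pointer comparison is the part that pure
standard C does not give you.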
>>>>>>>
>>>>>>> For example, if ISA contains an MM instruction which is the
>>>>>>> embodiment of memmove() then absolutely no heroics are needed
>>>>>>> or desired in the libc call.
>>>>>>>
>>>>>>
>>>>>> The existence of a dedicated assembly instruction does not let
>>>>>> you write an efficient memmove() in standard C.  That's why I
>>>>>> said there was no connection between the two concepts.
>>>>>>
>>>>>> For some targets, it can be helpful to write memmove() in
>>>>>> assembly or using inline assembly, rather than in non-portable
>>>>>> C (which is the common case).
>>>>>>
>>>>>>> Thus, it IS a symptom of ISA evolution that one has to rewrite
>>>>>>> memmove() every time wider SIMD registers are available.
>>>>>>
>>>>>> It is not that simple.
>>>>>>
>>>>>> There can often be trade-offs between the speed of memmove()
>>>>>> and memcpy() on large transfers, and the overhead in setting
>>>>>> things up that is proportionally more costly for small
>>>>>> transfers.  Often that can be eliminated when the compiler
>>>>>> optimises the functions inline - when the compiler knows the
>>>>>> size of the move/copy, it can optimise directly.
>>>>>
>>>>> What you are missing here, David, is the fact that Mitch's MM is
>>>>> a single instruction which does the entire memmove() operation,
>>>>> and has the inside knowledge about cache (residency at level x?
>>>>> width in bytes)/memory ranges/access rights/etc. needed to do so
>>>>> in a very close to optimal manner, for both short and long
>>>>> transfers.
>>>>
>>>> I am not missing that at all.  And I agree that an advanced
>>>> hardware MM instruction could be a very efficient way to
>>>> implement both memcpy and memmove.  (For my own kind of work,
>>>> I'd worry about such looping instructions causing an unbounded
>>>> increase in interrupt latency, but that too is solvable given
>>>> enough hardware effort.)
>>>>
>>>> And I agree that once you have an "MM" (or similar) instruction,
>>>> you don't need to re-write the implementation for your memmove()
>>>> and memcpy() library functions for every new generation of
>>>> processors of a given target family.
>>>>
>>>> What I /don't/ agree with is the claim that you /do/ need to keep
>>>> re-writing your implementations all the time.  You will
>>>> /sometimes/ get benefits from doing so, but it is not as simple
>>>> as Mitch made out.
>>>>>
>>>>> I.e. totally removing the need for compiler tricks or wide
>>>>> register operations.
>>>>>
>>>>> Also apropos the compiler library issue:
>>>>>
>>>>> You start by teaching the compiler about the MM instruction, and
>>>>> to recognize common patterns (just as most compilers already do
>>>>> today), and then the memmove() calls will usually be inlined.
>>>>>
>>>>
>>>> The original compiler library issue was that it is impossible to
>>>> write an efficient memmove() implementation using pure portable
>>>> standard C.  That is independent of any ISA, any specialist
>>>> instructions for memory moves, and any compiler optimisations.
>>>> And it is independent of the fact that some good compilers can
>>>> inline at least some calls to memcpy() and memmove() today, using
>>>> whatever instructions are most efficient for the target.
>>>
>>> David, you and Mitch are among my most cherished writers here on
>>> c.arch.  I really don't think any of us really disagree; it is
>>> just that we have been discussing two (mostly) orthogonal issues.
>>
>> I agree.  It's a "god dag mann, økseskaft" situation.
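As an aside on the inlining point above: when the size of a copy is a
compile-time constant, the compilers I use (gcc, clang) will normally
turn the memcpy() call into plain loads and stores, with no library
call and no setup overhead.  A small sketch, with a helper name of my
own invention:

#include <stdint.h>
#include <string.h>

/* Fixed-size memcpy() used for type punning; with optimisation
   enabled this typically compiles to a single load on targets that
   allow unaligned access, rather than a call into the library. */
static inline uint64_t load_u64(const unsigned char *p)
{
    uint64_t x;
    memcpy(&x, p, sizeof x);
    return x;
}

That is the usual way to do type punning in portable code, and it
usually costs nothing at run time.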
>>
>> I have a huge respect for Mitch, his knowledge and experience, and
>> his willingness to share that freely with others.  That's why I
>> have found this very frustrating.
>>
>>>
>>> a) memmove/memcpy are so important that people have been spending
>>> a lot of time & effort trying to make them faster, with the
>>> complication that in general they cannot be implemented in pure C
>>> (which disallows direct comparison of arbitrary pointers).
>>>
>>
>> Yes.
>>
>> (Unlike memmove(), memcpy() can be implemented in standard C as a
>> simple byte-copy loop, without needing to compare pointers.  But
>> an implementation that copies in larger blocks than a byte
>> requires implementation-dependent behaviour to determine
>> alignments, or it must rely on unaligned accesses being allowed by
>> the implementation.)
>>
>>> b) Mitch has, like Andy ("Crazy") Glew many years before, realized
>>> that if a cpu architecture actually has an instruction designed
>>> to do this particular job, it behooves cpu architects to make
>>> sure that it is in fact so fast that it obviates any need for
>>> tricky coding to replace it.
>>
>> Yes.
>>
>>> Ideally, it should be able to copy a single object, up to a cache
>>> line in size, in the same or less time needed to do so manually
>>> with a SIMD 512-bit load followed by a 512-bit store (both ops
>>> masked to not touch anything it shouldn't).
>>
>> Yes.
>>
>>> REP MOVSB on x86 does the canonical memcpy() operation, originally
>>> by moving single bytes, and this was so slow that we also had REP
>>> MOVSW (moving 16-bit entities) and then REP MOVSD on the 386 and
>>> REP MOVSQ on 64-bit cpus.
>>>
>>> With a suitable chunk of logic, the basic MOVSB operation could in
>>> fact handle any kinds of alignments and sizes, while doing the
>>> actual transfer at maximum bus speeds, i.e. at least one cache
>>> line/cycle for things already in $L1.
>>
>> I agree on all of that.
>>
>> I am quite happy with the argument that suitable hardware can do
>> these basic operations faster than a software loop or the x86
>> "rep" instructions.

========== REMAINDER OF ARTICLE TRUNCATED ==========