Path: ...!feeds.phibee-telecom.net!2.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.arch
Subject: Re: 80286 protected mode
Date: Tue, 15 Oct 2024 10:53:30 +0200
Organization: A noiseless patient Spider
Lines: 276
Message-ID: <velaia$1kbdj$1@dont-email.me>
References: <2024Oct6.150415@mips.complang.tuwien.ac.at>
 <memo.20241006163428.19028W@jgd.cix.co.uk>
 <2024Oct7.093314@mips.complang.tuwien.ac.at>
 <7c8e5c75ce0f1e7c95ec3ae4bdbc9249@www.novabbs.org>
 <2024Oct8.092821@mips.complang.tuwien.ac.at> <ve5ek3$2jamt$1@dont-email.me>
 <ve6gv4$2o2cj$1@dont-email.me> <ve6olo$2pag3$2@dont-email.me>
 <73e776d6becb377b484c5dcc72b526dc@www.novabbs.org>
 <ve7sco$31tgt$1@dont-email.me>
 <2b31e1343b1f3fadd55ad6b87d879b78@www.novabbs.org>
 <ve99fg$38kta$1@dont-email.me> <veh6j8$q71j$1@dont-email.me>
 <vej5p5$1772o$1@dont-email.me> <vejagr$181vo$1@dont-email.me>
 <vejcqc$1772o$3@dont-email.me> <20241014190856.00003a58@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 15 Oct 2024 10:53:31 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="47228e08c2736a5aef9f5441cfbf6fae";
	logging-data="1715635"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19RQjpcnD37h5g1CEo5cwTKSA8fS2ptMAY="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.11.0
Cancel-Lock: sha1:rDOefY+lktHAid3csfSBpm2gSDI=
In-Reply-To: <20241014190856.00003a58@yahoo.com>
Content-Language: en-GB
Bytes: 14615

On 14/10/2024 18:08, Michael S wrote:
> On Mon, 14 Oct 2024 17:19:40 +0200
> David Brown <david.brown@hesbynett.no> wrote:
> 
>> On 14/10/2024 16:40, Terje Mathisen wrote:
>>> David Brown wrote:
>>>> On 13/10/2024 21:21, Terje Mathisen wrote:
>>>>> David Brown wrote:
>>>>>> On 10/10/2024 20:38, MitchAlsup1 wrote:
>>>>>>> On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
>>>>>>>   
>>>>>>>> On 09/10/2024 23:37, MitchAlsup1 wrote:
>>>>>>>>> On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
>>>>>>>>>   
>>>>>>>>>> On 09/10/2024 20:10, Thomas Koenig wrote:
>>>>>>>>>>> David Brown <david.brown@hesbynett.no> schrieb:
>>>>>>>>>>>   
>>>>>>>>>>>> When would you ever /need/ to compare pointers to
>>>>>>>>>>>> different objects?
>>>>>>>>>>>> For almost all C programmers, the answer is "never".
>>>>>>>>>>>
>>>>>>>>>>> Sometimes, it is handy to encode certain conditions in
>>>>>>>>>>> pointers, rather than having only a valid pointer or
>>>>>>>>>>> NULL.  A compiler, for example, might want to store the
>>>>>>>>>>> fact that an error occurred while parsing a subexpression
>>>>>>>>>>> as a special pointer constant.
>>>>>>>>>>>
>>>>>>>>>>> Compilers often have the unfair advantage, though, that
>>>>>>>>>>> they can rely on what application programmers cannot, their
>>>>>>>>>>> implementation details.  (Some do not, such as f2c).
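
As an aside, the sentinel-pointer trick Thomas describes can be done in
strictly conforming C, because testing pointers for *equality* is always
well defined; only the relational operators are restricted to pointers
into the same object.  A minimal sketch (the names parse_error_obj,
PARSE_ERROR and parse_expr are invented here purely for illustration):

struct node { int kind; struct node *left, *right; };

/* A file-scope dummy object whose address can never compare equal
   to the address of any real node. */
static struct node parse_error_obj;
#define PARSE_ERROR (&parse_error_obj)

struct node *parse_expr(const char *src);  /* assumed to exist */

void handle(const char *src)
{
    struct node *n = parse_expr(src);
    if (n == PARSE_ERROR) {
        /* an error was already reported while parsing */
    } else if (n == NULL) {
        /* e.g. nothing to parse */
    } else {
        /* use the tree */
    }
}
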
>>>>>>>>>>
>>>>>>>>>> Standard library authors have the same superpowers, so that
>>>>>>>>>> they can
>>>>>>>>>> implement an efficient memmove() even though a pure standard
>>>>>>>>>> C programmer cannot (other than by simply calling the
>>>>>>>>>> standard library
>>>>>>>>>> memmove() function!).
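
To make the "superpowers" concrete, here is a sketch of the kind of
non-portable code a libc author might write.  It assumes a flat address
space in which converting pointers to uintptr_t and comparing the results
says something meaningful about where the objects sit in memory - a
strictly conforming program cannot assume that, but a library written for
one known implementation can.

#include <stddef.h>
#include <stdint.h>

void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        for (size_t i = 0; i < n; i++)   /* copy forwards */
            d[i] = s[i];
    } else {
        while (n--)                      /* copy backwards */
            d[n] = s[n];
    }
    return dst;
}
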
>>>>>>>>>
>>>>>>>>> This is more a symptom of bad ISA design/evolution than of
>>>>>>>>> libc writers needing superpowers.
>>>>>>>>
>>>>>>>> No, it is not.  It has absolutely /nothing/ to do with the
>>>>>>>> ISA.
>>>>>>>
>>>>>>> For example, if ISA contains an MM instruction which is the
>>>>>>> embodiment of memmove() then absolutely no heroics are needed
>>>>>>> or desired in the libc call.
>>>>>>>   
>>>>>>
>>>>>> The existence of a dedicated assembly instruction does not let
>>>>>> you write an efficient memmove() in standard C.  That's why I
>>>>>> said there was no connection between the two concepts.
>>>>>>
>>>>>> For some targets, it can be helpful to write memmove() in
>>>>>> assembly or using inline assembly, rather than in non-portable C
>>>>>> (which is the common case).
>>>>>>   
>>>>>>> Thus, it IS a symptom of ISA evolution that one has to rewrite
>>>>>>> memmove() every time wider SIMD registers are available.
>>>>>>
>>>>>> It is not that simple.
>>>>>>
>>>>>> There can often be trade-offs between the speed of memmove() and
>>>>>> memcpy() on large transfers, and the overhead in setting things
>>>>>> up that is proportionally more costly for small transfers.
>>>>>> Often that can be eliminated when the compiler optimises the
>>>>>> functions inline - when the compiler knows the size of the
>>>>>> move/copy, it can optimise directly.
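
A small illustration of the known-size case: when the length is a
compile-time constant, compilers such as gcc and clang will usually
expand the call inline into a handful of plain (or vector) loads and
stores, so none of the setup overhead survives.

#include <string.h>

struct packet { unsigned char payload[32]; };

void copy_packet(struct packet *dst, const struct packet *src)
{
    /* Size known at compile time; typically expanded inline rather
       than compiled as a call to the library function. */
    memcpy(dst, src, sizeof *dst);
}
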
>>>>>
>>>>> What you are missing here David is the fact that Mitch's MM is a
>>>>> single instruction which does the entire memmove() operation, and
>>>>> has the inside knowledge about cache (residency at level x? width
>>>>> in bytes)/memory ranges/access rights/etc needed to do so in a
>>>>> very close to optimal manner, for both short and long transfers.
>>>>
>>>> I am not missing that at all.  And I agree that an advanced
>>>> hardware MM instruction could be a very efficient way to implement
>>>> both memcpy and memmove.  (For my own kind of work, I'd worry
>>>> about such looping instructions causing an unbounded increase in
>>>> interrupt latency, but that too is solvable given enough hardware
>>>> effort.)
>>>>
>>>> And I agree that once you have an "MM" (or similar) instruction,
>>>> you don't need to re-write the implementation for your memmove()
>>>> and memcpy() library functions for every new generation of
>>>> processors of a given target family.
>>>>
>>>> What I /don't/ agree with is the claim that you /do/ need to keep
>>>> re-writing your implementations all the time.  You will
>>>> /sometimes/ get benefits from doing so, but it is not as simple as
>>>> Mitch made out.
>>>>>
>>>>> I.e. totally removing the need for compiler tricks or wide
>>>>> register operations.
>>>>>
>>>>> Also apropos the compiler library issue:
>>>>>
>>>>> You start by teaching the compiler about the MM instruction, and
>>>>> to recognize common patterns (just as most compilers already do
>>>>> today), and then the memmove() calls will usually be inlined.
>>>>>   
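
That pattern recognition already exists for the plain C loop form, too:
a byte-copy loop like the one below is commonly detected as a memcpy
idiom (e.g. by gcc's -ftree-loop-distribute-patterns) and replaced by
whatever call or inline expansion suits the target, so an MM-style
instruction would slot in behind the same machinery.

void copy_bytes(unsigned char *restrict dst,
                const unsigned char *restrict src, unsigned long n)
{
    /* Compilers commonly rewrite this loop as a call to memcpy(),
       or expand it inline, when optimising. */
    for (unsigned long i = 0; i < n; i++)
        dst[i] = src[i];
}
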
>>>>
>>>> The original compiler library issue was that it is impossible to
>>>> write an efficient memmove() implementation using pure portable
>>>> standard C. That is independent of any ISA, any specialist
>>>> instructions for memory moves, and any compiler optimisations.
>>>> And it is independent of the fact that some good compilers can
>>>> inline at least some calls to memcpy() and memmove() today, using
>>>> whatever instructions are most efficient for the target.
>>>
>>> David, you and Mitch are among my most cherished writers here on
>>> c.arch, I really don't think any of us really disagree, it is just
>>> that we have been discussing two (mostly) orthogonal issues.
>>
>> I agree.  It's a "god dag mann, økseskaft" situation - the Norwegian
>> idiom for an answer that talks right past the question.
>>
>> I have a huge respect for Mitch, his knowledge and experience, and
>> his willingness to share that freely with others.  That's why I have
>> found this very frustrating.
>>
>>>
>>> a) memmove/memcpy are so important that people have been spending a
>>> lot of time & effort trying to make it faster, with the
>>> complication that in general it cannot be implemented in pure C
>>> (which disallows direct comparison of arbitrary pointers).
>>>    
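
For anyone following along, the construct that pure C disallows here is
the relational comparison itself; equality tests are always fine.  A tiny
illustration of what a conforming program may not rely on:

void f(void)
{
    int a, b;
    /* (&a == &b) is well defined (and false), but the relational
       comparison below is undefined behaviour because the pointers
       refer to distinct objects (C17 6.5.8). */
    if (&a < &b) {
        /* ... */
    }
}
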
>>
>> Yes.
>>
>> (Unlike memmove(), memcpy() can be implemented in standard C as a
>> simple byte-copy loop, without needing to compare pointers.  But an
>> implementation that copies in larger blocks than a byte requires
>> implementation dependent behaviour to determine alignments, or it
>> must rely on unaligned accesses being allowed by the implementation.)
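
For reference, that strictly conforming byte-copy version of memcpy() is
about as simple as it sounds - slow, but it needs no pointer comparisons
and no alignment assumptions:

#include <stddef.h>

void *naive_memcpy(void *restrict dst, const void *restrict src,
                   size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
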
>>
>>> b) Mitch has, like Andy ("Crazy") Glew many years before, realized
>>> that if a cpu architecture actually has an instruction designed to
>>> do this particular job, it behooves cpu architects to make sure
>>> that it is in fact so fast that it obviates any need for tricky
>>> coding to replace it.
>>
>> Yes.
>>
>>> Ideally, it should be able to copy a single object, up to a cache
>>> line in size, in the same or less time needed to do so manually
>>> with a SIMD 512-bit load followed by a 512-bit store (both ops
>>> masked to not touch anything it shouldn't)
>>>    
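
For concreteness, the masked 512-bit sequence could look roughly like
this with AVX-512 intrinsics - a sketch only, assuming AVX512BW is
available and n <= 64:

#include <immintrin.h>
#include <stddef.h>

/* Copy up to 64 bytes with one masked load and one masked store;
   faults on the masked-off bytes are suppressed, so nothing outside
   [0, n) is touched. */
static void copy_small(void *dst, const void *src, size_t n)
{
    __mmask64 m = (n >= 64) ? ~(__mmask64)0
                            : (((__mmask64)1 << n) - 1);
    __m512i v = _mm512_maskz_loadu_epi8(m, src);
    _mm512_mask_storeu_epi8(dst, m, v);
}
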
>>
>> Yes.
>>
>>> REP MOVSB on x86 does the canonical memcpy() operation, originally
>>> by moving single bytes, and this was so slow that we also had REP
>>> MOVSW (moving 16-bit entities) and then REP MOVSD on the 386 and
>>> REP MOVSQ on 64-bit cpus.
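
For reference, the canonical REP MOVSB form as GCC-style inline assembly
on x86-64 - a sketch, not a tuned routine (the SysV ABI guarantees the
direction flag is clear, so this always copies forwards):

#include <stddef.h>

static void rep_movsb_copy(void *dst, const void *src, size_t n)
{
    /* RDI = destination, RSI = source, RCX = count. */
    __asm__ volatile ("rep movsb"
                      : "+D" (dst), "+S" (src), "+c" (n)
                      :
                      : "memory");
}
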
>>>
>>> With a suitable chunk of logic, the basic MOVSB operation could in
>>> fact handle any kinds of alignments and sizes, while doing the
>>> actual transfer at maximum bus speeds, i.e. at least one cache
>>> line/cycle for things already in $L1.
>>>    
>>
>> I agree on all of that.
>>
>> I am quite happy with the argument that suitable hardware can do
>> these basic operations faster than a software loop or the x86 "rep"
>> instructions.
========== REMAINDER OF ARTICLE TRUNCATED ==========