Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: 80286 protected mode
Date: Mon, 14 Oct 2024 16:40:26 +0200
Organization: A noiseless patient Spider
Lines: 161
Message-ID: <vejagr$181vo$1@dont-email.me>
References: <2024Oct6.150415@mips.complang.tuwien.ac.at>
<memo.20241006163428.19028W@jgd.cix.co.uk>
<2024Oct7.093314@mips.complang.tuwien.ac.at>
<7c8e5c75ce0f1e7c95ec3ae4bdbc9249@www.novabbs.org>
<2024Oct8.092821@mips.complang.tuwien.ac.at> <ve5ek3$2jamt$1@dont-email.me>
<ve6gv4$2o2cj$1@dont-email.me> <ve6olo$2pag3$2@dont-email.me>
<73e776d6becb377b484c5dcc72b526dc@www.novabbs.org>
<ve7sco$31tgt$1@dont-email.me>
<2b31e1343b1f3fadd55ad6b87d879b78@www.novabbs.org>
<ve99fg$38kta$1@dont-email.me> <veh6j8$q71j$1@dont-email.me>
<vej5p5$1772o$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Date: Mon, 14 Oct 2024 16:40:28 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="9853f7f66d906fd2ff142a1d221b8f4c";
logging-data="1312760"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18yTpVnM0EkOqQXFaUr6/anDeTvodODFBNVxxPLiF5z+w=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.19
Cancel-Lock: sha1:r5e9PbXxCLchXntefy3FDEH9Nx0=
In-Reply-To: <vej5p5$1772o$1@dont-email.me>
Bytes: 8151
David Brown wrote:
> On 13/10/2024 21:21, Terje Mathisen wrote:
>> David Brown wrote:
>>> On 10/10/2024 20:38, MitchAlsup1 wrote:
>>>> On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
>>>>
>>>>> On 09/10/2024 23:37, MitchAlsup1 wrote:
>>>>>> On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
>>>>>>
>>>>>>> On 09/10/2024 20:10, Thomas Koenig wrote:
>>>>>>>> David Brown <david.brown@hesbynett.no> schrieb:
>>>>>>>>
>>>>>>>>> When would you ever /need/ to compare pointers to different
>>>>>>>>> objects?
>>>>>>>>> For almost all C programmers, the answer is "never".
>>>>>>>>
>>>>>>>> Sometimes, it is handy to encode certain conditions in pointers,
>>>>>>>> rather than having only a valid pointer or NULL.  A compiler,
>>>>>>>> for example, might want to store the fact that an error occurred
>>>>>>>> while parsing a subexpression as a special pointer constant.
>>>>>>>>
>>>>>>>> Compilers often have the unfair advantage, though, that they can
>>>>>>>> rely on what application programmers cannot, their implementation
>>>>>>>> details.  (Some do not, such as f2c).
>>>>>>>
>>>>>>> Standard library authors have the same superpowers, so that they can
>>>>>>> implement an efficient memmove() even though a pure standard C
>>>>>>> programmer cannot (other than by simply calling the standard library
>>>>>>> memmove() function!).
>>>>>>
>>>>>> This is more a symptom of bad ISA design/evolution than of libc
>>>>>> writers needing superpowers.
>>>>>
>>>>> No, it is not.  It has absolutely /nothing/ to do with the ISA.
>>>>
>>>> For example, if ISA contains an MM instruction which is the
>>>> embodiment of memmove() then absolutely no heroics are needed
>>>> or desired in the libc call.
>>>>
>>>
>>> The existence of a dedicated assembly instruction does not let you
>>> write an efficient memmove() in standard C.  That's why I said there
>>> was no connection between the two concepts.
>>>
>>> For some targets, it can be helpful to write memmove() in assembly or
>>> using inline assembly, rather than in non-portable C (which is the
>>> common case).
>>>
>>>> Thus, it IS a symptom of ISA evolution that one has to rewrite
>>>> memmove() every time wider SIMD registers are available.
>>>
>>> It is not that simple.
>>>
>>> There can often be trade-offs between the speed of memmove() and
>>> memcpy() on large transfers, and the overhead in setting things up
>>> that is proportionally more costly for small transfers.  Often that
>>> can be eliminated when the compiler optimises the functions inline -
>>> when the compiler knows the size of the move/copy, it can optimise
>>> directly.
>>
>> What you are missing here David is the fact that Mitch's MM is a
>> single instruction which does the entire memmove() operation, and has
>> the inside knowledge about cache (residency at level x? width in
>> bytes)/memory ranges/access rights/etc needed to do so in a very close
>> to optimal manner, for both short and long transfers.
>
> I am not missing that at all.  And I agree that an advanced hardware MM
> instruction could be a very efficient way to implement both memcpy and
> memmove.  (For my own kind of work, I'd worry about such looping
> instructions causing an unbounded increase in interrupt latency, but
> that too is solvable given enough hardware effort.)
>
> And I agree that once you have an "MM" (or similar) instruction, you
> don't need to re-write the implementation for your memmove() and
> memcpy() library functions for every new generation of processors of a
> given target family.
>
> What I /don't/ agree with is the claim that you /do/ need to keep
> re-writing your implementations all the time.  You will /sometimes/ get
> benefits from doing so, but it is not as simple as Mitch made out.
>
>>
>> I.e. totally removing the need for compiler tricks or wide register
>> operations.
>>
>> Also apropos the compiler library issue:
>>
>> You start by teaching the compiler about the MM instruction, and to
>> recognize common patterns (just as most compilers already do today),
>> and then the memmove() calls will usually be inlined.
>>
>
> The original compiler library issue was that it is impossible to write an
> efficient memmove() implementation using pure portable standard C.  That
> is independent of any ISA, any specialist instructions for memory moves,
> and any compiler optimisations.  And it is independent of the fact that
> some good compilers can inline at least some calls to memcpy() and
> memmove() today, using whatever instructions are most efficient for the
> target.
David, you and Mitch are among my most cherished writers here on
c.arch; I don't think any of us really disagree, it is just that we
have been discussing two (mostly) orthogonal issues.
a) memmove()/memcpy() are so important that people have been spending a
lot of time & effort trying to make them faster, with the complication
that in general memmove() cannot be implemented in pure C, which
disallows direct comparison of arbitrary pointers (see the C sketch
after point b).
b) Mitch has, like Andy ("Crazy") Glew many years before him, realized
that if a CPU architecture actually has an instruction designed to do
this particular job, it behooves CPU architects to make sure that it is
in fact so fast that it obviates any need for tricky coding to replace
it.
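
To make point a) concrete, here is a minimal sketch (my own; the name
my_memmove is hypothetical) of the usual non-portable trick: the copy
direction is chosen by comparing the two pointers after converting them
to uintptr_t, which typical flat-memory targets accept but which the C
standard does not guarantee to be meaningful for pointers into
different objects:

#include <stddef.h>
#include <stdint.h>

/* Non-portable memmove() sketch: ordering dst and src is only defined
 * in standard C for pointers into the same object, so a library
 * version leans on knowing that uintptr_t comparison works on its
 * target. */
void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ((uintptr_t)d < (uintptr_t)s) {
        for (size_t i = 0; i < n; i++)       /* copy forwards */
            d[i] = s[i];
    } else {
        for (size_t i = n; i > 0; i--)       /* copy backwards */
            d[i - 1] = s[i - 1];
    }
    return dst;
}

A real library version would of course move wider chunks and
special-case short lengths; the pointer comparison is the part that
pure standard C does not give you.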
Ideally, an instruction like Mitch's MM should be able to copy a single
object, up to a cache line in size, in the same time or less than it
takes to do so manually with a 512-bit SIMD load followed by a 512-bit
store (both ops masked so as not to touch anything they shouldn't).
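
For comparison, that manual masked copy might look roughly like this
with AVX-512BW intrinsics (a sketch only; copy_small is my own name,
and it assumes n <= 64 so one masked load/store pair covers the whole
object, with the masked-off bytes never accessed):

#include <immintrin.h>
#include <stddef.h>

/* Copy an object of up to one 64-byte cache line with a masked 512-bit
 * load and store; lanes outside the mask are not touched. */
static void copy_small(void *dst, const void *src, size_t n)
{
    __mmask64 m = (n >= 64) ? ~(__mmask64)0
                            : (((__mmask64)1 << n) - 1);
    __m512i v = _mm512_maskz_loadu_epi8(m, src);
    _mm512_mask_storeu_epi8(dst, m, v);
}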
REP MOVSB on x86 does the canonical memcpy() operation, originally by
moving single bytes, and this was so slow that we also had REP MOVSW
(moving 16-bit entities) and then REP MOVSD on the 386 and REP MOVSQ on
64-bit CPUs.
With a suitable chunk of logic, the basic MOVSB operation could in fact
handle any kind of alignment and size, while doing the actual transfer
at maximum bus speed, i.e. at least one cache line/cycle for things
already in $L1.
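
For reference, a memcpy() built directly on top of it looks roughly
like this in GCC/Clang inline assembly on x86-64 (a sketch; the name
movsb_memcpy is mine, and it relies on the fast-string microcode,
ERMSB/FSRM, to do the alignment and chunking behind that single
instruction):

#include <stddef.h>

/* memcpy() as a single "rep movsb": RDI = destination, RSI = source,
 * RCX = byte count.  Non-overlapping buffers only, as with memcpy(). */
static void *movsb_memcpy(void *dst, const void *src, size_t n)
{
    void *d = dst;
    const void *s = src;
    __asm__ volatile ("rep movsb"
                      : "+D" (d), "+S" (s), "+c" (n)
                      :
                      : "memory");
    return dst;
}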
Terje
-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"