Path: ...!feeds.phibee-telecom.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Michael S <already5chosen@yahoo.com>
Newsgroups: comp.arch
Subject: Re: 80286 protected mode
Date: Mon, 14 Oct 2024 19:08:56 +0300
Organization: A noiseless patient Spider
Lines: 218
Message-ID: <20241014190856.00003a58@yahoo.com>
References: <2024Oct6.150415@mips.complang.tuwien.ac.at>
	<memo.20241006163428.19028W@jgd.cix.co.uk>
	<2024Oct7.093314@mips.complang.tuwien.ac.at>
	<7c8e5c75ce0f1e7c95ec3ae4bdbc9249@www.novabbs.org>
	<2024Oct8.092821@mips.complang.tuwien.ac.at>
	<ve5ek3$2jamt$1@dont-email.me>
	<ve6gv4$2o2cj$1@dont-email.me>
	<ve6olo$2pag3$2@dont-email.me>
	<73e776d6becb377b484c5dcc72b526dc@www.novabbs.org>
	<ve7sco$31tgt$1@dont-email.me>
	<2b31e1343b1f3fadd55ad6b87d879b78@www.novabbs.org>
	<ve99fg$38kta$1@dont-email.me>
	<veh6j8$q71j$1@dont-email.me>
	<vej5p5$1772o$1@dont-email.me>
	<vejagr$181vo$1@dont-email.me>
	<vejcqc$1772o$3@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Injection-Date: Mon, 14 Oct 2024 18:08:25 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="b3e063db664c626e2a7d1761c39b6d49";
	logging-data="1204195"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18kp9YqcllsUTqVlMNIuZwmuC5kn8VSqW8="
Cancel-Lock: sha1:vq8WWUGzrlrBRhYzV4S/4L1ZMdA=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
Bytes: 11747

On Mon, 14 Oct 2024 17:19:40 +0200
David Brown <david.brown@hesbynett.no> wrote:

> On 14/10/2024 16:40, Terje Mathisen wrote:
> > David Brown wrote:
> >> On 13/10/2024 21:21, Terje Mathisen wrote:
> >>> David Brown wrote:
> >>>> On 10/10/2024 20:38, MitchAlsup1 wrote:
> >>>>> On Thu, 10 Oct 2024 6:31:52 +0000, David Brown wrote:
> >>>>>
> >>>>>> On 09/10/2024 23:37, MitchAlsup1 wrote:
> >>>>>>> On Wed, 9 Oct 2024 20:22:16 +0000, David Brown wrote:
> >>>>>>>
> >>>>>>>> On 09/10/2024 20:10, Thomas Koenig wrote:
> >>>>>>>>> David Brown <david.brown@hesbynett.no> wrote:
> >>>>>>>>>
> >>>>>>>>>> When would you ever /need/ to compare pointers to
> >>>>>>>>>> different objects?
> >>>>>>>>>> For almost all C programmers, the answer is "never".
> >>>>>>>>>
> >>>>>>>>> Sometimes, it is handy to encode certain conditions in
> >>>>>>>>> pointers, rather than having only a valid pointer or
> >>>>>>>>> NULL.  A compiler, for example, might want to store the
> >>>>>>>>> fact that an error occurred while parsing a subexpression
> >>>>>>>>> as a special pointer constant.
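A minimal sketch of a portable variant of that idea, assuming a hypothetical node type `struct node` and hypothetical names `PARSE_ERROR` and `parse_digit`: instead of a magic address, use the address of a dedicated static object as the error marker. Testing a result against it with `==` is well defined in standard C even when the result points to some other object; only the relational comparisons (`<`, `>`) between unrelated objects are undefined.

```c
#include <stddef.h>

/* Hypothetical AST node type, for illustration only. */
struct node {
    int value;
};

/* The address of this otherwise unused object is the "parse error"
   marker. It can never equal NULL or the address of any other node. */
static struct node parse_error_sentinel;
#define PARSE_ERROR (&parse_error_sentinel)

/* Hypothetical parser for one decimal digit. Returns `out` filled in on
   success, PARSE_ERROR on a syntax error, or NULL on empty input. */
struct node *parse_digit(const char *s, struct node *out)
{
    if (*s == '\0')
        return NULL;               /* nothing to parse */
    if (*s < '0' || *s > '9')
        return PARSE_ERROR;        /* syntax error */
    out->value = *s - '0';
    return out;
}
```

A caller checks `r == PARSE_ERROR` (and `r == NULL`) before dereferencing `r`; no implementation-specific address is involved.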
> >>>>>>>>>
> >>>>>>>>> Compilers often have the unfair advantage, though, that
> >>>>>>>>> they can rely on what application programmers cannot, their
> >>>>>>>>> implementation details.  (Some do not, such as f2c).
> >>>>>>>>
> >>>>>>>> Standard library authors have the same superpowers, so that
> >>>>>>>> they can
> >>>>>>>> implement an efficient memmove() even though a pure standard
> >>>>>>>> C programmer cannot (other than by simply calling the
> >>>>>>>> standard library
> >>>>>>>> memmove() function!).
> >>>>>>>
> >>>>>>> This is more a symptom of bad ISA design/evolution than of
> >>>>>>> libc writers needing superpowers.
> >>>>>>
> >>>>>> No, it is not.  It has absolutely /nothing/ to do with the
> >>>>>> ISA.
> >>>>>
> >>>>> For example, if the ISA contains an MM instruction which is the
> >>>>> embodiment of memmove() then absolutely no heroics are needed
> >>>>> or desired in the libc call.
> >>>>>
> >>>>
> >>>> The existence of a dedicated assembly instruction does not let
> >>>> you write an efficient memmove() in standard C.  That's why I
> >>>> said there was no connection between the two concepts.
> >>>>
> >>>> For some targets, it can be helpful to write memmove() in
> >>>> assembly or using inline assembly, rather than in non-portable C
> >>>> (which is the common case).
> >>>>
> >>>>> Thus, it IS a symptom of ISA evolution that one has to rewrite
> >>>>> memmove() every time wider SIMD registers are available.
> >>>>
> >>>> It is not that simple.
> >>>>
> >>>> There can often be trade-offs between the speed of memmove() and
> >>>> memcpy() on large transfers, and the overhead in setting things
> >>>> up that is proportionally more costly for small transfers.
> >>>> Often that can be eliminated when the compiler optimises the
> >>>> functions inline - when the compiler knows the size of the
> >>>> move/copy, it can optimise directly.
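As a small illustration of the inlining point: when the size passed to memcpy() is a compile-time constant, the compiler sees the whole operation. The helper name `load_u32` is hypothetical, and the expectation that GCC and Clang usually reduce this to a single (possibly unaligned) load at -O2 on x86-64 or AArch64 is worth checking in the generated code rather than taking as guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned address. This is
   well-defined standard C; because the size is a constant, an optimising
   compiler can replace the library call with a plain load on targets
   that permit unaligned access. */
uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```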
> >>>
> >>> What you are missing here David is the fact that Mitch's MM is a
> >>> single instruction which does the entire memmove() operation, and
> >>> has the inside knowledge about cache (residency at level x? width
> >>> in bytes)/memory ranges/access rights/etc needed to do so in a
> >>> very close to optimal manner, for both short and long transfers.
> >>
> >> I am not missing that at all.  And I agree that an advanced
> >> hardware MM instruction could be a very efficient way to implement
> >> both memcpy and memmove.  (For my own kind of work, I'd worry
> >> about such looping instructions causing an unbounded increase in
> >> interrupt latency, but that too is solvable given enough hardware
> >> effort.)
> >>
> >> And I agree that once you have an "MM" (or similar) instruction,
> >> you don't need to re-write the implementation for your memmove()
> >> and memcpy() library functions for every new generation of
> >> processors of a given target family.
> >>
> >> What I /don't/ agree with is the claim that you /do/ need to keep
> >> re-writing your implementations all the time.  You will
> >> /sometimes/ get benefits from doing so, but it is not as simple as
> >> Mitch made out.
> >>>
> >>> I.e. totally removing the need for compiler tricks or wide
> >>> register operations.
> >>>
> >>> Also apropos the compiler library issue:
> >>>
> >>> You start by teaching the compiler about the MM instruction, and
> >>> to recognize common patterns (just as most compilers already do
> >>> today), and then the memmove() calls will usually be inlined.
> >>>
> >>
> >> The original compiler library issue was that it is impossible to
> >> write an efficient memmove() implementation using pure portable
> >> standard C. That is independent of any ISA, any specialist
> >> instructions for memory moves, and any compiler optimisations.
> >> And it is independent of the fact that some good compilers can
> >> inline at least some calls to memcpy() and memmove() today, using
> >> whatever instructions are most efficient for the target.
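To make the "impossible to do efficiently in portable C" point concrete, here is a sketch (not from the thread; the name `portable_memmove` is hypothetical) of a memmove that stays within standard C. It uses only `==` on pointers, which the standard defines even for pointers to different objects, whereas `<` and `>` between unrelated objects are undefined. The price is an O(n) scan of equality comparisons merely to choose the copy direction.

```c
#include <stddef.h>

void *portable_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    int dst_inside_src = 0;

    /* If d lies strictly inside (s, s + n), a forward copy would
       overwrite source bytes before reading them. Detecting that with
       only == requires checking every candidate position. */
    for (size_t i = 1; i < n; i++) {
        if (d == s + i) {
            dst_inside_src = 1;
            break;
        }
    }

    if (dst_inside_src) {
        /* Copy backwards, from the last byte to the first. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    } else {
        /* Disjoint, or d at or before s: a forward copy is safe. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dst;
}
```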
> >
> > David, you and Mitch are among my most cherished writers here on
> > c.arch, I really don't think any of us really disagree, it is just
> > that we have been discussing two (mostly) orthogonal issues.
>
> I agree.  It's a "god dag mann, økseskaft" situation ("good day,
> man" / "axe handle": we have been talking past each other).
>
> I have a huge respect for Mitch, his knowledge and experience, and
> his willingness to share that freely with others.  That's why I have
> found this very frustrating.
>
> >
> > a) memmove/memcpy are so important that people have been spending a
> > lot of time & effort trying to make it faster, with the
> > complication that in general it cannot be implemented in pure C
> > (which disallows direct comparison of arbitrary pointers).
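By contrast, a sketch of the kind of shortcut an implementation author can take: convert both pointers to `uintptr_t` and pick the direction with one unsigned subtraction. This assumes a flat address space in which a pointer's integer value is its address; the C standard leaves that mapping implementation-defined (and makes `uintptr_t` itself optional), which is why this is not portable C. The name `flat_memmove` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

void *flat_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Assumes flat addresses. With unsigned wrap-around,
       (d - s) < n holds exactly when d lies in [s, s + n). */
    if ((uintptr_t)d - (uintptr_t)s < n) {
        /* Destination overlaps the tail of the source: copy backwards. */
        while (n > 0) {
            n--;
            d[n] = s[n];
        }
    } else {
        /* No harmful overlap: copy forwards. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    }
    return dst;
}
```

The O(1) direction test is the whole difference from a strictly conforming version, which has no cheap way to order two unrelated pointers.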
> >
>
> Yes.
>
> (Unlike memmove(), memcpy() can be implemented in standard C as a
> simple byte-copy loop, without needing to compare pointers.  But an
> implementation that copies in larger blocks than a byte requires
> implementation-dependent behaviour to determine alignments, or it
> must rely on unaligned accesses being allowed by the implementation.)
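A sketch of the two cases just described, with hypothetical names. `byte_memcpy` is plain standard C. `word_memcpy` copies `size_t`-sized words once both pointers are aligned, and leans on implementation-specific guarantees: it assumes a GCC- or Clang-compatible compiler (the `may_alias` attribute keeps the word accesses valid under strict aliasing), that the low bits of a pointer converted to `uintptr_t` reflect its alignment, and that `sizeof(size_t)` is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Strictly conforming: one byte at a time, no pointer comparisons. */
void *byte_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* GCC/Clang extension: a word type allowed to alias any object. */
typedef size_t __attribute__((__may_alias__)) word_t;

void *word_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t w = sizeof(word_t);

    /* Implementation-defined: only take the wide path when d and s have
       the same misalignment, judged from their integer representations. */
    if ((((uintptr_t)d ^ (uintptr_t)s) % w) == 0) {
        /* Copy leading bytes until d (and therefore s) is word aligned. */
        while (n > 0 && (uintptr_t)d % w != 0) {
            *d++ = *s++;
            n--;
        }
        /* Bulk copy, one word per iteration. */
        while (n >= w) {
            *(word_t *)d = *(const word_t *)s;
            d += w;
            s += w;
            n -= w;
        }
    }
    /* Remaining tail bytes, or everything if the alignments differ. */
    while (n--)
        *d++ = *s++;
    return dst;
}
```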
>
> > b) Mitch has, like Andy ("Crazy") Glew many years before, realized
> > that if a cpu architecture actually has an instruction designed to
> > do this particular job, it behooves cpu architects to make sure
> > that it is in fact so fast that it obviates any need for tricky
> > coding to replace it.
>
> Yes.
>
> > Ideally, it should be able to copy a single object, up to a cache
> > line in size, in the same or less time needed to do so manually
> > with a SIMD 512-bit load followed by a 512-bit store (both ops
> > masked to not touch anything it shouldn't).
> >
>
> Yes.
>
> > REP MOVSB on x86 does the canonical memcpy() operation, originally
> > by moving single bytes, and this was so slow that we also had REP
> > MOVSW (moving 16-bit entities) and then REP MOVSD on the 386 and
> > REP MOVSQ on 64-bit cpus.
> >
> > With a suitable chunk of logic, the basic MOVSB operation could in
> > fact handle any kinds of alignments and sizes, while doing the
> > actual transfer at maximum bus speeds, i.e. at least one cache
> > line/cycle for things already in $L1.
> >
>
> I agree on all of that.
========== REMAINDER OF ARTICLE TRUNCATED ==========