Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: DMA is obsolete
Date: Sat, 03 May 2025 06:11:00 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 267
Message-ID: <2025May3.081100@mips.complang.tuwien.ac.at>
References: <vuj131$fnu$1@gal.iecc.com> <5a77c46910dd2100886ce6fc44c4c460@www.novabbs.org> <vv19rs$t2d$1@reader1.panix.com> <2025May2.073450@mips.complang.tuwien.ac.at> <vv2mqb$hem$1@reader1.panix.com>
Injection-Date: Sat, 03 May 2025 09:59:55 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="d230db8aa85e68b3bc438415dc7f1948";
	logging-data="3264090"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18sc0gJXwPRRmeL5doTsFq7"
Cancel-Lock: sha1:o1VQeq9gcCGREwsPs8upKmeuep4=
X-newsreader: xrn 10.11

cross@spitfire.i.gajendra.net (Dan Cross) writes:
>In article <2025May2.073450@mips.complang.tuwien.ac.at>,
>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>I think it's the same thing as Greenspun's tenth rule: First you find
>>that a classical DMA engine is too limiting, then you find that an A53
>>is too limiting, and eventually you find that it would be practical to
>>run the ISA of the main cores.  In particular, it allows you to use
>>the toolchain of the main cores for developing them,
>
>These are issues solveable with the software architecture and
>build system for the host OS.

Certainly, one can work around many bad decisions, and in reality one
has to work around some bad decisions, but the issue here is not
whether "the issues are solvable", but which decision leads to better
or worse consequences.

>The important characteristic is
>that the software coupling makes architectural sense, and that
>simply does not require using the same ISA across IPs.

IP?  Internet Protocol?  Software coupling sounds to me like a concept
from Constantine that I know from my software engineering class.  I
guess you did not mean either, but it's unclear what you mean.

In any case, I have made arguments why it would make sense to use the
same ISA as for the OS for programming the cores that replace DMA
engines.  I will discuss your counterarguments below, but the most
important one to me seems to be that these cores would cost more than
with a different ISA.  There is something to that, but when the
application ISA is cheap to implement (e.g., RV64GC), that cost is
small; it may be more an argument for also selecting the
cheap-to-implement ISA for the OS/application cores.

>Indeed, consider AMD's Zen CPUs; the PSP/ASP/whatever it's
>called these days is an ARM core while the big CPUs are x86.
>I'm pretty sure there's an Xtensa DSP in there to do DRAM and
>timing and PCIe link training.

The PSPs are not programmable by the OS or application programmers, so
using the same ISA would not benefit the OS or application
programmers.  By contrast, the idea for the DMA replacement engines is
that they are programmable by the OS and maybe the application
programmers, and that changes whether the same ISA is beneficial.

What is "ASP/whatever"?

>Similarly with the ME on Intel.

Last I read about it, ME uses a core developed by Intel with IA-32 or
AMD64; but in any case, the ME is not programmable by OS or
application programmers, either.

>A BMC might be running on whatever.

Again, a BMC is not programmable by OS or application programmers.

>We increasingly see ARM
>based SBCs that have small RISC-V microcontroller-class cores
>embedded in the SoC for exactly this sort of thing.

That's interesting; it points to RISC-V being cheaper to implement
than ARM.  As for "that sort of thing", none of these cores is
programmable by OS or application programmers, so see above.

>Our hardware RoT

?

>The problem is when such service cores are hidden (as they are
>in the case of the PSP, SMU, MPIO, and similar components, to
>use AMD as the example) and treated like black boxes by
>software.  It's really cool that I can configure the IO crossbar
>in a useful way tailored to specific configurations, but it's much
>less cool that I have to do what amounts to an RPC over the SMN
>to some totally undocumented entity somewhere in the SoC to do
>it.  Bluntly, as an OS person, I do not want random bits of code
>running anywhere on my machine that I am not at least aware of
>(yes, this includes firmware blobs on devices).

Well, one goes with the other.  If you design the hardware for being
programmed by the OS programmers, you use the same ISA for all the
cores that the OS programmers program, whereas if you design the
hardware as programmed by "firmware" programmers, you use a
cheap-to-implement ISA and design the whole thing such that it is
opaque to OS programmers and offers them only certain capabilities.

And that's not just limited to ISAs.  A very successful example is the
way that flash memory is usually exposed to OSs: as a block device
like a plain old hard disk, and all the idiosyncrasies of flash are
hidden in the device behind a flash translation layer that is
implemented by a microcontroller on the device.
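
To illustrate what such a translation layer hides, here is a toy
sketch.  Everything in it (the ToyFTL name, the page granularity, the
reclaim policy) is an assumption made for illustration; it does not
describe any real device firmware.  The host sees only read and write
on logical block numbers, while every write lands on a fresh physical
page and superseded pages are reclaimed behind the scenes.

```python
class ToyFTL:
    """Toy flash translation layer: a block-device view of flash pages
    that are never overwritten in place (hypothetical, for illustration)."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages      # physical pages, None = erased
        self.map = {}                        # logical block -> physical page
        self.free = list(range(num_pages))   # erased pages ready for writing
        self.stale = set()                   # pages holding superseded data

    def write(self, lbn, data):
        if not self.free:
            self._reclaim()
        if not self.free:
            raise RuntimeError("device full")
        page = self.free.pop(0)              # always write to a fresh page
        self.pages[page] = data
        old = self.map.get(lbn)
        if old is not None:
            self.stale.add(old)              # old copy is now garbage
        self.map[lbn] = page

    def read(self, lbn):
        page = self.map.get(lbn)
        return None if page is None else self.pages[page]

    def _reclaim(self):
        # Simplification: real flash erases whole blocks of pages and must
        # copy live data out first; here stale pages are erased one by one.
        for page in sorted(self.stale):
            self.pages[page] = None
            self.free.append(page)
        self.stale.clear()
```

The OS only ever calls read and write; the remapping and reclaiming
stay on the device side, which is the kind of opacity meant above.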

What's "SMN"?

>>and you can also
>>use the facilities of the main cores (e.g., debugging features that
>>may be absent of the I/O cores) during development.
>
>This is interesting, but we've found it more useful going the
>other way around.  We do most of our debugging via the SP.
>Since the SP is also responsible for system initialization and
>holding x86 in reset until we're ready for it to start
>running, it's the obvious nexus for debugging the system
>holistically.

Sure, for debugging on the core-dump level that's useful.  I was
thinking about watchpoint and breakpoint registers and performance
counters that one may not want to implement on the DMA-replacement
core, but that are implemented on the OS/application cores.

>>Marking the binaries that should be able to run on the IO service
>>processors with some flag, and letting the component of the OS that
>>assigns processes to cores heed this flag is not rocket science.
>
>I agree, that's easy.  And yet, mistakes will be made, and there
>will be tension between wanting to dedicate those CPUs to IO
>services and wanting to use them for GP programs: I can easily
>imagine a paper where someone modifies a scheduler to move IO
>bound programs to those cores.  Using a different ISA obviates
>most of that, and provides an (admittedly modest) security benefit.

If there really is such tension, that indicates that such cores would
be useful for general-purpose use.  That makes the case for using the
same ISA even stronger.
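
As a minimal sketch of the flag idea discussed above, assuming a
hypothetical per-binary flag (io_ok) and a hypothetical placement
function: the names and the policy are invented for illustration and
do not correspond to any existing OS interface.  Unflagged binaries
stay on the main cores, because they may rely on features absent from
the I/O service cores; flagged binaries may run anywhere, since the ISA
is the same, and are placed on an I/O service core by preference.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Core:
    core_id: int
    io_service: bool      # True for a DMA-replacement (I/O service) core

@dataclass(frozen=True)
class Binary:
    name: str
    io_ok: bool = False   # hypothetical flag: may run on I/O service cores

def allowed_cores(binary, cores):
    """Cores the placement component may choose for this binary."""
    if binary.io_ok:
        # Same ISA everywhere, so a flagged binary can run on any core.
        return list(cores)
    # Unflagged binaries may rely on features absent from the I/O cores.
    return [c for c in cores if not c.io_service]

def place(binary, cores):
    """Pick a core, preferring I/O service cores for flagged binaries."""
    candidates = allowed_cores(binary, cores)
    if not candidates:
        raise RuntimeError("no core may run " + binary.name)
    # False sorts before True, so I/O service cores come first when allowed.
    return sorted(candidates, key=lambda c: not c.io_service)[0]
```

Relaxing the policy so that general-purpose programs may also use idle
I/O service cores would be a change to allowed_cores only, which is
possible precisely because both kinds of core run the same ISA.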

As for "mistakes will be made", that also goes the other way: With a
separate toolchain for the DMA-replacement ISA, there is lots of
opportunity for mistakes.

As for "security benefit", where is that supposed to come from?  What
attack scenario do you have in mind where that "security benefit"
could materialize?

>And if I already have to modify or configure the OS to
>accommodate the existence of these things in the first place,
>then accommodating an ISA difference really isn't that much
>extra work.  The critical observation is that a typical SMP view
>of the world no longer makes sense for the system architecture,
>and trying to shoehorn that model onto the hardware reality is
>just going to cause frustration.

The shared-memory multiprocessing view of the world is very
successful, while distributed-memory computers are limited to
supercomputing and other areas where hardware cost still dominates
over software cost (i.e., where the software crisis has not happened
yet).  As an example of the lack of success of the distributed-memory
paradigm, take the PlayStation 3: programmers found it too hard to
work with, so they did not use the hardware well, and eventually Sony
decided to go for an SMP machine for the PlayStation 4 and 5.

OTOH, one can say that the way many peripherals work on
general-purpose computers is more along the lines of
distributed-memory; but that's probably due to the relative hardware
and software costs for that peripheral.  Sure, the performance
characteristics are non-uniform (NUMA) in many cases, but 1) caches
tend to smooth over that, and 2) most of the code is not
performance-critical, so it just needs to run, which is easier to
achieve with SMP and harder with distributed memory.

Sure, people have argued for advantages of other models for decades,
like you do now, but SMP has usually won.

>>>>On the other hand, you buy a motherboard with said ASIC core,
>>>>and you can boot the MB without putting a big chip in the
>>>>socket--but you may have to deal with scant DRAM since the
>>>>big centralized chip contains the memory controller.
>>>
>>>A neat hack for bragging rights, but not terribly practical?
>>
>>Very practical for updating the firmware of the board to support the
>>big chip you want to put in the socket (called "BIOS FlashBack" in
>>connection with AMD big chips).
>
>"BIOS", as loaded from the EFS by the ABL on the PSP on EPYC
>class chips, is usually stored in a QSPI flash on the main
>board (though starting with Turin you _can_ boot via eSPI).
>Strictly speaking, you don't _need_ an x86 core to rewrite that.
>On our machines, we do that from the SP, but we don't use AGESA
>or UEFI: all of the platform enablement stuff done in PEI and
>DXE we do directly in the host OS.

EFS?  ABL?  QSPI? eSPI?  PEI?  DXE?

========== REMAINDER OF ARTICLE TRUNCATED ==========