Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Robert Finch <robfi680@gmail.com>
Newsgroups: comp.arch
Subject: Re: DMA is obsolete
Date: Sat, 3 May 2025 06:32:44 -0400
Organization: A noiseless patient Spider
Lines: 282
Message-ID: <vv4rcd$3bbi0$1@dont-email.me>
References: <vuj131$fnu$1@gal.iecc.com>
 <5a77c46910dd2100886ce6fc44c4c460@www.novabbs.org>
 <vv19rs$t2d$1@reader1.panix.com> <2025May2.073450@mips.complang.tuwien.ac.at>
 <vv2mqb$hem$1@reader1.panix.com> <2025May3.081100@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 03 May 2025 12:32:46 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="8708c899f85eff90a4ac9e3bad19d5f1";
	logging-data="3518016"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/fw6fe7iQ47yi65OOqLmEXENUL3lu4vYw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:qCe7ty0byb3DrZRD868qX761MdU=
Content-Language: en-US
In-Reply-To: <2025May3.081100@mips.complang.tuwien.ac.at>
Bytes: 15434

On 2025-05-03 2:11 a.m., Anton Ertl wrote:
> cross@spitfire.i.gajendra.net (Dan Cross) writes:
>> In article <2025May2.073450@mips.complang.tuwien.ac.at>,
>> Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>> I think it's the same thing as Greenspun's tenth rule: First you find
>>> that a classical DMA engine is too limiting, then you find that an A53
>>> is too limiting, and eventually you find that it would be practical to
>>> run the ISA of the main cores.  In particular, it allows you to use
>>> the toolchain of the main cores for developing them,
>>
>> These are issues solvable with the software architecture and
>> build system for the host OS.
> 
> Certainly, one can work around many bad decisions, and in reality one
> has to work around some bad decisions, but the issue here is not
> whether "the issues are solvable", but which decision leads to better
> or worse consequences.
> 
>> The important characteristic is
>> that the software coupling makes architectural sense, and that
>> simply does not require using the same ISA across IPs.
> 
> IP?  Internet Protocol?  "Software coupling" sounds to me like a
> concept from Constantine in my software-engineering class.  I guess
> you did not mean either, but it's unclear what you mean.
> 
> In any case, I have made arguments why it would make sense to use the
> same ISA as for the OS for programming the cores that replace DMA
> engines.  I will discuss your counterarguments below, but the most
> important one to me seems to be that these cores would cost more than
> with a different ISA.  There is something to that, but when the
> application ISA is cheap to implement (e.g., RV64GC), that cost is
> small; it may be more an argument for also selecting the
> cheap-to-implement ISA for the OS/application cores.
> 
>> Indeed, consider AMD's Zen CPUs; the PSP/ASP/whatever it's
>> called these days is an ARM core while the big CPUs are x86.
>> I'm pretty sure there's an Xtensa DSP in there to do DRAM and
>> timing and PCIe link training.
> 
> The PSPs are not programmable by the OS or application programmers, so
> using the same ISA would not benefit the OS or application
> programmers.  By contrast, the idea for the DMA replacement engines is
> that they are programmable by the OS and maybe the application
> programmers, and that changes whether the same ISA is beneficial.
> 
> What is "ASP/whatever"?
> 
>> Similarly with the ME on Intel.
> 
> Last I read about it, ME uses a core developed by Intel with IA-32 or
> AMD64; but in any case, the ME is not programmable by OS or
> application programmers, either.
> 
>> A BMC might be running on whatever.
> 
> Again, a BMC is not programmable by OS or application programmers.
> 
>> We increasingly see ARM
>> based SBCs that have small RISC-V microcontroller-class cores
>> embedded in the SoC for exactly this sort of thing.
> 
> That's interesting; it points to RISC-V being cheaper to implement
> than ARM.  As for "that sort of thing", they are all not programmable
> by OS or application programmers, so see above.
> 
>> Our hardware RoT
> 
> ?
> 
>> The problem is when such service cores are hidden (as they are
>> in the case of the PSP, SMU, MPIO, and similar components, to
>> use AMD as the example) and treated like black boxes by
>> software.  It's really cool that I can configure the IO crossbar
>> in useful ways tailored to specific configurations, but it's much
>> less cool that I have to do what amounts to an RPC over the SMN
>> to some totally undocumented entity somewhere in the SoC to do
>> it.  Bluntly, as an OS person, I do not want random bits of code
>> running anywhere on my machine that I am not at least aware of
>> (yes, this includes firmware blobs on devices).
> 
> Well, one goes with the other.  If you design the hardware for being
> programmed by the OS programmers, you use the same ISA for all the
> cores that the OS programmers program, whereas if you design the
> hardware as programmed by "firmware" programmers, you use a
> cheap-to-implement ISA and design the whole thing such that it is
> opaque to OS programmers and only offers certain capabilities to
> them.
> 
> And that's not just limited to ISAs.  A very successful example is the
> way that flash memory is usually exposed to OSs: as a block device
> like a plain old hard disk, and all the idiosyncrasies of flash are
> hidden in the device behind a flash translation layer that is
> implemented by a microcontroller on the device.
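To make that block-device analogy concrete, here is a minimal sketch of
what such a flash translation layer does (hypothetical names,
illustrative only; a real FTL also does wear leveling and garbage
collection): flash pages cannot be overwritten in place, so every
logical-block write is remapped to a fresh physical page and the old
page is merely marked stale.

```python
# Minimal flash-translation-layer sketch (hypothetical, illustrative).
# The OS sees read_block/write_block, i.e. an ordinary block device;
# out-of-place writes and stale-page tracking stay hidden in the device.

class TinyFTL:
    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))  # erased, writable pages
        self.l2p = {}          # logical block -> physical page map
        self.pages = {}        # physical page -> stored data
        self.invalid = set()   # stale pages awaiting garbage collection

    def write_block(self, lba, data):
        # Flash cannot overwrite a programmed page: allocate a fresh
        # one, then mark the old mapping stale instead of erasing now.
        page = self.free_pages.pop(0)
        self.pages[page] = data
        old = self.l2p.get(lba)
        if old is not None:
            self.invalid.add(old)
        self.l2p[lba] = page

    def read_block(self, lba):
        return self.pages[self.l2p[lba]]
```

Rewriting the same logical block twice consumes two physical pages and
turns the first into garbage; that is exactly the idiosyncrasy the OS
never sees behind the block-device interface.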
> 
> What's "SMN"?
> 
>>> and you can also
>>> use the facilities of the main cores (e.g., debugging features that
>>> may be absent from the I/O cores) during development.
>>
>> This is interesting, but we've found it more useful going the
>> other way around.  We do most of our debugging via the SP.
>> Since the SP is also responsible for system initialization and
>> holding x86 in reset until we're ready for it to start
>> running, it's the obvious nexus for debugging the system
>> holistically.
> 
> Sure, for debugging on the core-dump level that's useful.  I was
> thinking about watchpoint and breakpoint registers and performance
> counters that one may not want to implement on the DMA-replacement
> core, but that is implemented on the OS/application cores.
> 
>>> Marking the binaries that should be able to run on the IO service
>>> processors with some flag, and letting the component of the OS that
>>> assigns processes to cores heed this flag is not rocket science.
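The flag-plus-scheduler idea above can be sketched in a few lines (all
names hypothetical; a real OS would carry the flag in, say, an ELF note
and check it in its placement code):

```python
# Hypothetical sketch: a binary carries a flag saying it may run on the
# DMA-replacement (I/O service) cores; placement heeds that flag.

IO_CORES = {6, 7}          # small cores replacing the DMA engines
APP_CORES = {0, 1, 2, 3}   # OS/application cores

def allowed_cores(io_core_ok):
    """Cores a process may be scheduled on, given its binary's flag."""
    # A binary flagged io_core_ok may run on either set; unflagged
    # binaries are confined to the application cores.
    return APP_CORES | IO_CORES if io_core_ok else APP_CORES
```

With the same ISA on all cores, this is the entire mechanism: the
toolchain is shared and only the placement policy differs per binary.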
>>
>> I agree, that's easy.  And yet, mistakes will be made, and there
>> will be tension between wanting to dedicate those CPUs to IO
>> services and wanting to use them for GP programs: I can easily
>> imagine a paper where someone modifies a scheduler to move IO
>> bound programs to those cores.  Using a different ISA obviates
>> most of that, and provides an (admittedly modest) security benefit.
> 
> If there really is such tension, that indicates that such cores would
> be useful for general-purpose use.  That makes the case for using the
> same ISA even stronger.
> 
> As for "mistakes will be made", that also goes the other way: With a
> separate toolchain for the DMA-replacement ISA, there is lots of
> opportunity for mistakes.
> 
> As for "security benefit", where is that supposed to come from?  What
> attack scenario do you have in mind where that "security benefit"
> could materialize?
> 
>> And if I already have to modify or configure the OS to
>> accommodate the existence of these things in the first place,
>> then accommodating an ISA difference really isn't that much
>> extra work.  The critical observation is that a typical SMP view
>> of the world no longer makes sense for the system architecture,
>> and trying to shoehorn that model onto the hardware reality is
>> just going to cause frustration.
> 
> The shared-memory multiprocessing view of the world is very
> successful, while distributed-memory computers are limited to
> supercomputing and other areas where hardware cost still dominates
> over software cost (i.e., where the software crisis has not happened
> yet); as an example of the lack of success of the distributed-memory
> paradigm, take the PlayStation 3; programmers found it too hard to
> work with, so they did not use the hardware well, and eventually Sony
> decided to go for an SMP machine for the PlayStation 4 and 5.
> 
> OTOH, one can say that the way many peripherals work on
> general-purpose computers is more along the lines of
> distributed-memory; but that's probably due to the relative hardware
> and software costs for that peripheral.  Sure, the performance
> characteristics are non-uniform (NUMA) in many cases, but 1) caches
> tend to smooth over that, and 2) most of the code is not
> performance-critical, so it just needs to run, which is easier to
> achieve with SMP and harder with distributed memory.
> 
> Sure, people have argued for advantages of other models for decades,
> like you do now, but SMP has usually won.
> 
>>>>> On the other hand, you buy a motherboard with said ASIC core,
>>>>> and you can boot the MB without putting a big chip in the
>>>>> socket--but you may have to deal with scant DRAM since the
>>>>> big centralized chip contains the memory controller.
>>>>
>>>> A neat hack for bragging rights, but not terribly practical?
>>>
>>> Very practical for updating the firmware of the board to support the
>>> big chip you want to put in the socket (called "BIOS FlashBack" in
>>> connection with AMD big chips).
>>
========== REMAINDER OF ARTICLE TRUNCATED ==========