Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: DMA is obsolete
Date: Sat, 03 May 2025 06:11:00 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 267
Message-ID: <2025May3.081100@mips.complang.tuwien.ac.at>
References: <vuj131$fnu$1@gal.iecc.com> <5a77c46910dd2100886ce6fc44c4c460@www.novabbs.org> <vv19rs$t2d$1@reader1.panix.com> <2025May2.073450@mips.complang.tuwien.ac.at> <vv2mqb$hem$1@reader1.panix.com>
X-newsreader: xrn 10.11

cross@spitfire.i.gajendra.net (Dan Cross) writes:
>In article <2025May2.073450@mips.complang.tuwien.ac.at>,
>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>>I think it's the same thing as Greenspun's tenth rule: First you find
>>that a classical DMA engine is too limiting, then you find that an A53
>>is too limiting, and eventually you find that it would be practical to
>>run the ISA of the main cores.  In particular, it allows you to use
>>the toolchain of the main cores for developing them,
>
>These are issues solvable with the software architecture and
>build system for the host OS.

Certainly, one can work around many bad decisions, and in reality one
has to work around some, but the issue here is not whether "the
issues are solvable", but which decision leads to better or worse
consequences.

>The important characteristic is
>that the software coupling makes architectural sense, and that
>simply does not require using the same ISA across IPs.

IP?  Internet Protocol?  "Software coupling" sounds to me like a
concept from Constantine, out of my software-engineering class.
I guess you did not mean either, but it's unclear what you mean.

In any case, I have made arguments for why it would make sense to use
the same ISA as for the OS for programming the cores that replace DMA
engines.  I will discuss your counterarguments below, but the most
important one to me seems to be that these cores would cost more than
with a different ISA.  There is something to that, but when the
application ISA is cheap to implement (e.g., RV64GC), that cost is
small; it may be more an argument for also selecting the
cheap-to-implement ISA for the OS/application cores.

>Indeed, consider AMD's Zen CPUs; the PSP/ASP/whatever it's
>called these days is an ARM core while the big CPUs are x86.
>I'm pretty sure there's an Xtensa DSP in there to do DRAM and
>timing and PCIe link training.

The PSPs are not programmable by the OS or application programmers,
so using the same ISA would not benefit the OS or application
programmers.  By contrast, the idea for the DMA-replacement engines
is that they are programmable by the OS and maybe the application
programmers, and that changes whether the same ISA is beneficial.

What is "ASP/whatever"?

>Similarly with the ME on Intel.

Last I read about it, the ME uses a core developed by Intel with
IA-32 or AMD64; but in any case, the ME is not programmable by OS or
application programmers, either.

>A BMC might be running on whatever.

Again, a BMC is not programmable by OS or application programmers.

>We increasingly see ARM
>based SBCs that have small RISC-V microcontroller-class cores
>embedded in the SoC for exactly this sort of thing.

That's interesting; it points to RISC-V being cheaper to implement
than ARM.  As for "that sort of thing", those cores are all not
programmable by OS or application programmers, so see above.

>Our hardware RoT

?

>The problem is when such service cores are hidden (as they are
>in the case of the PSP, SMU, MPIO, and similar components, to
>use AMD as the example) and treated like black boxes by
>software.
>It's really cool that I can configure the IO crossbar
>in a useful way tailored to specific configurations, but it's much
>less cool that I have to do what amounts to an RPC over the SMN
>to some totally undocumented entity somewhere in the SoC to do
>it.  Bluntly, as an OS person, I do not want random bits of code
>running anywhere on my machine that I am not at least aware of
>(yes, this includes firmware blobs on devices).

Well, one goes with the other.  If you design the hardware to be
programmed by the OS programmers, you use the same ISA for all the
cores that the OS programmers program, whereas if you design the
hardware to be programmed by "firmware" programmers, you use a
cheap-to-implement ISA and design the whole thing such that it is
opaque to OS programmers and only offers certain capabilities to
them.

And that's not just limited to ISAs.  A very successful example is
the way that flash memory is usually exposed to OSs: as a block
device like a plain old hard disk, with all the idiosyncrasies of
flash hidden in the device behind a flash translation layer that is
implemented by a microcontroller on the device.

What's "SMN"?

>>and you can also
>>use the facilities of the main cores (e.g., debugging features that
>>may be absent from the I/O cores) during development.
>
>This is interesting, but we've found it more useful going the
>other way around.  We do most of our debugging via the SP.
>Since the SP is also responsible for system initialization and
>holding x86 in reset until we're ready for it to start
>running, it's the obvious nexus for debugging the system
>holistically.

Sure, for debugging on the core-dump level that's useful.  I was
thinking about watchpoint and breakpoint registers and performance
counters that one may not want to implement on the DMA-replacement
cores, but that are implemented on the OS/application cores.
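The flash-translation-layer arrangement mentioned above (flash
exposed to the OS as a plain block device, with flash's inability to
overwrite a programmed page hidden by the device's microcontroller)
can be sketched roughly like this; the sizes and names are purely
illustrative, not taken from any real device:

```c
/* Minimal sketch of a flash translation layer (FTL): the host sees a
 * plain block device, while the FTL hides flash's no-overwrite rule
 * by remapping each logical block to a fresh physical page on every
 * write (log-structured style, garbage collection omitted). */
#include <assert.h>
#include <string.h>

#define NBLOCKS 8          /* logical blocks exposed to the OS      */
#define NPAGES  32         /* physical flash pages available        */
#define PAGESZ  16

static char flash[NPAGES][PAGESZ]; /* simulated flash array         */
static int  map[NBLOCKS];          /* logical block -> phys page    */
static int  next_free;             /* naive append-only cursor      */

void ftl_init(void) {
    for (int i = 0; i < NBLOCKS; i++) map[i] = -1;
    next_free = 0;
}

/* A write never overwrites a programmed page: it goes to a fresh
 * page and updates the mapping; the old page becomes garbage. */
void ftl_write(int lblock, const char *data) {
    assert(next_free < NPAGES);       /* a real FTL would GC here */
    strncpy(flash[next_free], data, PAGESZ);
    map[lblock] = next_free++;
}

const char *ftl_read(int lblock) {
    return map[lblock] < 0 ? NULL : flash[map[lblock]];
}
```

The point of the sketch is that the OS only ever sees
read-block/write-block, while the remapping (and, in real devices,
wear leveling and garbage collection) stays inside the device.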
>>Marking the binaries that should be able to run on the IO service
>>processors with some flag, and letting the component of the OS that
>>assigns processes to cores heed this flag is not rocket science.
>
>I agree, that's easy.  And yet, mistakes will be made, and there
>will be tension between wanting to dedicate those CPUs to IO
>services and wanting to use them for GP programs: I can easily
>imagine a paper where someone modifies a scheduler to move IO
>bound programs to those cores.  Using a different ISA obviates
>most of that, and provides an (admittedly modest) security benefit.

If there really is such tension, that indicates that such cores would
be useful for general-purpose use.  That makes the case for using the
same ISA even stronger.

As for "mistakes will be made", that also goes the other way: with a
separate toolchain for the DMA-replacement ISA, there is lots of
opportunity for mistakes.

As for the "security benefit", where is that supposed to come from?
What attack scenario do you have in mind where that "security
benefit" could materialize?

>And if I already have to modify or configure the OS to
>accommodate the existence of these things in the first place,
>then accommodating an ISA difference really isn't that much
>extra work.  The critical observation is that a typical SMP view
>of the world no longer makes sense for the system architecture,
>and trying to shoehorn that model onto the hardware reality is
>just going to cause frustration.
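Coming back to the flag-based placement discussed above: a minimal
sketch of what "heeding the flag" in the core-assignment code might
look like.  The flag source (say, a per-binary note read at exec
time), the `io_core_ok` name, and the core numbering are assumptions
made for illustration, not any real scheduler's interface:

```c
/* Sketch: binaries carry a capability flag, and the scheduler only
 * places a process on an I/O-service core when the flag says the
 * binary is meant to run there.  Core ids >= APP_CORES denote the
 * I/O-service cores in this toy numbering. */
#include <assert.h>
#include <stdbool.h>

enum { APP_CORES = 4, IO_CORES = 2, NCORES = APP_CORES + IO_CORES };

struct proc {
    bool io_core_ok;   /* set from a per-binary flag at exec time */
};

/* Return a free core id the process may run on, or -1 if none:
 * unflagged processes may only use the application cores. */
static int pick_core(const struct proc *p, const bool busy[NCORES]) {
    int limit = p->io_core_ok ? NCORES : APP_CORES;
    for (int c = 0; c < limit; c++)
        if (!busy[c]) return c;
    return -1;  /* nothing free in the eligible set */
}
```

As the sketch shows, the placement decision is a one-line eligibility
check; the contentious part is policy (who gets flagged), not
mechanism.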
The shared-memory multiprocessing view of the world is very
successful, while distributed-memory computers are limited to
supercomputing and other areas where hardware cost still dominates
over software cost (i.e., where the software crisis has not happened
yet).  As an example of the lack of success of the distributed-memory
paradigm, take the PlayStation 3: programmers found it too hard to
work with, so they did not use the hardware well, and eventually Sony
decided to go for an SMP machine for the PlayStation 4 and 5.

OTOH, one can say that the way many peripherals work on
general-purpose computers is more along the lines of distributed
memory; but that's probably due to the relative hardware and software
costs for those peripherals.

Sure, the performance characteristics are non-uniform (NUMA) in many
cases, but 1) caches tend to smooth over that, and 2) most of the
code is not performance-critical, so it just needs to run, which is
easier to achieve with SMP and harder with distributed memory.

Sure, people have argued for the advantages of other models for
decades, as you do now, but SMP has usually won.

>>>>On the other hand, you buy a motherboard with said ASIC core,
>>>>and you can boot the MB without putting a big chip in the
>>>>socket--but you may have to deal with scant DRAM since the
>>>>big centralized chip contains the memory controller.
>>>
>>>A neat hack for bragging rights, but not terribly practical?
>>
>>Very practical for updating the firmware of the board to support the
>>big chip you want to put in the socket (called "BIOS FlashBack" in
>>connection with AMD big chips).

>"BIOS", as loaded from the EFS by the ABL on the PSP on EPYC
>class chips, is usually stored in a QSPI flash on the main
>board (though starting with Turin you _can_ boot via eSPI).
>Strictly speaking, you don't _need_ an x86 core to rewrite that.
>On our machines, we do that from the SP, but we don't use AGESA
>or UEFI: all of the platform enablement stuff done in PEI and
>DXE we do directly in the host OS.

EFS?  ABL?  QSPI?  eSPI?  PEI?  DXE?

========== REMAINDER OF ARTICLE TRUNCATED ==========