From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: DMA is obsolete
Date: Fri, 2 May 2025 17:40:09 +0000
Organization: Rocksolid Light
Message-ID: <859cc676b91aa1173e44acf0f2be636b@www.novabbs.org>
References: <vuj131$fnu$1@gal.iecc.com> <da5b3dea460370fc1fe8ad2323da9bc4@www.novabbs.org> <vuvrlr$att$1@reader1.panix.com> <5a77c46910dd2100886ce6fc44c4c460@www.novabbs.org> <vv19rs$t2d$1@reader1.panix.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Rocksolid Light

On Fri, 2 May 2025 2:15:24 +0000, Dan Cross wrote:

> In article <5a77c46910dd2100886ce6fc44c4c460@www.novabbs.org>,
> MitchAlsup1 <mitchalsup@aol.com> wrote:
>>On Thu, 1 May 2025 13:07:07 +0000, Dan Cross wrote:
>>> In article <da5b3dea460370fc1fe8ad2323da9bc4@www.novabbs.org>,
>>> MitchAlsup1 <mitchalsup@aol.com> wrote:
>>>>On Sat, 26 Apr 2025 17:29:06 +0000, Scott Lurndal wrote:
>>>>[snip]
>>>>Reminds me of trying to sell a micro x86-64 to AMD as a project.
>>>>The µ86 is a small x86-64 core made available as IP in Verilog;
>>>>it runs the same ISA as the main GBOoO x86, but is placed "out
>>>>in the PCIe" interconnect--performing I/O services topologically
>>>>adjacent to the device itself. This allows 1 ns access latencies
>>>>to DCRs and OS queueing of DPCs, ... without bothering the GBOoO
>>>>cores.
>>>>
>>>>AMD didn't buy the arguments.
>>>
>>> I can see it either way; I suppose the argument as to whether I
>>> buy it or not comes down to, "it depends". How much control do
>>> I, as the OS implementer, have over this core?
>>
>>Other than it being placed "away" from the centralized cores,
>>it runs the same ISA as the main cores, has longer latency to
>>coherent memory, and shorter latency to device control registers
>>--which is why it is placed close to the device itself:: latency.
>>The big fast centralized core is going to get microsecond latency
>>from an MMI/O device whereas the ASIC version will see a handful
>>of nanoseconds. So the 5 GHz core sees ~1 microsecond while the
>>little ASIC sees 10 nanoseconds. ...
>
> Yes, I get the argument for WHY you'd do it, I just want to make
> sure that it's an ordinary core (albeit one that is far away
> from the sockets with the main SoC complexes) that I interact
> with in the usual manner. Compare to, say, MP1 or MP0 on AMD
> Zen, where it runs its own (proprietary) firmware that I
> interact with via an RPC protocol over an AXI bus, if I interact
> with it at all: most OEMs just punt and run AGESA (we don't).
>
>>> If it is yet another hidden core embedded somewhere deep in the
>>> SoC complex and I can't easily interact with it from the OS,
>>> then no thanks: we've got enough of those between MP0, MP1, MP5,
>>> etc, etc.
>>>
>>> On the other hand, if it's got a "normal" APIC ID, the OS has
>>> control over it like any other LP, and it's coherent with the big
>>> cores, then yeah, sign me up: I've been wanting something like
>>> that for a long time now.
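
As an aside: if the little core shows up with an ordinary APIC ID,
steering a device's interrupts at it need not look any different
from steering them at any other LP. A rough sketch in C of
programming one MSI-X table entry--the struct layout follows the
standard MSI-X table format, but the helper name and the particular
APIC ID are made up for illustration, not anything AMD shipped:

  #include <stdint.h>

  #define MSI_ADDR_BASE 0xFEE00000u        /* xAPIC MSI address window */

  struct msix_entry {                      /* one slot in the MSI-X table */
      volatile uint32_t addr_lo;
      volatile uint32_t addr_hi;
      volatile uint32_t data;
      volatile uint32_t vector_ctrl;
  };

  /* Aim this vector at the peripheral core, physical destination mode. */
  void route_to_peripheral_core(struct msix_entry *e,
                                uint8_t apic_id, uint8_t vector)
  {
      e->addr_lo = MSI_ADDR_BASE | ((uint32_t)apic_id << 12);
      e->addr_hi = 0;
      e->data    = vector;                 /* fixed delivery, edge-triggered */
      e->vector_ctrl &= ~1u;               /* clear the per-vector mask bit */
  }

(With x2APIC and more than 255 APIC IDs you would go through the
IOMMU's interrupt-remapping tables instead, but the idea is the
same.)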
>>
>>It is just a core that is cheap enough to put in ASICs, that
>>can offload some I/O burden without you having to do anything
>>other than setting some bits in some CRs so interrupts are
>>routed to this core rather than some more centralized core.
>
> Sounds good.
>
>>> Consider a virtualization application. A problem with, say,
>>> SR-IOV is that very often the hypervisor wants to interpose some
>>> sort of administrative policy between the virtual function and
>>> whatever it actually corresponds to, but get out of the fast
>>> path for most IO. This implies a kind of offload architecture
>>> where there's some (presumably software) agent dedicated to
>>> handling IO that can be parameterized with such a policy. A
>>
>>Interesting:: Could you cite any literature here!?!
>
> Sure. This paper is a bit older, but gets at the main points:
> https://www.usenix.org/system/files/conference/nsdi18/nsdi18-firestone.pdf
>
> I don't know if the details are public for similar technologies
> from Amazon or Google.
>
>>> core very close to the device could handle that swimmingly,
>>> though I'm not sure it would be enough to do it at (say) line
>>> rate for a 400Gbps NIC or Gen5 NVMe device.
>>
>>I suspect the 400 Gb/s NIC needs a rather BIG core to handle the
>>traffic loads.
>
> Indeed. Part of the challenge for the hyperscalers is in
> meeting that demand while not burning too many host resources,
> which are the thing they're actually selling their customers in
> the first place. A lot of folks are pushing this off to the NIC
> itself, and I've seen at least one team that implemented NVMe in
> firmware on a 100Gbps NIC, exposed via SR-IOV, as part of a
> disaggregated storage architecture.
>
> Another option is to push this to the switch; things like Intel
> Tofino2 were well-positioned for this, but of course Intel, in its
> infinite wisdom and vision, canc'ed Tofino.
>
>>> ...but why x86_64? It strikes me that as long as the _data_
>>> formats vis-a-vis the software-visible ABI are the same, it doesn't
>>> need to use the same ISA. In fact, I can see advantages to not
>>> doing so.
>>
>>Having the remote core run the same OS code as every other core
>>means the OS developers have fewer hoops to jump through. Bug-for-
>>bug compatibility means that clearing of those CRs just leaves
>>the core out in the periphery idling and bothering no one.
>
> Eh... Having to jump through hoops here matters less to me for
> this kind of use case than if I'm trying to use those cores for
> general-purpose compute. Having a separate ISA means I cannot
> accidentally run a program meant only for the big cores on the
> IO service processors. As long as the OS has total control over
> the execution of the core, and it participates in whatever cache
> coherency scheme the rest of the system uses, then the ISA just
> isn't that important.
>
>>On the other hand, you buy a motherboard with said ASIC core,
>>and you can boot the MB without putting a big chip in the
>>socket--but you may have to deal with scant DRAM since the
>>big centralized chip contains the memory controller.
>
> A neat hack for bragging rights, but not terribly practical?
>
> Anyway, it's a neat idea. It's very reminiscent of IBM channel
> controllers, in a way.

It is more like the Peripheral Processors of the CDC 6600, which
ran a CDC 6600 ISA out in the periphery without as much of the
fancy execution.

> - Dan C.
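
Coming back to Dan's SR-IOV point above: what the peripheral core
would run for that case is, in shape, a poll-and-dispatch loop
against a per-VF policy table, punting anything the policy cannot
decide locally to the hypervisor's slow path on a big core. A toy
sketch in C--every name, field, and the queue/doorbell glue is
invented for illustration; a real offload table (see the AccelNet
paper Dan linked) carries far more state than this:

  #include <stdbool.h>
  #include <stdint.h>

  /* Toy per-VF policy: a storage-flavored example. */
  struct vf_policy {
      uint64_t lba_base;         /* LBA window this VF may touch */
      uint64_t lba_len;
      uint32_t iops_budget;      /* crude rate limit */
  };

  struct io_req {                /* one request from a VF submission queue */
      uint16_t vf;
      uint64_t lba;
      uint32_t nblocks;
  };

  /* Platform glue assumed to exist elsewhere; all hypothetical. */
  extern bool next_request(struct io_req *r);          /* poll VF queues  */
  extern void issue_to_device(const struct io_req *r); /* real doorbell   */
  extern void punt_to_hypervisor(const struct io_req *r);

  /* Fast path: runs entirely on the little core next to the device.
     Returns true if the request was handled without waking the host. */
  static bool fast_path(struct vf_policy *pol, const struct io_req *r)
  {
      struct vf_policy *p = &pol[r->vf];

      if (r->lba < p->lba_base ||
          r->lba + r->nblocks > p->lba_base + p->lba_len)
          return false;          /* outside the VF's window: punt */
      if (p->iops_budget == 0)
          return false;          /* over budget: punt */

      p->iops_budget--;
      issue_to_device(r);
      return true;
  }

  void service_loop(struct vf_policy *pol)
  {
      struct io_req r;
      for (;;) {
          if (!next_request(&r))
              continue;
          if (!fast_path(pol, &r))
              punt_to_hypervisor(&r);  /* slow path: involve a big core */
      }
  }

The parameterization knob is exactly the one Dan describes: the
hypervisor owns the policy table in coherent memory, and the little
core only reads and updates it on the fast path.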