Article <499b7179ca8b4650a63c444fdc00c2cd@www.novabbs.org>

Path: news.eternal-september.org!eternal-september.org!feeder3.eternal-september.org!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: DMA is obsolete
Date: Sun, 27 Apr 2025 20:49:50 +0000
Organization: Rocksolid Light
Message-ID: <499b7179ca8b4650a63c444fdc00c2cd@www.novabbs.org>
References: <vuj131$fnu$1@gal.iecc.com> <slrn100q2dv.eisl.lars@cleo.beagle-ears.com> <Crc*345aA@news.chiark.greenend.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="2159869"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="o5SwNDfMfYu6Mv4wwLiW6e/jbA93UAdzFodw5PEa6eU";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$7wLplXNr4c02FXWOiQT8buK1UxdvU2MrS7Va7Pc4qsrHvQaRAWExG
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: cb29269328a20fe5719ed6a1c397e21f651bda71

On Sun, 27 Apr 2025 18:35:08 +0000, Theo wrote:

> Lars Poulsen <lars@cleo.beagle-ears.com> wrote:
>> What is the difference between DMA and message-passing to another core
>> doing CMOV loop at the ISA level?
>>
>> DMA means doing that in the micro-engine instead of at the ISA level.
>> Same difference.
>>
>> What am I missing?
>
> Width and specialisation.
>
> You can absolutely write a DMA engine in software.  One thing that is
> troublesome is that the CPU datapath might be a lot narrower than the
> number of bits you can move in a single cycle.  eg on FPGA we can't
> clock logic anywhere near the DRAM clock so we end up making a very wide
> memory bus that runs at a lower clock - 512/1024/2048/...  bits wide.
> You can do that in a regular ISA using vector registers/instructions but
> it adds complexity you don't need.

With anything at 7nm or smaller, the main core interconnect should be
one cache line wide (512 bits = 64 bytes), although IBM's choice of
256-byte cache lines might be troublesome for now.
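
To put rough numbers on the width argument, here is a small sketch that
counts how many transfers ("beats") it takes to move one cache line over
a path of a given width. The function name beats and the specific
line/width combinations are my own illustration, not anything from the
thread, and it ignores clocking, protocol overhead and arbitration.

```python
def beats(transfer_bytes: int, width_bits: int) -> int:
    """Transfers needed to move transfer_bytes over a width_bits-wide path."""
    if width_bits <= 0 or width_bits % 8 != 0:
        raise ValueError("width_bits must be a positive multiple of 8")
    width_bytes = width_bits // 8
    # Ceiling division: a partial final beat still occupies a whole slot.
    return -(-transfer_bytes // width_bytes)


if __name__ == "__main__":
    # Illustrative values only: 64-byte and 256-byte lines over a 64-bit
    # CPU datapath, a 512-bit interconnect, and a 2048-bit FPGA-style bus.
    for line_bytes in (64, 256):
        for width_bits in (64, 512, 2048):
            print(f"{line_bytes:3d}-byte line over {width_bits:4d}-bit path: "
                  f"{beats(line_bytes, width_bits)} beat(s)")
```

Under those assumptions a 64-byte line crosses a 512-bit interconnect in
one beat, while a 256-byte line needs four, which is one way to read the
"troublesome" remark above.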

> The other is that there's often some degree of marshalling that needs to
> happen - reading scatter/gather lists, formatting packets the right way
> for PCIe, filling in the right header fields, etc.  It's more efficient
> to do that in hardware than it is to spend multiple instructions per
> packet doing it.  Meanwhile the DRAM bandwidth is being wasted.

SW is nadda-verrryyy guud at twiddling bits like HW is.
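
For a feel of the marshalling work described in the quoted paragraph,
here is a minimal software sketch of walking a scatter/gather list and
wrapping each chunk in a header. Everything in it is an assumption made
for illustration: the (offset, length) descriptor shape, the 8-byte
header (sequence number plus payload length), and the name
gather_packets. It is not the PCIe TLP format, and it packetizes each
descriptor separately rather than coalescing fragments.

```python
import struct
from typing import List, Tuple

# Hypothetical 8-byte header: 4-byte sequence number, 4-byte payload
# length, little-endian. NOT the PCIe TLP layout.
HEADER = struct.Struct("<II")


def gather_packets(memory: bytes,
                   sg_list: List[Tuple[int, int]],
                   max_payload: int) -> List[bytes]:
    """Walk (offset, length) descriptors and emit header+payload packets."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    packets = []
    seq = 0
    for offset, length in sg_list:
        if offset < 0 or length < 0 or offset + length > len(memory):
            raise ValueError(f"descriptor ({offset}, {length}) out of range")
        # Split this fragment into chunks of at most max_payload bytes.
        for start in range(offset, offset + length, max_payload):
            end = min(start + max_payload, offset + length)
            payload = memory[start:end]
            # Per-packet bookkeeping: fill in the header fields, then copy.
            packets.append(HEADER.pack(seq, len(payload)) + payload)
            seq += 1
    return packets


if __name__ == "__main__":
    mem = bytes(range(32))
    for pkt in gather_packets(mem, [(0, 10), (16, 4)], max_payload=8):
        seq, length = HEADER.unpack(pkt[:HEADER.size])
        print(seq, length, pkt[HEADER.size:].hex())
```

Even in this toy form, every packet costs a bounds check, a length
computation, a header pack and a copy, all of it executed as ordinary
instructions while the memory system waits.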