Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: DMA is obsolete
Date: Sat, 26 Apr 2025 19:28:21 +0200
Organization: A noiseless patient Spider
Lines: 42
Message-ID: <vuj53m$2s0jv$1@dont-email.me>
References: <vuj131$fnu$1@gal.iecc.com> <slrn100q2dv.eisl.lars@cleo.beagle-ears.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 26 Apr 2025 19:28:22 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="5e2967bf5c7bd177c5e627af12d074d3"; logging-data="3015295"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+w13I/pfzBjGhSquyJ+v51dny5c0i9ZlBPtLYjrzMK5w=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:128.0) Gecko/20100101 Firefox/128.0 SeaMonkey/2.53.20
Cancel-Lock: sha1:dGELwokkKlRQbeUo/8ZB33qNtr0=
In-Reply-To: <slrn100q2dv.eisl.lars@cleo.beagle-ears.com>
Bytes: 2670

Lars Poulsen wrote:
> On 2025-04-26, John Levine <johnl@taugh.com> wrote:
>> Well, not entirely. This preprint argues that in environments with
>> lots of cores and where latency is an issue, programmed I/O can
>> outperform DMA.
>>
>> Rethinking Programmed I/O for Fast Devices, Cheap Cores, and
>> Coherent Interconnects
>>
>> Anastasiia Ruzhanskaia, Pengcheng Xu, David Cock, Timothy Roscoe [snip]
>>
>> https://arxiv.org/abs/2409.08141
>
> What is the difference between DMA and message-passing to another core
> doing a CMOV loop at the ISA level?
>
> DMA means doing that in the micro-engine instead of at the ISA level.
> Same difference.
>
> What am I missing?
>
I think, in the end, it all comes down to power: if the DMA engine can
move n GB of data using less total power than a regular core doing it
with programmed I/O, then the DMA engine wins.

OTOH, I have argued here in c.arch that for most data input streams, a
regular core is going to look at the data eventually, and in that case
the same core can do the work: either process it directly (in
register-file-sized or smaller blocks), or act as its own prefetcher,
first loading an $L1-sized block and then processing that chunk.

On the gripping hand, if the data is only going out, or you only need
to look at a small percentage of the incoming cache lines' worth of
data, then the more power-efficient DMA engine can still win.

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
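
A rough sketch of the "core as its own prefetcher" pattern Terje
describes, in C. The chunk size, line size, and function names are
illustrative assumptions, not from the post; __builtin_prefetch is the
GCC/Clang prefetch intrinsic.

/* The consuming core streams incoming data in L1-sized chunks: it
   issues prefetches for the next chunk while processing the current
   one, so the data movement a DMA engine would do is overlapped with
   the work the core had to do anyway. */
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES (16 * 1024)   /* assume roughly half of a 32 KB L1D */
#define LINE_BYTES  64            /* assumed cache-line size */

/* Stand-in for whatever per-byte work the core actually does. */
static uint64_t process_chunk(const uint8_t *chunk, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += chunk[i];
    return sum;
}

uint64_t consume_stream(const uint8_t *buf, size_t total)
{
    uint64_t acc = 0;
    for (size_t off = 0; off < total; off += CHUNK_BYTES) {
        size_t len = (total - off < CHUNK_BYTES) ? total - off : CHUNK_BYTES;

        /* Prefetch the next chunk's cache lines before working on this
           one, so the loads overlap with the processing below. */
        if (off + CHUNK_BYTES < total) {
            const uint8_t *next = buf + off + CHUNK_BYTES;
            size_t nlen = total - (off + CHUNK_BYTES);
            if (nlen > CHUNK_BYTES)
                nlen = CHUNK_BYTES;
            for (size_t p = 0; p < nlen; p += LINE_BYTES)
                __builtin_prefetch(next + p, 0, 0);
        }
        acc += process_chunk(buf + off, len);
    }
    return acc;
}

The point of the sketch is that the staging cost is only paid by a core
that was going to touch the data anyway; whether it beats a DMA engine
comes down to the power argument above.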