From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: DMA is obsolete
Date: Sun, 27 Apr 2025 20:49:50 +0000
Organization: Rocksolid Light
Message-ID: <499b7179ca8b4650a63c444fdc00c2cd@www.novabbs.org>
References: <vuj131$fnu$1@gal.iecc.com> <slrn100q2dv.eisl.lars@cleo.beagle-ears.com> <Crc*345aA@news.chiark.greenend.org.uk>

On Sun, 27 Apr 2025 18:35:08 +0000, Theo wrote:

> Lars Poulsen <lars@cleo.beagle-ears.com> wrote:
>> What is the difference between DMA and message-passing to another core
>> doing CMOV loop at the ISA level?
>>
>> DMA means doing that in the micro-engine instead of at the ISA level.
>> Same difference.
>>
>> What am I missing?
>
> Width and specialisation.
>
> You can absolutely write a DMA engine in software. One thing that is
> troublesome is that the CPU datapath might be a lot narrower than the
> number of bits you can move in a single cycle. E.g. on FPGA we can't
> clock logic anywhere near the DRAM clock, so we end up making a very
> wide memory bus that runs at a lower clock - 512/1024/2048/... bits
> wide. You can do that in a regular ISA using vector
> registers/instructions but it adds complexity you don't need.

With anything at 7nm or smaller, the main core interconnect should be
1 cache line wide (512 bits = 64 bytes, although IBM's choice of
256-byte cache lines might be troublesome for now).

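To put rough numbers on the width argument, here is a small sketch of the
peak-bandwidth arithmetic (bus width times clock). The function name and the
clock rates used in the usage note are illustrative assumptions, not figures
from this thread.

```c
#include <stdint.h>

/*
 * Peak bytes per second for a bus that transfers bus_bits on every
 * clock cycle. Hypothetical helper; ignores protocol overhead,
 * refresh, and any cycles where the bus sits idle.
 */
uint64_t peak_bytes_per_sec(uint64_t bus_bits, uint64_t clock_hz)
{
    return (bus_bits / 8) * clock_hz;
}
```

Under these assumed clocks, a 2048-bit bus at 250 MHz and a 512-bit
(one 64-byte cache line) interconnect at 1 GHz both come out at 64 GB/s
peak, while a 64-bit datapath at 1 GHz manages only 8 GB/s.
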
> The other is that there's often some degree of marshalling that needs
> to happen - reading scatter/gather lists, formatting packets the right
> way for PCIe, filling in the right header fields, etc. It's more
> efficient to do that in hardware than it is to spend multiple
> instructions per packet doing it. Meanwhile the DRAM bandwidth is
> being wasted.

SW is nadda-verrryyy guud at twiddling bits like HW is.
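
For concreteness, a minimal sketch of what "a DMA engine in software" looks
like when it walks a scatter/gather list. The `sg_entry` layout and the
`sw_dma_gather` name are hypothetical, made up for illustration; real
descriptor formats are device-specific and also carry flags, physical
addresses, and so on.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scatter/gather descriptor: one contiguous source fragment. */
struct sg_entry {
    const void *addr;  /* start of the fragment */
    size_t      len;   /* fragment length in bytes */
};

/*
 * Software "DMA": gather every fragment into one contiguous destination.
 * Returns the number of bytes copied, or 0 if dst is too small (in which
 * case dst may already be partially written).
 */
size_t sw_dma_gather(void *dst, size_t dst_len,
                     const struct sg_entry *sg, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Per-descriptor bookkeeping: bounds check, then copy. */
        if (sg[i].len > dst_len - total)
            return 0;
        memcpy((uint8_t *)dst + total, sg[i].addr, sg[i].len);
        total += sg[i].len;
    }
    return total;
}
```

Even this stripped-down loop spends several instructions per descriptor on
loads, a bounds check, and pointer arithmetic before any data moves, which is
the per-packet overhead the quoted text is pointing at.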