From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Another security vulnerability
Date: Sat, 30 Mar 2024 01:06:23 +0000
Organization: Rocksolid Light
Message-ID: <5140da0c7db5686c4bb9948276454914@www.novabbs.org>

Scott Lurndal wrote:

> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
>>On 3/29/24 10:15 AM, Scott Lurndal wrote:
>>> "Paul A. Clayton" <paaronclayton@gmail.com> writes:
>>>> On 3/28/24 3:59 PM, MitchAlsup1 wrote:
>>[snip]

>>>> However, even for a "general purpose" processor, "word"-granular
>>>> atomic operations could justify not having all data transfers be
>>>> cache line size. (Such are rare compared with cache line loads
>>>> from memory or other caches, but a design might have narrower
>>>> connections for coherence, interrupts, etc. that could be used for
>>>> small data communication.)
>>> 
>>> So long as the data transfer is cachable, the atomics can be handled
>>> at the LLC, rather than the memory controller.
>>
>>Yes, but if the width of the on-chip network — which is what Mitch
>>was referring to in transferring a cache line in one cycle — is
>>c.72 bytes (64 bytes for the data and 8 bytes for control
>>information) it seems that short messages would either have to be
>>grouped (increasing latency) or waste a significant fraction of
>>the potential bandwidth for that transfer. Compressed cache lines
>>would also not save bandwidth. These may not be significant
>>considerations, but this is an answer to "why define anything
>>smaller than a cache line?", i.e., seemingly reasonable
>>motivations may exist.
>>
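A rough way to see the trade-off Paul describes: the sketch below assumes the quoted 72-byte beat, of which 64 bytes carry data, and models only that data portion. It computes what fraction of the data field a short message fills when sent alone versus when several are grouped into one beat. The 8-byte message size is illustrative, not taken from any particular interconnect.

```python
# Payload utilization of a wide link carrying short messages.
# Assumption (from the quoted figures): each beat has a 64-byte data field
# (plus 8 bytes of control, not modelled here).
LINK_DATA_BYTES = 64


def max_grouping(msg_bytes: int) -> int:
    """How many messages of msg_bytes fit in one beat's data field."""
    return LINK_DATA_BYTES // msg_bytes


def utilization(msg_bytes: int, msgs_per_beat: int = 1) -> float:
    """Fraction of one beat's data field filled with useful payload."""
    if msg_bytes * msgs_per_beat > LINK_DATA_BYTES:
        raise ValueError("messages do not fit in one beat")
    return msg_bytes * msgs_per_beat / LINK_DATA_BYTES


# An 8-byte atomic result sent alone fills 1/8 of the data field.
print(utilization(8))                   # 0.125
# Grouping eight of them fills the beat, at the cost of waiting to collect them.
print(utilization(8, max_grouping(8)))  # 1.0
# A full 64-byte cache line fills the beat by itself.
print(utilization(64))                  # 1.0
```

Grouping recovers the bandwidth but adds the latency of waiting for enough messages to fill a beat, which is exactly the tension in the quoted paragraph.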

> It's not uncommon for the bus/switch/mesh -structure- to be 512 bits wide,
> which indeed will support a full cache line transfer in a single transaction;

It is not a single transaction; it is a single beat of the clock. One can have
narrower bus widths and simply divide the cache line size by the bus width
to get the number of required beats.
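The beat count is just the cache line size divided by the bus width. A minimal sketch, assuming a 64-byte line; the ceiling division is an added assumption so that non-power-of-two widths, where the last beat is only partly filled, still come out right.

```python
def beats_per_line(line_bytes: int, bus_bits: int) -> int:
    """Clock beats needed to move one cache line over a bus bus_bits wide."""
    bus_bytes = bus_bits // 8
    # Ceiling division: a partly filled final beat still costs a whole beat.
    return -(-line_bytes // bus_bytes)


for width in (512, 256, 128, 64):
    print(f"{width:3d}-bit bus: {beats_per_line(64, width)} beat(s)")
# 512-bit bus: 1 beat(s)
# 256-bit bus: 2 beat(s)
# 128-bit bus: 4 beat(s)
#  64-bit bus: 8 beat(s)
```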

> it also supports high-volume DMA operations (either memory to memory or
> device to memory).

> Most of the interconnect (bus, switched or point-to-point) implementations
> have an 

or more than one

>         overlaying protocol (including the cache coherency
> protocol) and are effectively message based, with agents posting requests
> that don't need a reply and expecting a reply for the rest.
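The message-based view described here, where some requests are posted (no reply) and the rest are matched to a later reply, can be sketched as a small agent model. The message kinds, the `tag` field, and the `Agent` class are hypothetical names chosen for illustration; real coherence protocols define their own message classes and tracking rules.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    # Hypothetical message kinds, for illustration only.
    READ = auto()                # non-posted: expects data in reply
    READ_FOR_OWNERSHIP = auto()  # non-posted: expects data plus write permission
    WRITEBACK = auto()           # posted: no reply expected
    INTERRUPT = auto()           # posted: no reply expected


POSTED = {Kind.WRITEBACK, Kind.INTERRUPT}


@dataclass(frozen=True)
class Message:
    src: int
    dst: int
    kind: Kind
    tag: int  # lets a reply be matched to its request


class Agent:
    def __init__(self, agent_id: int):
        self.agent_id = agent_id
        self.next_tag = 0
        self.outstanding: dict[int, Message] = {}

    def send(self, dst: int, kind: Kind) -> Message:
        msg = Message(self.agent_id, dst, kind, self.next_tag)
        self.next_tag += 1
        if kind not in POSTED:
            # Non-posted request: remember it until the reply arrives.
            self.outstanding[msg.tag] = msg
        return msg

    def receive_reply(self, tag: int) -> Message:
        # Retire the matching request; KeyError means an unexpected reply.
        return self.outstanding.pop(tag)


a = Agent(agent_id=0)
a.send(dst=1, kind=Kind.WRITEBACK)    # posted: nothing to track
req = a.send(dst=1, kind=Kind.READ)   # non-posted: tracked by tag
print(len(a.outstanding))             # 1
a.receive_reply(req.tag)
print(len(a.outstanding))             # 0
```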

Many older busses read PTPs and PTEs from memory sizeof(PTE) at a time,
some of them requesting write permission so that the used and modified bits
could be written back immediately. {{This skirts the distinction between
cacheable and uncacheable in several ways.}}

> That doesn't require every transaction over that bus to
> utilize the full width of the bus.

In my wide-bus design, the full line width is used to gang up multiple
responses (from different endpoints) into a single beat (one beat == one
message). For example, the chip-to-chip transport can carry multiple
independent SNOOP responses in a single beat, saving cycles and lowering latency.
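A sketch of the ganging idea, assuming a 64-byte (512-bit) beat and a hypothetical 8-byte snoop-response format (endpoint id, state, flags, tag). Responses from different endpoints are packed side by side, so ten responses need two beats rather than ten. The field layout is invented for illustration, and the receiver is assumed to learn the response count some other way (for example from control bits).

```python
import struct

BEAT_BYTES = 64                            # assumed beat width: 512 bits
RESP_FORMAT = "<HBBI"                      # hypothetical: endpoint id, state, flags, tag
RESP_BYTES = struct.calcsize(RESP_FORMAT)  # 8 bytes, no padding with "<"


def pack_beats(responses):
    """Gang fixed-size snoop responses into as few beats as possible."""
    per_beat = BEAT_BYTES // RESP_BYTES
    beats = []
    for i in range(0, len(responses), per_beat):
        chunk = responses[i:i + per_beat]
        payload = b"".join(struct.pack(RESP_FORMAT, *r) for r in chunk)
        # Pad a partly filled (final) beat out to the full width.
        beats.append(payload.ljust(BEAT_BYTES, b"\x00"))
    return beats


def unpack_beat(beat, count):
    """Recover `count` responses from one beat (count assumed known)."""
    return [struct.unpack_from(RESP_FORMAT, beat, k * RESP_BYTES)
            for k in range(count)]


# Ten responses from ten endpoints: ganged into 2 beats instead of 10.
responses = [(ep, 1, 0, 0x100 + ep) for ep in range(10)]
beats = pack_beats(responses)
print(len(beats))                                  # 2
print(unpack_beat(beats[1], 2) == responses[8:])   # True
```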