Article <v4cpn6$1phq4$1@dont-email.me>

Path: ...!feed.opticnetworks.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Privilege Levels Below User
Date: Wed, 12 Jun 2024 20:34:13 +0200
Organization: A noiseless patient Spider
Lines: 138
Message-ID: <v4cpn6$1phq4$1@dont-email.me>
References: <jai66jd4ih4ejmek0abnl4gvg5td4obsqg@4ax.com>
 <h0ib6j576v8o37qu1ojrsmeb5o88f29upe@4ax.com>
 <2024Jun9.185245@mips.complang.tuwien.ac.at>
 <38ob6jl9sl3ceb0qugaf26cbv8lk7hmdil@4ax.com>
 <2024Jun10.091648@mips.complang.tuwien.ac.at>
 <o32f6jlq2qpi9s1u8giq521vv40uqrkiod@4ax.com>
 <3a691dbdc80ebcc98d69c3a234f4135b@www.novabbs.org>
 <k58h6jlvp9rl13br6v1t24t47t4t2brfiv@4ax.com>
 <5a27391589243e11b610b14c3015ec09@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Injection-Date: Wed, 12 Jun 2024 20:34:15 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="8ff6901c2e9b02e5825e75bced85b4a0";
	logging-data="1886020"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/jnunmY0jsUX3NC2U9D78fS8GHeOJl54Xfhq207hkDcA=="
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Firefox/91.0 SeaMonkey/2.53.18.2
Cancel-Lock: sha1:wUyPll8TC7frxgDwF9Cd00b9UEc=
In-Reply-To: <5a27391589243e11b610b14c3015ec09@www.novabbs.org>
Bytes: 7142

MitchAlsup1 wrote:
> John Savard wrote:
>
>> On Tue, 11 Jun 2024 00:27:02 +0000, mitchalsup@aol.com (MitchAlsup1)
>> wrote:
>
>>> ALL I have DONE is to not have the MB write into the cache until the
>>> causing instruction retires !!
>
>> I suppose that depends on how you define "write".
>
> I mean the memory cell does not get modified.
>
>> If by "write" you mean store data in the cache, for eventual writing
>> out into RAM, well, since RAM doesn't contain "rename locations" to
>> play with, it seems to me that any CPU designer had better do that.
>
> The cache itself is not modified until the memory reference retires.
> But there is a buffer holding the data which can be accessed as if
> it were an L0 cache until the data migrates to the real cache at
> retirement.
>
>> At least, I'm not imaginative enough to think of doing it any other
>> way.
>
>> However, if by "write" you mean to change the state of the cache in
>> any way, such as by reading data from memory... now, _then_ you would
>> indeed have done what is necessary to combat Spectre.
>
> The cache is not modified; the data is available through another means,
> a means that can be backed up like a mispredicted branch. The buffer
> I am talking about is temporally organized, not spatially organized.
>
>> Obviously, though, a "load" instruction will _never_ retire unless it
>> can read the data from memory it is trying to put in a register.
>
> The LD instruction can obtain data from either the buffer or from
> the data cache itself. The buffer covers the execution window,
> allowing the LD to retire (assuming every older instruction also
> retires).
>
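A minimal sketch of the buffer described above, in Python and purely illustrative. The class name `MemoryExecutionWindow` and all of its methods are hypothetical, and entries are assumed to be issued in program order, which the design being discussed does not require. It only shows the stated behaviour: loads and stores live in a temporally ordered window, loads can be served from that window, the cache is modified only at retirement, and a squash discards younger entries without touching the cache.

```python
class MemoryExecutionWindow:
    """Toy model: in-flight references sit in a time-ordered buffer;
    the cache itself changes only when the oldest entry retires."""

    def __init__(self, memory):
        self.memory = dict(memory)  # backing store: addr -> value
        self.cache = {}             # committed cache contents only
        self.window = []            # in-flight (seq, addr, value), oldest first
        self.next_seq = 0

    def _record(self, addr, value):
        seq = self.next_seq
        self.next_seq += 1
        self.window.append((seq, addr, value))
        return seq

    def store(self, addr, value):
        # The store data waits in the window; the cache is not written yet.
        return self._record(addr, value)

    def load(self, addr):
        # Youngest matching in-flight entry wins (the "L0 cache" behaviour).
        for _, a, v in reversed(self.window):
            if a == addr:
                value = v
                break
        else:
            # Miss in the window: read the cache or memory, but do NOT
            # install anything in the cache at this point.
            value = self.cache[addr] if addr in self.cache else self.memory[addr]
        return self._record(addr, value), value

    def retire_oldest(self):
        # Only now does the data migrate into the real cache.
        _, addr, value = self.window.pop(0)
        self.cache[addr] = value

    def squash_from(self, seq):
        # Back up like a mispredicted branch: drop seq and everything younger.
        self.window = [e for e in self.window if e[0] < seq]
```

In this model a squashed speculative load leaves no trace in `cache`, which is the property that matters for the Spectre argument.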
>> So apparently WHAT you have REALLY DONE is to modify how memory reads
>> work...
>
> I pipelined them through a temporally organized memory execution
> window. This also allows the memory system to run OoO wrt program
> order, detect actual ordering violations, and restore a proper
> memory order by rerunning the affected references in order.
>
> You get relaxed memory order performance and precise memory order
> simultaneously.
>
>> if the data a load instruction requires is not already in the cache,
>> then a direct read from memory
>
> The request is forwarded toward memory through the cache hierarchy
> and data arrives back at the requestor (sooner or later).
>
>> is performed which *completely
>> bypasses* the cache;
>
> Yes, critical word first.
>
>> this data (and its associated address) are
>> retained by the CPU to be placed in the cache _if_ the instruction is
>> actually executed and when it retires.
>
> Yes !! While the data resides in the buffer, the whole line can be
> accessed by a number of memory reference instructions.
>
>> And, in fact, the various cache levels have to work this way too. You
>> have an L1 cache miss, but an L2 cache hit? Fine, you take your data
>> directly from L2, and don't promote the data into L1 until instruction
>> retirement.
>
> I use an exclusive cache organization, so data arriving at the CPU
> goes into the buffer, which upon retirement goes into L1, which has
> the potential to push an L1->L2 line, and so forth.
>
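The fill path just described can be sketched as follows. This is a hedged toy model in Python, not the actual design: `ExclusiveHierarchy`, the capacities, and the oldest-first victim choice are all assumptions made for illustration. It shows a retired line entering L1, the displaced L1 line being pushed to L2, and each line living in at most one level.

```python
from collections import OrderedDict


class ExclusiveHierarchy:
    """Toy exclusive L1/L2: a line is held in at most one level."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()  # addr -> data, oldest first
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines

    def retire_fill(self, addr, data):
        # Exclusive: if the line was in L2, it leaves L2 when it enters L1.
        self.l2.pop(addr, None)
        self.l1[addr] = data
        self.l1.move_to_end(addr)
        if len(self.l1) > self.l1_lines:
            # L1 is full: push the oldest L1 line down into L2.
            victim, vdata = self.l1.popitem(last=False)
            self.l2[victim] = vdata
            if len(self.l2) > self.l2_lines:
                # "...and so forth": the L2 victim would move on toward
                # L3/DRAM; this sketch simply drops it.
                self.l2.popitem(last=False)
```

Filling lines A, B, C into a two-line L1 pushes A into L2; filling A again pulls it back out of L2 and pushes B down instead.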
>> So now the process of fetching data from memory is _not_ done by
>> fetching always from L1 and going _through_ L1 to access L2, and
>> going _through_ L2 to access RAM, which seems to be the usual way
>> these days.
>
> It's back to the Athlon/Opteron organizations.
>
>> That certainly can be done. But it isn't quite as simple and obvious
>> as you seem to claim.
>
> If you had worked on them, you would recognize the advantages and
> disadvantages.
>
>>> My 66000 is also insensitive to RowHammer and derivatives.....
>
>> When I first read that sentence, I was completely incredulous. DRAM is
>> sensitive to RowHammer because it's gone to feature sizes which are
>> beyond the state-of-the-art to do properly... so corners have been
>> cut.
>
>> How a CPU can be "insensitive" to it was mysterious.
>
>> After all, RowHammer is caused by multiple rapid-fire accesses to the
>> same address, or to related addresses, in memory.
>
> Yes, the write buffer in my DRAM controller is the L3 cache. Modified
> data in the L3 migrates towards DRAM as DRAM cycles permit, but there
> is no way to cause a line to be continuously written into DRAM.
> If a modified line has migrated to DRAM, and it gets modified again
> in the L3, that 2nd write will not be performed until a refresh cycle
> on that DRAM is performed.
>
> Thus if one tries to RowHammer My 66000 DRAM, DRAM gets a refresh cycle
> between each write.
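The write gating described above can be sketched like this. It is a toy Python model under stated assumptions: `RefreshGatedWriter` is a hypothetical name, tracking is done per row (the post does not say at what granularity the controller tracks refresh), and a deferred write is performed immediately when the refresh arrives. It shows only that repeated modifications coalesce in the L3 and reach DRAM at most once per refresh.

```python
class RefreshGatedWriter:
    """Toy DRAM controller: a second write to the same row is held in
    the L3 (modelled as `pending`) until that row has been refreshed."""

    def __init__(self):
        self.dram = {}                      # row -> data actually in DRAM
        self.pending = {}                   # row -> newest modified data in L3
        self.written_since_refresh = set()  # rows written since last refresh
        self.dram_writes = 0

    def _commit(self, row, data):
        self.dram[row] = data
        self.dram_writes += 1
        self.written_since_refresh.add(row)

    def write_back(self, row, data):
        # Returns True if the write reached DRAM, False if it was deferred.
        if row in self.written_since_refresh:
            self.pending[row] = data  # coalesce in L3, newest data wins
            return False
        self._commit(row, data)
        return True

    def refresh(self, row):
        # After the refresh, one deferred write (if any) may proceed.
        self.written_since_refresh.discard(row)
        if row in self.pending:
            self._commit(row, self.pending.pop(row))
```

A thousand back-to-back `write_back` calls to one row produce a single DRAM write in this model, and one more after each `refresh`.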

Rowhammer can modify nearby lines, not just the ones that are being
hammered, right? How do you guarantee that all neighbors will also be
refreshed?
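To make the concern concrete, here is a deliberately crude Python sketch. `DisturbanceModel` is a hypothetical counting model, not a description of real DRAM behaviour or of the My 66000 controller: each activation of a row is assumed to add one unit of disturbance to the rows on either side, and refreshing a row is assumed to clear only that row's own count. Whether the refresh in the scheme above also covers the neighbors is exactly the open question.

```python
class DisturbanceModel:
    """Toy Rowhammer bookkeeping: activating a row disturbs rows +/-1;
    refreshing a row clears only that row's accumulated disturbance."""

    def __init__(self, rows):
        self.disturb = [0] * rows

    def activate(self, row):
        for neighbor in (row - 1, row + 1):
            if 0 <= neighbor < len(self.disturb):
                self.disturb[neighbor] += 1

    def refresh(self, row):
        self.disturb[row] = 0


# Hammer row 5, refreshing only row 5 between activations.
model = DisturbanceModel(8)
for _ in range(100):
    model.activate(5)
    model.refresh(5)
```

Under these assumptions rows 4 and 6 keep accumulating disturbance even though the hammered row is refreshed every time, so the guarantee would have to come from refreshing the neighbors as well.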

Similarly, if the accesses are LOCK XADD operations, and you have
multiple CPUs (or cores not sharing a common last-level cache), then I
don't see any way to keep those accesses from making it all the way to
the RAM chips?

Terje


-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"