From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.arch
Subject: Re: Memory ordering
Date: Thu, 19 Dec 2024 16:25:23 -0800
Organization: A noiseless patient Spider
Lines: 109
Message-ID: <vk2dhj$33ckv$2@dont-email.me>
References: <vfono1$14l9r$1@dont-email.me> <vh4530$2mar5$1@dont-email.me>
 <-rKdnTO4LdoWXKj6nZ2dnZfqnPWdnZ2d@supernews.com>
 <vh5t5b$312cl$2@dont-email.me>
 <5yqdnU9eL_Y_GKv6nZ2dnZfqn_GdnZ2d@supernews.com>
 <2024Nov15.082512@mips.complang.tuwien.ac.at> <vh7ak1$3cm56$1@dont-email.me>
 <20241115152459.00004c86@yahoo.com> <vh8bn7$3j6ql$1@dont-email.me>
 <vhb2dc$73fe$1@dont-email.me> <vhct2q$lk1b$2@dont-email.me>
 <2024Nov17.161752@mips.complang.tuwien.ac.at> <vhh16e$1lp5h$1@dont-email.me>
 <2024Dec3.100144@mips.complang.tuwien.ac.at> <vin2rp$3ofc$1@dont-email.me>
 <3aa9f0a3d3dde86193abb1c01e52d03a@www.novabbs.org>
 <jwvser449xz.fsf-monnier+comp.arch@gnu.org> <vipv2t$v57m$1@dont-email.me>
 <virlki$1fhli$1@dont-email.me>
 <ad8ce8000ff1a5a708d3cca330b5861e@www.novabbs.org>
 <vk22kr$31esr$1@dont-email.me>
 <9fac22de9841dbb36f26615dbc6432db@www.novabbs.org>
 <vk2d9o$33ckv$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 20 Dec 2024 01:25:23 +0100 (CET)
User-Agent: Mozilla Thunderbird
In-Reply-To: <vk2d9o$33ckv$1@dont-email.me>
Content-Language: en-US
Bytes: 6772

On 12/19/2024 4:21 PM, Chris M. Thomasson wrote:
> On 12/19/2024 3:59 PM, MitchAlsup1 wrote:
>> On Thu, 19 Dec 2024 21:19:24 +0000, Chris M. Thomasson wrote:
>>
>>> On 12/19/2024 10:33 AM, MitchAlsup1 wrote:
>>>> On Thu, 5 Dec 2024 7:44:19 +0000, Chris M. Thomasson wrote:
>>>>
>>>>> On 12/4/2024 8:13 AM, jseigh wrote:
>>>>>> On 12/3/24 18:37, Stefan Monnier wrote:
>>>>>>>>>                                            If there are places
>>>>>>>>> in the code it doesn't know this can't happen it won't optimize
>>>>>>>>> across it, more or less.
>>>>>>>>
>>>>>>>> The problem is HOW to TELL the COMPILER that these memory references
>>>>>>>> are "more special" than normal--when languages give few mechanisms.
>>>>>>>
>>>>>>> We could start with something like
>>>>>>>
>>>>>>>      critical_region {
>>>>>>>        ...
>>>>>>>      }
>>>>>>>
>>>>>>> such that the compiler must refrain from any code motion within
>>>>>>> those sections but is free to move things outside of those sections
>>>>>>> as if execution was singlethreaded.
>>>>>>>
>>>>>>
>>>>>> C/C++11 already defines what lock acquire/release semantics are.
>>>>>> Roughly you can move stuff outside of a critical section into it
>>>>>> but not vice versa.
>>>>>>
>>>>>> Java uses synchronized blocks to denote the critical section.
>>>>>> C++ (the society for using RAII for everything) has scoped_lock
>>>>>> if you want to use RAII for your critical section.  It's not
>>>>>> always obvious what the actual critical section is.  I usually
>>>>>> use it inside its own bracket section to make it more obvious.
>>>>>>    { std::scoped_lock m(mutex);
>>>>>>      // .. critical section
>>>>>>    }
>>>>>>
>>>>>> I'm not a big fan of c/c++ using acquire and release memory order
>>>>>> directives on everything since apart from a few situations it's
>>>>>> not intuitively obvious what they do in all cases.  You can
>>>>>> look at a compiler's assembler output but you have to be real careful
>>>>>> generalizing from what you see.
>>>>>
>>>>> The release on the unlock can allow some following stores and things
>>>>> to sort of "bubble up" before it?
>>>>>
>>>>> Acquire and release confine things to the "critical section"; the
>>>>> release can allow some following things to go above it, so to speak.
>>>>> This is making me think of Alex over on c.p.t. !
>>>>
>>>> This sounds dangerous if the thing allowed to go above it is unCacheable
>>>> while the lock:release is cacheable: the cacheable lock can arrive at
>>>> another core before the unCacheable store arrives at its destination.
>>>
>>> Humm... Need to ponder on that. Wrt the sparc:
>>>
>>> membar #LoadStore | #StoreStore
>>>
>>> can allow following stores to bubble up before it. If we want to block
>>> that then we would use a #StoreLoad. However, a #StoreLoad is not
>>> required for unlocking a mutex.
>>
>> It is the cacheable locks covering unCacheable data that got MOESI
>> protocol in trouble (SPARC V8 era). MESI does not have this kind
>> of problem. {{SuperSPARC MESI did not have this problem because
>> writes to memory (via SNOOP hits) were slow, Ross MOESI did have
>> this problem because cache-to-cache transfers (SNOOP hit) were as
>> few as 6 cycles.}}
>>
>> SO, what kind of barriers a relaxed memory model needs becomes
>> dependent on the cache coherency model !?!?!?! How is software
>> going to deal with that ?!? It then becomes dependent on the
>> memory order model, as a cascade of
>> Oh-crap-what-have-I-done-to-myself ...
>>
>> It is stuff like this that led My 66000 to alter memory models
>> as it accesses memory and mandates that all critical sections
>> are denoted (.lock) at the beginning and end of the ATOMIC event.
>> Thus, the programmer gets the performance of the relaxed memory
>> with the sanity of sequential consistency without programmer
>> involvement.
> 
> Well, it depends on what you need to do. Iirc, even x86 has WB, WC
> and WT memory types. It has lfence, sfence and mfence to handle them,
> plus the non-temporal instructions. I cannot remember all of it exactly
> right now. Acquire-release consistency for MOV is the way. CLFLUSH? ;^)
> 
> I think there was a problem here; iirc, something about a Plan9 problem
> rings a bell. I need to refresh my memory. Thanks Mitch! :^)
> 
> Iirc, Alex Terekhov wrote about it way back on c.p.t.

Releasing a spinlock on x86 can use a simple MOV instruction because
stores there have implied release semantics, but iirc that only holds
for WB memory; I might be wrong here. Damn, it's been a long time since
I have worked with the various memory types on x86.
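
Fwiw, here is a minimal C++11 sketch of what I mean, not anybody's
production lock. The names (spinlock, run_spinlock_demo) are made up for
illustration. The unlock is a plain release store, which compilers
typically turn into a single MOV on x86; the sketch assumes ordinary WB
memory and says nothing about WC or non-temporal stores:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock; illustrative names only.
class spinlock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        // Acquire: accesses after this exchange cannot move above it.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Spin on a relaxed load so waiters do not keep issuing
            // read-modify-write operations on the lock word.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() {
        // Release: accesses before this store cannot move below it.
        // No full (#StoreLoad-style) barrier is requested here.
        locked_.store(false, std::memory_order_release);
    }
};

// Hypothetical demo helper: several threads bump a plain counter that is
// only ever touched while holding the lock, then the total is returned.
long run_spinlock_demo(int thread_count, int iterations) {
    assert(thread_count > 0 && iterations >= 0);
    spinlock lock;
    long counter = 0;  // non-atomic data protected by the lock
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // critical section
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

Calling run_spinlock_demo(4, 10000) should hand back 40000 if the
acquire/release pair is doing its job. Note that the release only keeps
earlier accesses from sinking below the unlock; it does not stop later
accesses from being hoisted above it.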

An older paper:

https://www.cs.cmu.edu/~410-f10/doc/Intel_Reordering_318147.pdf