From: "Chris M. Thomasson"
Newsgroups: comp.arch
Subject: Re: Memory ordering
Date: Thu, 19 Dec 2024 16:25:23 -0800

On 12/19/2024 4:21 PM, Chris M. Thomasson wrote:
> On 12/19/2024 3:59 PM, MitchAlsup1 wrote:
>> On Thu, 19 Dec 2024 21:19:24 +0000, Chris M. Thomasson wrote:
>>
>>> On 12/19/2024 10:33 AM, MitchAlsup1 wrote:
>>>> On Thu, 5 Dec 2024 7:44:19 +0000, Chris M. Thomasson wrote:
>>>>
>>>>> On 12/4/2024 8:13 AM, jseigh wrote:
>>>>>> On 12/3/24 18:37, Stefan Monnier wrote:
>>>>>>>>> If there are places
>>>>>>>>> in the code it doesn't know this can't happen it won't optimize
>>>>>>>>> across it, more or less.
>>>>>>>>
>>>>>>>> The problem is HOW to TELL the COMPILER that these memory
>>>>>>>> references are "more special" than normal--when languages give
>>>>>>>> few mechanisms.
>>>>>>>
>>>>>>> We could start with something like
>>>>>>>
>>>>>>>      critical_region {
>>>>>>>        ...
>>>>>>>      }
>>>>>>>
>>>>>>> such that the compiler must refrain from any code motion within
>>>>>>> those sections but is free to move things outside of those
>>>>>>> sections as if execution was single-threaded.
>>>>>>>
>>>>>>
>>>>>> C/C++11 already defines what lock acquire/release semantics are.
>>>>>> Roughly you can move stuff outside of a critical section into it
>>>>>> but not vice versa.
>>>>>>
>>>>>> Java uses synchronized blocks to denote the critical section.
>>>>>> C++ (the society for using RAII for everything) has scoped_lock
>>>>>> if you want to use RAII for your critical section.  It's not
>>>>>> always obvious what the actual critical section is.  I usually
>>>>>> use it inside its own bracket section to make it more obvious.
>>>>>>    { std::scoped_lock m(mutex);
>>>>>>      // .. critical section
>>>>>>    }
>>>>>>
>>>>>> I'm not a big fan of c/c++ using acquire and release memory order
>>>>>> directives on everything since apart from a few situations it's
>>>>>> not intuitively obvious what they do in all cases.  You can
>>>>>> look at compiler assembler output but you have to be real careful
>>>>>> generalizing from what you see.
>>>>>
>>>>> The release on the unlock can allow some following stores and
>>>>> things to sort of "bubble up" before it?
>>>>>
>>>>> Acquire and release confines things to the "critical section", the
>>>>> release can allow for some following things to go above it, so to
>>>>> speak. This is making me think of Alex over on c.p.t. !
>>>>
>>>> This sounds dangerous if the thing allowed to go above it is
>>>> unCacheable while the lock:release is cacheable, the cacheable lock
>>>> can arrive at another core before the unCacheable store arrives at
>>>> its destination.
>>>
>>> Humm... Need to ponder on that.
>>> Wrt the sparc:
>>>
>>> membar #LoadStore | #StoreStore
>>>
>>> can allow following stores to bubble up before it. If we want to block
>>> that then we would use a #StoreLoad. However, a #StoreLoad is not
>>> required for unlocking a mutex.
>>
>> It is the cacheable locks covering unCacheable data that got MOESI
>> protocol in trouble (SPARC V8 era). MESI does not have this kind
>> of problem. {{SuperSPARC MESI did not have this problem because
>> writes to memory (via SNOOP hits) were slow, Ross MOESI did have
>> this problem because cache-to-cache transfers (SNOOP hit) were as
>> few as 6 cycles.}}
>>
>> SO, what kind of barriers a relaxed memory model needs becomes
>> dependent on the cache coherency model !?!?!?! How is software
>> going to deal with that ?!? It then becomes dependent on the
>> memory order model, as a cascade of
>> Oh-crap-what-have-I-done-to-myself ...
>>
>> It is stuff like this that led My 66000 to alter memory models
>> as it accesses memory and mandates that all critical sections
>> are denoted (.lock) at the beginning and end of the ATOMIC event.
>> Thus, the programmer gets the performance of the relaxed memory
>> with the sanity of sequential consistency without programmer
>> involvement.
>
> Well, it depends on what you need to do. Iirc, even x86 has that WB, WC
> and WT memory. It has the lfence, sfence and mfence to handle it,
> non-temporal instructions. I cannot exactly remember all of it right
> now. acquire-release consistency is for MOV is the way. CLFLUSH? ;^)
>
> I think there was a problem here, iirc something about Plan9 problem
> rings a bell. I need to refresh my mind. Thanks Mitch! :^)
>
> Iirc, Alex Terekhov wrote about it way back on c.p.t.

Releasing a spinlock on x86 can use a simple MOV instruction because
ordinary stores there have implied release semantics, but iirc that
only holds for WB memory; I might be wrong here. Damn, it's been a long
time since I have worked with the various types of memory on x86.

An older paper: https://www.cs.cmu.edu/~410-f10/doc/Intel_Reordering_318147.pdf
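
To make the MOV point a bit more concrete, here is a minimal sketch of a test-and-set spinlock in C++11, assuming ordinary write-back (WB) memory only. The names SpinLock and run_counter_demo are hypothetical, made up for illustration and not taken from anything quoted above. The unlock is just a release store; on x86, mainstream compilers typically turn that into a plain MOV, while the acquire side needs an atomic exchange. Nothing here covers WC/UC memory or non-temporal stores, which would need explicit fences.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock, for illustration only.
// Assumes the protected data lives in ordinary write-back memory.
class SpinLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        // Acquire: accesses after a successful exchange cannot be
        // hoisted above it.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Spin on a plain load so waiters do not keep issuing
            // atomic read-modify-writes on the cache line.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }

    void unlock() {
        // Release: accesses before this store cannot sink below it.
        // Accesses that follow it are still free to move up above it.
        locked_.store(false, std::memory_order_release);
    }
};

// Runs `threads` workers that each bump a plain (non-atomic) counter
// `iters` times under the spinlock, then returns the final count.
long run_counter_demo(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                // RAII guard in its own scope marks the critical section.
                std::lock_guard<SpinLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

Calling run_counter_demo(4, 10000) should give back 4 * 10000 if the lock is doing its job. The release store is weaker than a full fence on purpose: it keeps the critical section's accesses from leaking out past the unlock, but does not stop later accesses from moving up into the critical section.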