From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.arch
Subject: Re: Memory ordering
Date: Fri, 20 Dec 2024 12:17:21 -0800
Message-ID: <vk4jch$3k04r$3@dont-email.me>
In-Reply-To: <9Kj9P.52334$bYV2.47745@fx17.iad>

On 12/20/2024 11:39 AM, EricP wrote:
> Chris M. Thomasson wrote:
>> On 12/19/2024 10:33 AM, MitchAlsup1 wrote:
>>> On Thu, 5 Dec 2024 7:44:19 +0000, Chris M. Thomasson wrote:
>>>
>>>> On 12/4/2024 8:13 AM, jseigh wrote:
>>>>> On 12/3/24 18:37, Stefan Monnier wrote:
>>>>>>>> If there are places in the code where it doesn't know this
>>>>>>>> can't happen, it won't optimize across them, more or less.
>>>>>>>
>>>>>>> The problem is HOW to TELL the COMPILER that these memory references
>>>>>>> are "more special" than normal--when languages give few mechanisms.
>>>>>>
>>>>>> We could start with something like
>>>>>>
>>>>>>     critical_region {
>>>>>>         ...
>>>>>>     }
>>>>>>
>>>>>> such that the compiler must refrain from any code motion within
>>>>>> those sections but is free to move things outside of those
>>>>>> sections as if execution were single-threaded.
>>>>>>
>>>>>
>>>>> C/C++11 already defines what lock acquire/release semantics are.
>>>>> Roughly, you can move stuff from outside a critical section into it,
>>>>> but not vice versa.
>>>>>
>>>>> Java uses synchronized blocks to denote the critical section.
>>>>> C++ (the society for using RAII for everything) has scoped_lock
>>>>> if you want to use RAII for your critical section. It's not
>>>>> always obvious what the actual critical section is. I usually
>>>>> put it inside its own bracketed block to make it more obvious:
>>>>>
>>>>>     { std::scoped_lock m(mutex);
>>>>>       // .. critical section
>>>>>     }
>>>>>
>>>>> I'm not a big fan of C/C++ using acquire and release memory order
>>>>> directives on everything, since apart from a few situations it's
>>>>> not intuitively obvious what they do in all cases. You can
>>>>> look at the compiler's assembler output, but you have to be really
>>>>> careful generalizing from what you see.
>>>>
>>>> The release on the unlock can allow some following stores and things
>>>> to sort of "bubble up" before it?
>>>>
>>>> Acquire and release confine things to the "critical section"; the
>>>> release can allow some following things to go above it, so to
>>>> speak.
>>>> This is making me think of Alex over on c.p.t. (comp.programming.threads)!
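This is a minimal C++11 sketch of the acquire/release point jseigh makes above. It assumes a simple test-and-set spinlock; `SpinLock` and `run_demo` are hypothetical names for illustration, not anything from the thread. The acquire on `lock()` keeps critical-section accesses from moving above it, and the release on `unlock()` keeps them from moving below it. Accesses from outside may still be moved into the section, which is the "bubble up" effect discussed here. The lock is held inside its own braces, as jseigh suggests. `std::lock_guard` is used so the sketch stays C++11; with C++17, `std::scoped_lock` works the same way.

```cpp
#include <atomic>
#include <mutex>   // std::lock_guard
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock illustrating C++11 acquire/release.
class SpinLock {
public:
    void lock() {
        // Acquire: accesses after a successful exchange (the critical
        // section) may not be reordered before it. Accesses that precede
        // lock() may still be moved down into the section.
        while (flag_.exchange(true, std::memory_order_acquire)) {
            // Spin on a relaxed load to avoid hammering the line with RMWs.
            while (flag_.load(std::memory_order_relaxed)) {
            }
        }
    }

    void unlock() {
        // Release: accesses before this store (the critical section) may
        // not be reordered after it. Accesses that follow unlock() may
        // still be moved above it, i.e. "bubble up" into the section.
        flag_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> flag_{false};
};

// Hypothetical demo: several threads increment a plain (non-atomic)
// counter under the lock. Each release store synchronizes with the next
// acquiring exchange that reads it, so no increment is lost.
long run_demo(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                {   // explicit braces mark the critical section
                    std::lock_guard<SpinLock> guard(lock);
                    ++counter;
                }
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

For example, `run_demo(4, 10000)` should return 40000. Without the acquire/release pair, i.e. with relaxed operations or no lock at all, the plain `++counter` would be a data race.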
>>>
>>> This sounds dangerous if the thing allowed to go above it is unCacheable
>>> while the lock:release is cacheable: the cacheable lock can arrive at
>>> another core before the unCacheable store arrives at its destination.
>>
>> Hmm... Need to ponder that. WRT the SPARC:
>>
>>     membar #LoadStore | #StoreStore
>>
>> can allow following loads to bubble up before it. If we want to block
>> that, then we would use a #StoreLoad. However, a #StoreLoad is not
>> required for unlocking a mutex.
>
> I had an idea a few weeks back for a different way to do membars
> that should be more flexible and controllable (if that's a good thing),
> so I thought I'd toss it out there for comments.
>
> This hypothetical ISA has normal LD and ST instructions, to which I
> would add an LW (Load for Write) instruction to optimize moving shared
> lines between caches. There are also the Atomic Fetch-and-OP
> instructions AFADD, AFAND, AFOR, AFXOR, plus ASWAP and ACAS, and
> LL (Load Locked) and SC (Store Conditional), for various sizes of
> naturally aligned data and with various address modes.
>
> Here is the new part:
>
> To the above instructions is added a 3-bit Coherence Group (CG) field.
> This allows one to specify which group each of those data accesses
> belongs to.
>
> The ISA has a membar instruction: MBG, Memory Barrier for Group.
>
> MBG has three fields:
> - one 4-bit field, each bit enabling one ordering this barrier applies
>   to, in older-younger order: Load-Load, Load-Store, Store-Load,
>   and Store-Store.
> - two 8-bit fields, each bit selecting one Coherence Group this barrier
>   applies to: one field for the older (before the membar) groups,
>   one for the younger (after the membar) groups.
>
> Also, the Load Store Queue is assumed to be self-coherent: loads
> and stores to the same address by a single core are performed in order,
> and nothing can bypass a load or store with an unresolved address.
>
> The CG numbers are assigned by convention, probably by the OS designers
> when they define the ABI for this ISA.
> Here I assigned CG:0 to normal thread accesses, CG:1 to atomic items,
> and CG:2 to shared memory sections. The remaining 5 CGs can be used to
> indicate different shared memory sections if their locks can overlap.
>
> E.g., an MBG with op bits for Load-Load and Load-Store, a before CG of 1,
> and after CGs 3 and 4 would block all younger loads and stores in groups
> 3 and 4 from starting execution until all older loads in group 1 have
> completed. Loads and stores in all other groups are free to reorder,
> within the LSQ self-coherence rules.
> An MBG with all op bits and all CG bits set is a full membar.
>
> Also, if one is, say, juggling multiple shared sections with multiple
> spinlocks or mutexes, then one can use multiple membars applied to
> different groups to selectively allow or block bypassing.
>
> An MBG instruction completes and retires when no older loads or stores
> in the selected groups are incomplete.
>

Interesting! I wrote about so-called "tagged" memory order a while back
on this group. Just shooting the breeze, so to speak. Having some fun.
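For tinkering with EricP's MBG idea, here is a small C++ software model of its ordering rule, offered only as a sketch. The bit positions for the four orderings are assumptions, since the proposal names the orderings but gives no encoding. The retire rule is also an interpretation: an older load is waited on only if Load-Load or Load-Store is set, and an older store only if Store-Load or Store-Store is set. `Mbg`, `orders`, and `can_retire` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of MBG (Memory Barrier for Group).
// Assumed bit positions for the 4-bit ordering field:
enum OrderBit : std::uint8_t {
    kLoadLoad   = 1u << 0,  // older load  -> younger load
    kLoadStore  = 1u << 1,  // older load  -> younger store
    kStoreLoad  = 1u << 2,  // older store -> younger load
    kStoreStore = 1u << 3,  // older store -> younger store
};

enum class Op { Load, Store };

struct Access {
    Op op;
    unsigned cg;  // 3-bit Coherence Group, 0..7
};

struct Mbg {
    std::uint8_t ops;      // 4-bit ordering mask (OrderBit values)
    std::uint8_t older;    // 8-bit mask of CGs before the barrier
    std::uint8_t younger;  // 8-bit mask of CGs after the barrier
};

// True if this barrier forbids `young` from executing ahead of `old`.
bool orders(const Mbg& m, const Access& old, const Access& young) {
    if (!(m.older & (1u << old.cg)) || !(m.younger & (1u << young.cg)))
        return false;  // at least one side is outside the selected groups
    std::uint8_t bit;
    if (old.op == Op::Load)
        bit = (young.op == Op::Load) ? kLoadLoad : kLoadStore;
    else
        bit = (young.op == Op::Load) ? kStoreLoad : kStoreStore;
    return (m.ops & bit) != 0;
}

// The MBG may retire once no older access it must wait on is incomplete.
// Assumption: older loads count if LL or LS is set, older stores if SL or SS.
bool can_retire(const Mbg& m, const std::vector<Access>& pending_older) {
    for (const Access& a : pending_older) {
        if (!(m.older & (1u << a.cg))) continue;  // group not selected
        std::uint8_t relevant = (a.op == Op::Load)
                                    ? (kLoadLoad | kLoadStore)
                                    : (kStoreLoad | kStoreStore);
        if (m.ops & relevant) return false;
    }
    return true;
}

// EricP's example: LL + LS, before CG 1, after CGs 3 and 4.
const Mbg kExample{kLoadLoad | kLoadStore, 1u << 1, (1u << 3) | (1u << 4)};
// All op bits and all CG bits set: a full membar.
const Mbg kFull{0x0F, 0xFF, 0xFF};
```

Under these assumptions, `kExample` holds back a younger CG3 store behind an older CG1 load. It does not order an older CG1 store against a younger CG3 load, because Store-Load is not set. Traffic in CG0 or CG2 passes freely. `kFull` orders every pair, as a full membar should.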