From: "Chris M. Thomasson"
Newsgroups: comp.arch
Subject: Re: Arm ldaxr / stxr loop question
Date: Thu, 31 Oct 2024 12:39:43 -0700

On 10/28/2024 12:13 PM, jseigh wrote:
> So if I were to implement a spinlock using the above instructions
> something along the lines of
>
> .L0
>     ldaxr    -- load lockword exclusive w/ acquire membar
>     cmp      -- compare to zero
>     bne  .L0 -- loop if currently locked
>     stxr     -- store 1
>     cbnz .L0 -- retry if stxr failed
>
> The "lock" operation has memory order acquire semantics and
> we see that in part in the ldaxr but the store isn't part
> of that.  We could append an additional acquire memory barrier
> but would that be necessary.

I am not well versed in Arm. On SPARC, locking a spinlock basically
goes like this:

    atomic logic that locks the spinlock

    MEMBAR #LoadStore | #LoadLoad

    // critical section

    MEMBAR #LoadStore | #StoreStore

    atomic logic that unlocks the spinlock

Now, this is different from some other locking logic, e.g. Peterson's
algorithm, which requires a #StoreLoad in the atomic logic that
actually takes the lock. Basically, it needs the same thing the
original SMR does: a store followed by a load from a different
location must stay in that order. RMO aside, even TSO cannot
guarantee that without a membar...
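
For readers more at home in C11 than in SPARC assembly, the lock/unlock
pattern above might be sketched roughly as follows. This is only an
illustration under stated assumptions: the names spinlock_t, spin_lock
and spin_unlock are made up here, and the mapping from MEMBAR masks to
C11 fences is approximate, not an exact equivalence.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical type and function names, for illustration only. */
typedef struct {
    atomic_int word; /* 0 = free, 1 = held */
} spinlock_t;

static void spin_lock(spinlock_t *l)
{
    /* Atomic logic that locks the spinlock: swap in 1 until we saw 0. */
    while (atomic_exchange_explicit(&l->word, 1, memory_order_relaxed) != 0) {
        /* spin */
    }
    /* Roughly MEMBAR #LoadStore | #LoadLoad: later loads and stores
       may not move above the load that observed the lock as free. */
    atomic_thread_fence(memory_order_acquire);
}

static void spin_unlock(spinlock_t *l)
{
    /* Debug check only: the lock should be held when we release it. */
    assert(atomic_load_explicit(&l->word, memory_order_relaxed) == 1);
    /* Roughly MEMBAR #LoadStore | #StoreStore: earlier loads and stores
       may not move below the store that frees the lock. */
    atomic_thread_fence(memory_order_release);
    /* Atomic logic that unlocks the spinlock. */
    atomic_store_explicit(&l->word, 0, memory_order_relaxed);
}
```

Note that neither fence asks for #StoreLoad ordering, which is the
point of the contrast with Peterson's algorithm.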

>
> Loads from the locked critical region could move forward of
> the stxr but there's a control dependency from cbnz branch
> instruction so they would be speculative loads until the
> loop exited.
>
> You'd still potentially have loads before the store of
> the lockword but in this case that's not a problem
> since it's known the lockword was 0 and no stores
> from prior locked code could occur.
>
> This should be analogous to rmw atomics like CAS but
> I've no idea what the internal hardware implementations
> are.  Though on platforms without CAS the C11 atomics
> are implemented with LD/SC logic.
>
> Is this sort of what's going on or is the explicit
> acquire memory barrier still needed?
>
> Joe Seigh
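
As a point of comparison for the CAS analogy in the quoted text, here is
a minimal C11 sketch of the same lock written as an acquire
compare-exchange. The names cas_lock and cas_unlock are hypothetical.
On AArch64 targets without the LSE atomics, compilers commonly lower a
loop like this to an ldaxr/stxr retry loop with no separate barrier,
but the exact output depends on the compiler and flags, so check the
generated assembly rather than taking that on faith.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper names; the lockword is 0 when free, 1 when held. */
static void cas_lock(atomic_int *lockword)
{
    int expected = 0;
    /* Acquire ordering on success; relaxed on failure, since a failed
       attempt only leads to another try. */
    while (!atomic_compare_exchange_weak_explicit(
               lockword, &expected, 1,
               memory_order_acquire, memory_order_relaxed)) {
        /* On failure the observed value is written to expected. */
        expected = 0;
    }
}

static void cas_unlock(atomic_int *lockword)
{
    /* Debug check only: the lock should be held when we release it. */
    assert(atomic_load_explicit(lockword, memory_order_relaxed) == 1);
    /* Release store: critical-section accesses stay above it. */
    atomic_store_explicit(lockword, 0, memory_order_release);
}
```

The weak form is used because it is allowed to fail spuriously, which
matches a store-exclusive that can fail for reasons other than the
compare.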