From: "Chris M. Thomasson"
Newsgroups: comp.arch
Subject: Re: Arm ldaxr / stxr loop question
Date: Sat, 2 Nov 2024 12:10:30 -0700

On 11/1/2024 9:17 AM, aph@littlepinkcloud.invalid wrote:
> jseigh wrote:
>> So if were to implement a spinlock using the above instructions
>> something along the lines of
>>
>> .L0
>>    ldaxr    -- load lockword exclusive w/ acquire membar
>>    cmp      -- compare to zero
>>    bne .LO  -- loop if currently locked
>>    stxr     -- store 1
>>    cbnz .LO -- retry if stxr failed
>>
>> The "lock" operation has memory order acquire semantics and
>> we see that in part in the ldaxr but the store isn't part
>> of that. We could append an additional acquire memory barrier
>> but would that be necessary.
>
> After the store exclusive, you mean? No, it would not be necessary.

Ahhhh! I just learned something about ARM right here. I am so used to
the acquire membar being placed _after_ the atomic logic that locks the
spinlock.

>> .L0
>>    ldaxr    -- load lockword exclusive w/ acquire membar
>>    cmp      -- compare to zero
>>    bne .LO  -- loop if currently locked
>>    stxr     -- store 1
>>    cbnz .LO -- retry if stxr failed

So this acts just like a SPARC style:

atomically_lock_spinlock();
membar #LoadStore | #LoadLoad

right?
>
>> This should be analogous to rmw atomics like CAS but
>> I've no idea what the internal hardware implementations
>> are. Though on platforms without CAS the C11 atomics
>> are implemented with LD/SC logic.
>>
>> Is this sort of what's going on or is the explicit
>> acquire memory barrier still needed?
>
> All of the implementations of things like POSIX mutexes I've seen on
> AArch64 use acquire alone.
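
For comparison, here is roughly how I would write the same lock with C11
atomics. This is only a sketch: the names (demo_spinlock, spin_trylock,
spin_lock, spin_unlock) are made up for illustration, and 0 = unlocked,
1 = locked is assumed to match the quoted loop. As far as I know, a
compiler targeting AArch64 without the LSE atomics will typically lower
the acquire CAS to an ldaxr / stxr retry loop much like the one above,
with no extra barrier after it, but check the generated code for your
own toolchain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 = unlocked, 1 = locked. */
typedef struct { atomic_int word; } demo_spinlock;

/* One attempt to take the lock. Acquire ordering applies only when the
 * CAS succeeds; a failed attempt needs no ordering, hence relaxed. */
static bool spin_trylock(demo_spinlock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->word, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

static void spin_lock(demo_spinlock *l)
{
    while (!spin_trylock(l)) {
        /* Wait with plain relaxed loads until the lock looks free,
         * then go back and retry the CAS. */
        while (atomic_load_explicit(&l->word, memory_order_relaxed) != 0)
            ;
    }
}

static void spin_unlock(demo_spinlock *l)
{
    /* Debug check: only a held lock may be released. */
    assert(atomic_load_explicit(&l->word, memory_order_relaxed) == 1);
    /* Release pairs with the acquire in spin_trylock. */
    atomic_store_explicit(&l->word, 0, memory_order_release);
}
```

Note there is no atomic_thread_fence after the CAS in spin_lock. The
acquire is attached to the RMW itself, which is the C11 counterpart of
putting it on the ldaxr instead of adding a separate membar afterwards.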