Path: ...!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.arch
Subject: Re: Arm ldaxr / stxr loop question
Date: Thu, 31 Oct 2024 12:39:43 -0700
Organization: A noiseless patient Spider
Lines: 58
Message-ID: <vg0me0$2qj1j$5@dont-email.me>
References: <vfono1$14l9r$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 31 Oct 2024 20:39:44 +0100 (CET)
Injection-Info: dont-email.me; posting-host="0c8ef2e446baf75b8661713d8cf5b13e";
	logging-data="2968627"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/p2WOecqU9cklfw66uRrm+WLrzilDSdRo="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:SurXMJqDbbYoCsdsiYb4HrynwmU=
Content-Language: en-US
In-Reply-To: <vfono1$14l9r$1@dont-email.me>
Bytes: 3021

On 10/28/2024 12:13 PM, jseigh wrote:
> So if I were to implement a spinlock using the above instructions
> something along the lines of
> 
> .L0
>      ldaxr    -- load lockword exclusive w/ acquire membar
>      cmp      -- compare to zero
>      bne  .L0 -- loop if currently locked
>          stxr     -- store 1
>          cbnz .L0 -- retry if stxr failed
> 
> The "lock" operation has memory order acquire semantics and
> we see that in part in the ldaxr but the store isn't part
> of that.  We could append an additional acquire memory barrier
> but would that be necessary?

I am not well versed in Arm. On SPARC, locking a spinlock basically 
goes like:

atomic logic that locks the spinlock
   MEMBAR #LoadStore | #LoadLoad

     // critical section

   MEMBAR #LoadStore | #StoreStore
atomic logic that unlocks the spinlock
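
As a rough sketch of that pattern in C11 atomics: acquire ordering on 
the locking operation stands in for the #LoadStore | #LoadLoad membar, 
and release ordering on the unlocking store stands in for the 
#LoadStore | #StoreStore membar. The names struct spin, spin_trylock, 
spin_lock and spin_unlock are made up for illustration, and the 
mapping to the MEMBAR bits is only approximate:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
struct spin {
    atomic_int locked;              /* 0 = free, 1 = held */
};

static bool spin_trylock(struct spin *l)
{
    /* Atomic swap that tries to take the lock. Acquire ordering keeps
       later loads and stores from moving above it, which is roughly
       what MEMBAR #LoadStore | #LoadLoad after the atomic op does. */
    return atomic_exchange_explicit(&l->locked, 1,
                                    memory_order_acquire) == 0;
}

static void spin_lock(struct spin *l)
{
    while (!spin_trylock(l)) {
        /* Wait on plain loads until the lock looks free, then retry. */
        while (atomic_load_explicit(&l->locked,
                                    memory_order_relaxed) != 0)
            ;
    }
}

static void spin_unlock(struct spin *l)
{
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    /* Release ordering keeps earlier loads and stores from moving below
       the unlocking store, roughly MEMBAR #LoadStore | #StoreStore
       placed before it. */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Note that nothing here asks for #StoreLoad; the critical section only 
needs to stay between the acquire and the release.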


Now, this is different from lock algorithms such as Peterson's 
algorithm, which need a #StoreLoad inside the very logic that acquires 
the lock. Basically, it has the same requirement as the original SMR: 
a store followed by a load from a different location must stay in that 
order. RMO aside, even TSO cannot guarantee that without a membar...
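
To make the store-then-load requirement concrete, here is a minimal 
two-thread Peterson lock sketched in C11. It uses the default seq_cst 
operations throughout, which is the simple (not the cheapest) way to 
get the store-to-load ordering; on a TSO machine the compiler has to 
emit a #StoreLoad-style barrier or an equivalent atomic op for it. The 
names struct peterson, peterson_lock and peterson_unlock are 
hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names, for illustration only. Two threads, ids 0 and 1. */
struct peterson {
    atomic_int want[2];   /* want[i] != 0: thread i wants the lock */
    atomic_int turn;      /* which thread gets priority on a tie   */
};

static void peterson_lock(struct peterson *p, int self)
{
    assert(self == 0 || self == 1);
    int other = 1 - self;

    /* Stores announcing our intent (seq_cst by default). */
    atomic_store(&p->want[self], 1);
    atomic_store(&p->turn, other);

    /* Loads of the OTHER thread's flag. These must not be satisfied
       before the stores above are globally visible: that is the
       store -> load ordering that needs #StoreLoad. */
    while (atomic_load(&p->want[other]) != 0 &&
           atomic_load(&p->turn) == other)
        ;   /* spin */
}

static void peterson_unlock(struct peterson *p, int self)
{
    assert(self == 0 || self == 1);
    atomic_store(&p->want[self], 0);
}
```

If the stores and loads in peterson_lock were weakened to 
release/acquire, both threads could read the other's flag as 0 and 
enter together, which is exactly the failure the #StoreLoad prevents.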



> 
> Loads from the locked critical region could move forward of
> the stxr but there's a control dependency from cbnz branch
> instruction so they would be speculative loads until the
> loop exited.
> 
> You'd still potentially have loads before the store of
> the lockword but in this case that's not a problem
> since it's known the lockword was 0 and no stores
> from prior locked code could occur.
> 
> This should be analogous to rmw atomics like CAS but
> I've no idea what the internal hardware implementations
> are.  Though on platforms without CAS the C11 atomics
> are implemented with LL/SC logic.
> 
> Is this sort of what's going on or is the explicit
> acquire memory barrier still needed?
> 
> Joe Seigh
>