From: "Chris M. Thomasson"
Newsgroups: comp.arch
Subject: Re: Strange asm generated by GCC...
Date: Sun, 22 Dec 2024 19:55:33 -0800

On 12/22/2024 7:53 PM, Chris M. Thomasson wrote:
> On 12/22/2024 7:49 PM, Chris M. Thomasson wrote:
>> On 12/21/2024 2:37 AM, aph@littlepinkcloud.invalid wrote:
>>> jseigh wrote:
>>>> On 12/19/24 19:43, Chris M. Thomasson wrote:
>>>>> Why in the world would GCC use an XCHG instruction for the following
>>>>> code. The damn XCHG has an implied LOCK prefix! Yikes!
>>>>>
>>>>
>>>> Speaking of strange code
>>>>
>>>> #include <atomic>
>>>>
>>>> bool test1(std::atomic<int>& var, int addend)
>>>> {
>>>>      int expected = var.load(std::memory_order_relaxed);
>>>>      int update = expected + addend;
>>>>      return var.compare_exchange_weak(expected, update,
>>>>          std::memory_order_acq_rel, std::memory_order_seq_cst);
>>>> }
>>>>
>>>> This is asm for armv8-a clang 9.0.0
>>>>
>>>> test1(std::atomic<int>&, int):
>>>>          ldr     w8, [x0]
>>>>          ldaxr   w9, [x0]
>>>>          cmp     w9, w8
>>>>          b.ne    .LBB0_3
>>>>          add     w8, w8, w1
>>>>          stlxr   w9, w8, [x0]
>>>>          cbz     w9, .LBB0_4
>>>>          mov     w0, wzr
>>>>          ret
>>>> .LBB0_3:
>>>>          clrex
>>>>          mov     w0, wzr
>>>>          ret
>>>> .LBB0_4:
>>>>          mov     w0, #1
>>>>          ret
>>>>
>>>> I picked a version that just did ll/sc to avoid
>>>> the question of whether a failed CASAL did a store or not.
>>>>
>>>> I don't see anything that forces a store memory barrier
>>>> on all the fail paths.  I could be missing something.
>>>
>>> Why would there be one? If the store does not take place, there's no
>>> need for a memory barrier because there's no store for anyone to
>>> synchronize with. The only effect of a failed weak CAS is a load. If
>>> you really need a store on failure because of its side effect you can
>>> always add one.
>>
>> Iirc, the membars for the success and failure can be "useful" for
>> popping from a lock-free stack. Wrt the C++ API the CAS can give you
>> the updated value on a failure. So, there is a load. Depending on what
>> you are doing, it might require an acquire.
>
> Loading the head of the lock-free stack would be an acquire at the start
> of the CAS loop. The CAS can use relaxed for the success and an acquire
> for the failure. It's been a while since I have implemented one from
> scratch.
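
For reference, the pop loop described in the quoted text (acquire load of the head at the start, then a CAS with relaxed success and acquire failure) might look roughly like the sketch below. The Node and Stack types and the function names are made up for illustration and are not from anyone's code in this thread. The sketch assumes nodes are never freed or reused while other threads may still be popping, so it deliberately ignores ABA and memory reclamation. It also assumes C++17 or later, where the failure order is allowed to be stronger than the success order.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical types for illustration only.
struct Node {
    int value;
    Node* next;
};

struct Stack {
    std::atomic<Node*> head{nullptr};

    void push(Node* n) {
        Node* old = head.load(std::memory_order_relaxed);
        do {
            n->next = old;  // link before publishing
        } while (!head.compare_exchange_weak(
            old, n,
            std::memory_order_release,    // success: publish n->value and n->next
            std::memory_order_relaxed));  // failure: old is refreshed, nothing is dereferenced
    }

    Node* pop() {
        // Acquire, so the read of old->next below sees the pusher's writes.
        Node* old = head.load(std::memory_order_acquire);
        while (old != nullptr &&
               !head.compare_exchange_weak(
                   old, old->next,
                   std::memory_order_relaxed,    // success: old came from an acquire load
                   std::memory_order_acquire)) { // failure: old now holds the new head
            // On failure the CAS has written the observed head into old,
            // and the next iteration dereferences it, hence the acquire.
        }
        return old;  // nullptr if the stack was empty
    }
};

// Single-threaded sanity check of the LIFO behaviour.
inline void single_threaded_check() {
    Stack s;
    Node a{1, nullptr};
    Node b{2, nullptr};
    s.push(&a);
    s.push(&b);
    assert(s.pop() == &b);
    assert(s.pop() == &a);
    assert(s.pop() == nullptr);
}
```

The failure path is where the point about the C++ API applies: a failed compare_exchange_weak stores the value it observed into its first argument, so the failure ordering governs a real load whose result is dereferenced on the next iteration.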