From: jseigh
Newsgroups: comp.arch
Subject: Re: Strange asm generated by GCC...
Date: Tue, 24 Dec 2024 08:26:05 -0500

On 12/23/24 21:35, Chris M. Thomasson wrote:
> On 12/23/2024 5:53 PM, Chris M. Thomasson wrote:
>> On 12/23/2024 5:16 PM, jseigh wrote:
>>>
>>> You are probably ok using relaxed loading the old value. It's not
>>> real clear how aggressive the compiler is allowed to be with relaxed
>>> loads and stores. To be super safe, you might want to add acquire
>>> to all your cas loops. I usually use signal fences in loops w/
>>> relaxed atomics.
>>>
>>> I would just stick with the compare_exchange w/ 1 memory order
>>> parameter. The success/fail form is just confusing, the fail
>>> parameter doesn't do anything.
>>>
>>
>
> Actually, can the acquire be relaxed into a consume?

Compare_exchange is 2 ops: a load, which happens on both the success and fail paths, and a store, which effectively happens only on the success path. The memory order argument is decomposed into what is valid for a load and for a store respectively. The 2nd memory order parameter appears to be redundant.
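To make that decomposition concrete, here is a rough sketch of how the one-parameter form picks its failure ordering. As far as I know the rule in the standard is that the failure order is the same as the given order, except acq_rel becomes acquire and release becomes relaxed. The helper load_part and the two try_add_* names are made up for illustration, they are not part of any library.

```cpp
#include <atomic>

// Hypothetical helper (not a library function): the load-side ordering
// implied by a single memory order. The one-parameter compare_exchange
// uses this same mapping for its failure path.
constexpr std::memory_order load_part(std::memory_order mo)
{
    switch (mo) {
    case std::memory_order_acq_rel:
        return std::memory_order_acquire;   // drop the release half
    case std::memory_order_release:
        return std::memory_order_relaxed;   // a load cannot release
    default:
        return mo;  // relaxed, consume, acquire, seq_cst stay as they are
    }
}

// One-parameter form: the failure order is derived for you.
inline bool try_add_one_order(std::atomic<int>& var, int addend)
{
    int expected = var.load(std::memory_order_relaxed);
    return var.compare_exchange_weak(expected, expected + addend,
                                     std::memory_order_acq_rel);
}

// Two-parameter form with the derived failure order written out.
// This should request the same orderings as the function above.
inline bool try_add_two_orders(std::atomic<int>& var, int addend)
{
    int expected = var.load(std::memory_order_relaxed);
    return var.compare_exchange_weak(expected, expected + addend,
                                     std::memory_order_acq_rel,
                                     load_part(std::memory_order_acq_rel));
}
```

Both functions can fail spuriously since they use compare_exchange_weak, so a caller would normally retry in a loop.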
So for arm w/o cas

#include <atomic>

bool try_add(std::atomic<int>& var, int addend)
{
    // Relaxed read of the current value; the CAS below does the ordering.
    int expected = var.load(std::memory_order_relaxed);
    int update = expected + addend;
    // Single attempt: returns false if var changed or the weak CAS failed spuriously.
    return var.compare_exchange_weak(expected, update, std::memory_order_acq_rel);
}

try_add(std::atomic<int>&, int):
        ldr     w8, [x0]
        ldaxr   w9, [x0]
        cmp     w9, w8
        b.ne    .LBB0_3
        add     w8, w8, w1
        stlxr   w9, w8, [x0]
        cbz     w9, .LBB0_4
        mov     w0, wzr
        ret
.LBB0_3:
        clrex
        mov     w0, wzr
        ret
.LBB0_4:
        mov     w0, #1
        ret

You can see the load (ldaxr) has the acquire and the store (stlxr) has the release. You'd get the same thing even if you used seq_cst.

Joe Seigh