From: "Chris M. Thomasson"
Newsgroups: comp.arch
Subject: Re: Arm ldaxr / stxr loop question
Date: Tue, 12 Nov 2024 14:10:14 -0800

On 11/12/2024 4:14 AM, aph@littlepinkcloud.invalid wrote:
> EricP wrote:
>> Any idea what is the advantage for them having all these various
>> LDxxx and STxxx instructions that only seem to combine a LD or ST
>> with a fence instruction? Why have
>> LDAPR Load-Acquire RCpc Register
>> LDAR Load-Acquire Register
>> LDLAR LoadLOAcquire Register
>>
>> plus all the variations for byte, half, word, and pair,
>> instead of just the standard LDx and a general data fence instruction?
>
> All this, and much more can be discovered by reading the AMBA
> specifications. However, the main point is that the content of the
> target address does not have to be transferred to the local cache:
> these are remote atomic operations. Quite nice for things like
> fire-and-forget counters, for example.
>
>> The execution time of each is the same, and the main cost is the fence
>> synchronizing the Load Store Queue with the cache, flushing the cache
>> comms queue and waiting for all outstanding cache ops to finish.
>
> One other thing to be aware of is that the StoreLoad barrier needed
> for sequential consistency is logically part of an LDAR, not part of a
> STLR. This is an optimization, because the purpose of a StoreLoad in
> that situation is to prevent you from seeing your own stores to a
> location before everyone else sees them.

Fwiw, even x86/x64 needs a StoreLoad barrier when an algorithm depends
on a store followed by a load from another location staying in that
order. LoadStore is not strong enough. The SMR algorithm needs that.
Iirc, Peterson's algorithm needs it as well.
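
To make the store-then-load dependency concrete, here is a minimal sketch of Peterson's algorithm for two threads in C++. The names `PetersonLock` and `run_peterson_demo` are made up for illustration and are not from the thread. Every atomic access uses the default `std::memory_order_seq_cst`, which is what supplies the StoreLoad ordering between a thread's own flag/turn stores and its load of the other thread's flag. Treat it as a sketch under those assumptions, not a tuned implementation.

```cpp
#include <atomic>
#include <thread>

// Hypothetical two-thread Peterson lock; thread ids are 0 and 1.
struct PetersonLock {
    std::atomic<bool> flag[2] = {{false}, {false}};
    std::atomic<int> turn{0};

    void lock(int self) {
        const int other = 1 - self;
        // Stores to our own flag and to turn ...
        flag[self].store(true);   // seq_cst by default
        turn.store(other);        // seq_cst by default
        // ... followed by a load of a *different* location. The
        // algorithm only works if this load cannot be satisfied
        // before the stores above are visible: that is StoreLoad.
        while (flag[other].load() && turn.load() == other) {
            std::this_thread::yield();  // be polite while spinning
        }
    }

    void unlock(int self) {
        flag[self].store(false);
    }
};

// Two threads each increment a plain (non-atomic) counter under the
// lock. Returns the final count, expected to be 2 * iterations.
long run_peterson_demo(int iterations) {
    PetersonLock lock;
    long counter = 0;

    auto worker = [&](int self) {
        for (int i = 0; i < iterations; ++i) {
            lock.lock(self);
            ++counter;            // critical section
            lock.unlock(self);
        }
    };

    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

Calling `run_peterson_demo(20000)` should return 40000. If the flag store and the flag load were weakened to release and acquire, both threads could read the other's flag as false and enter the critical section together, on x86 as well, because the store may still be sitting in the store buffer when the load executes. Hazard pointer style SMR has the same shape: store the hazard pointer, then re-load the shared pointer to validate it.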