Path: ...!weretis.net!feeder9.news.weretis.net!news.quux.org!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.arch
Subject: Re: Arm ldaxr / stxr loop question
Date: Tue, 12 Nov 2024 14:10:14 -0800
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <vh0jo6$1q1hl$3@dont-email.me>
References: <vfono1$14l9r$1@dont-email.me>
 <YROdnVIXfKmwYrn6nZ2dnZfqn_GdnZ2d@supernews.com>
 <vg5tf7$3tqmi$2@dont-email.me> <vgm0g1$3c2t2$3@dont-email.me>
 <zwwXO.842112$_o_3.379966@fx17.iad> <vgm4vj$3d2as$1@dont-email.me>
 <vgm5cb$3d2as$3@dont-email.me> <OnzXO.657386$1m96.281665@fx15.iad>
 <TfKXO.658488$1m96.146506@fx15.iad> <T99YO.79275$MoU3.7336@fx36.iad>
 <3lGdnVvGQIAq2676nZ2dnZfqnPGdnZ2d@supernews.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 12 Nov 2024 23:10:15 +0100 (CET)
Injection-Info: dont-email.me; posting-host="2ffdf1eeb6f7861b52a3305ad94407ae";
	logging-data="1902133"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+nfc68ci95NDink/zW0/8E4WtenihrXS8="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Fo50DO05EfGi8QJsOk8vTKgJg3U=
Content-Language: en-US
In-Reply-To: <3lGdnVvGQIAq2676nZ2dnZfqnPGdnZ2d@supernews.com>
Bytes: 3081

On 11/12/2024 4:14 AM, aph@littlepinkcloud.invalid wrote:
> EricP <ThatWouldBeTelling@thevillage.com> wrote:
>> Any idea what is the advantage for them having all these various
>> LDxxx and STxxx instructions that only seem to combine a LD or ST
>> with a fence instruction? Why have
>> LDAPR Load-Acquire RCpc Register
>> LDAR Load-Acquire Register
>> LDLAR Load LOAcquire Register
>>
>> plus all the variations for byte, half, word, and pair,
>> instead of just the standard LDx and a general data fence instruction?
> 
> All this, and much more can be discovered by reading the AMBA
> specifications. However, the main point is that the content of the
> target address does not have to be transferred to the local cache:
> these are remote atomic operations. Quite nice for things like
> fire-and-forget counters, for example.
> 
>> The execution time of each is the same, and the main cost is the fence
>> synchronizing the Load Store Queue with the cache, flushing the cache
>> comms queue and waiting for all outstanding cache ops to finish.
> 
> One other thing to be aware of is that the StoreLoad barrier needed
> for sequential consistency is logically part of an LDAR, not part of a
> STLR. This is an optimization, because the purpose of a StoreLoad in
> that situation is to prevent you from seeing your own stores to a
> location before everyone else sees them.

Fwiw, even x86/x64 needs a StoreLoad barrier when an algorithm depends
on a store followed by a load from another location staying in that
order. LoadStore is not strong enough. The SMR algorithm needs that.
Iirc, Peterson's algorithm needs it as well.
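
To make the Peterson's case concrete, here is a minimal sketch in C++ of
the two-thread lock. The names (PetersonLock, run_peterson_demo) are made
up for illustration and are not from anyone's code in this thread. It
uses default seq_cst atomics, so the compiler has to keep each thread's
stores to want[me] and turn ordered before its loads of want[other] and
turn. On x86/x64 that typically shows up as an xchg or a mov followed by
mfence for the stores, which is the StoreLoad ordering in question.

```cpp
#include <atomic>
#include <thread>

// Hypothetical two-thread Peterson lock, thread ids are 0 and 1.
// All atomic operations use the default memory_order_seq_cst.
struct PetersonLock {
    std::atomic<bool> want[2];
    std::atomic<int> turn;

    PetersonLock() {
        want[0].store(false);
        want[1].store(false);
        turn.store(0);
    }

    void lock(int me) {
        const int other = 1 - me;
        want[me].store(true);   // store: announce intent
        turn.store(other);      // store: give priority to the other thread
        // The loads below read locations written by the other thread.
        // They must not be satisfied before the stores above are visible,
        // which is the StoreLoad requirement. seq_cst provides it here.
        while (want[other].load() && turn.load() == other) {
            std::this_thread::yield();
        }
    }

    void unlock(int me) {
        want[me].store(false);
    }
};

// Two threads each increment a plain (non-atomic) counter iters times
// while holding the lock. Returns the final counter value.
long run_peterson_demo(int iters) {
    PetersonLock lock;
    long counter = 0;  // protected only by the lock

    auto worker = [&](int me) {
        for (int i = 0; i < iters; ++i) {
            lock.lock(me);
            ++counter;
            lock.unlock(me);
        }
    };

    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

Calling run_peterson_demo(50000) from main should return 100000 if
mutual exclusion holds. If the stores and loads in lock() are weakened to
release/acquire with no fence between them, both threads can read stale
values and enter the critical section together, even on x86/x64, because
the store buffer lets the later load complete before the earlier store
is visible.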