Path: ...!news.nobody.at!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.arch
Subject: Re: arm ldxr/stxr vs cas
Date: Fri, 6 Sep 2024 12:36:36 -0700
Organization: A noiseless patient Spider
Lines: 69
Message-ID: <vbflk4$uc98$1@dont-email.me>
References: <vb4sit$2u7e2$1@dont-email.me>
 <07d60bd0a63b903820013ae60792fb7a@www.novabbs.org>
 <vbc4u3$aj5s$1@dont-email.me>
 <898cf44224e9790b74a0269eddff095a@www.novabbs.org>
 <vbd4k1$fpn6$1@dont-email.me> <vbd91c$g5j0$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 06 Sep 2024 21:36:37 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="036d8e7d2f74c5eb41569f4eb438368f";
	logging-data="995624"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+44EmXjriL5cdLvLSDENMIc6NO3YUQYFc="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:HD61rmsaOmmnWF0LX+LtRlxdd/I=
Content-Language: en-US
In-Reply-To: <vbd91c$g5j0$1@dont-email.me>
Bytes: 4659

On 9/5/2024 2:49 PM, jseigh wrote:
> On 9/5/24 16:34, Chris M. Thomasson wrote:
>> On 9/5/2024 12:46 PM, MitchAlsup1 wrote:
>>> On Thu, 5 Sep 2024 11:33:23 +0000, jseigh wrote:
>>>
>>>> On 9/4/2024 5:27 PM, MitchAlsup1 wrote:
>>>>> On Mon, 2 Sep 2024 17:27:57 +0000, jseigh wrote:
>>>>>
>>>>>> I read that arm added the cas instruction because they didn't think
>>>>>> ldxr/stxr would scale well.  It wasn't clear to me as to why that
>>>>>> would be the case.  I would think the memory lock mechanism would
>>>>>> have really low overhead vs cas having to do an interlocked load
>>>>>> and store.  Unless maybe the memory lock size might be large
>>>>>> enough to cause false sharing issues.  Any ideas?
>>>>>
>>>>> A pipeline lock between the LD part of a CAS and the ST part of a
>>>>> CAS is essentially FREE. But the same is true for LL followed by
>>>>> a later SC.
>>>>>
>>>>> Older machines with looser than sequential consistency memory models
>>>>> and running OoO have a myriad of problems with LL - SC. This is
>>>>> why My 66000 architecture switches from causal consistency to
>>>>> sequential consistency when it encounters <effectively> LL and
>>>>> switches back after seeing SC.
>>>>>
>>>>> No Fences necessary with causal consistency.
>>>>>
>>>>
>>>> I'm not sure I entirely follow.  I was thinking of the effects on
>>>> cache.  In theory the SC could fail without having to get the current
>>>> cache line exclusive or at all.  CAS has to get it exclusive before
>>>> it can definitively fail.
>>>
>>> A LL that takes a miss in L1 will perform a fetch with intent to modify,
>>> so will a CAS. However, LL is allowed to silently fail if exclusive is
>>> not returned from its fetch, deferring atomic failure to SC, while CAS
>>> will fail when exclusive fails to return. 
>>
>> CAS should only fail when the comparands are not equal to each other. 
>> Well, then there is the damn weak and strong CAS in C++11... ;^o
>>
>>
>>> LL-SC is designed so that
>>> when a failure happens, failure is visible at SC not necessarily at LL.
>>>
>>> There are coherence protocols that allow the 2nd party to determine
>>> if it returns exclusive or not. The example I know is when the 2nd
>>> party is already performing an atomic event and it is better to fail
>>> the starting atomic event than to fail an ongoing atomic event.
>>> In My 66000 the determination is made under the notion of priority::
>>> the higher priority thread is allowed to continue while the lower
>>> priority thread takes the failure. The higher priority thread can
>>> be the requestor (1st party) or the holder of data (2nd party)
>>> while all interested observers (3rd parties) are in a position
>>> to see what transpired and act accordingly (causal).
>>>
> 
> I'm not so sure about making the memory lock granularity same as
> cache line size but that's an implementation decision I guess.
> 
> I do like the idea of detecting potential contention at the
> start of LL/SC so you can do back off.  Right now the only way I
> can detect contention is after the fact when the CAS fails and
> I probably have the cache line exclusive at that point.  It's
> pretty problematic.

I wonder if the ability to determine why a "weak" CAS failed might help.
Weak CASes can fail for reasons other than a comparand mismatch...
Well, would that be a little too low level for a general atomic op in C/C++11?
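
For what it's worth, here is a rough sketch of how far one can get today with plain C++11. compare_exchange_weak does not report why it failed, but on failure it writes the observed value back into `expected`. If that observed value still equals what we expected, the comparison itself would have passed, so the failure was probably spurious (a lost reservation on an LL/SC machine, say). That is only a heuristic under my own assumptions, and the names `cas_stats` and `fetch_add_classified` are made up for illustration; they are not part of any standard API.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical bookkeeping record, invented for this sketch.
struct cas_stats {
    std::uint64_t value_mismatch = 0;  // failed and the observed value differed
    std::uint64_t maybe_spurious = 0;  // failed although the observed value matched
};

// Atomically adds `delta` to `target` with a weak CAS loop and classifies
// each failure. Returns the value that was replaced.
inline std::uint64_t fetch_add_classified(std::atomic<std::uint64_t>& target,
                                          std::uint64_t delta,
                                          cas_stats& stats) {
    std::uint64_t expected = target.load(std::memory_order_relaxed);
    for (;;) {
        const std::uint64_t before = expected;
        if (target.compare_exchange_weak(expected, before + delta,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
            return before;
        }
        // On failure, `expected` now holds the value the CAS observed.
        if (expected == before) {
            ++stats.maybe_spurious;   // comparands matched: likely a spurious failure
        } else {
            ++stats.value_mismatch;   // someone else changed the value: real contention
        }
    }
}
```

A caller could back off only when value_mismatch grows, since that is the case that signals real contention. Note that this still detects contention after the fact, which is exactly jseigh's complaint; it does not give the early warning at the LL that Mitch describes.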