From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: arm ldxr/stxr vs cas
Date: Fri, 6 Sep 2024 19:57:57 +0000
Organization: Rocksolid Light
Message-ID: <352e80684e75a2c0a298b84e4bf840c4@www.novabbs.org>
References: <07d60bd0a63b903820013ae60792fb7a@www.novabbs.org> <898cf44224e9790b74a0269eddff095a@www.novabbs.org>

On Fri, 6 Sep 2024 19:36:36 +0000, Chris M. Thomasson wrote:

> On 9/5/2024 2:49 PM, jseigh wrote:
>> On 9/5/24 16:34, Chris M. Thomasson wrote:
>>> On 9/5/2024 12:46 PM, MitchAlsup1 wrote:
>>>> On Thu, 5 Sep 2024 11:33:23 +0000, jseigh wrote:
>>>>
>>>>> On 9/4/2024 5:27 PM, MitchAlsup1 wrote:
>>>>>> On Mon, 2 Sep 2024 17:27:57 +0000, jseigh wrote:
>>>>>>
>>>>>>> I read that arm added the cas instruction because they didn't think
>>>>>>> ldxr/stxr would scale well.  It wasn't clear to me as to why that
>>>>>>> would be the case.  I would think the memory lock mechanism would
>>>>>>> have really low overhead vs cas having to do an interlocked load
>>>>>>> and store.  Unless maybe the memory lock size might be large
>>>>>>> enough to cause false sharing issues.  Any ideas?
>>>>>>
>>>>>> A pipeline lock between the LD part of a CAS and the ST part of a
>>>>>> CAS is essentially FREE. But the same is true for LL followed by
>>>>>> a later SC.
>>>>>>
>>>>>> Older machines with looser than sequential consistency memory models
>>>>>> and running OoO have a myriad of problems with LL - SC. This is
>>>>>> why My 66000 architecture switches from causal consistency to
>>>>>> sequential consistency when it encounters LL and
>>>>>> switches back after seeing SC.
>>>>>>
>>>>>> No Fences necessary with causal consistency.
>>>>>>
>>>>>
>>>>> I'm not sure I entirely follow.  I was thinking of the effects on
>>>>> cache.  In theory the SC could fail without having to get the current
>>>>> cache line exclusive or at all.  CAS has to get it exclusive before
>>>>> it can definitively fail.
>>>>
>>>> A LL that takes a miss in L1 will perform a fetch with intent to modify,
>>>> so will a CAS. However, LL is allowed to silently fail if exclusive is
>>>> not returned from its fetch, deferring atomic failure to SC, while CAS
>>>> will fail when exclusive fails to return.
>>>
>>> CAS should only fail when the comparands are not equal to each other.
>>> Well, then there is the damn weak and strong CAS in C++11... ;^o
>>>
>>>
>>>> LL-SC is designed so that
>>>> when a failure happens, failure is visible at SC not necessarily at LL.
>>>>
>>>> There are coherence protocols that allow the 2nd party to determine
>>>> if it returns exclusive or not. The example I know is when the 2nd
>>>> party is already performing an atomic event and it is better to fail
>>>> the starting atomic event than to fail an ongoing atomic event.
>>>> In My 66000 the determination is made under the notion of priority:
>>>> the higher priority thread is allowed to continue while the lower
>>>> priority thread takes the failure. The higher priority thread can
>>>> be the requestor (1st party) or the holder of data (2nd party)
>>>> while all interested observers (3rd parties) are in a position
>>>> to see what transpired and act accordingly (causal).
>>>>
>>
>> I'm not so sure about making the memory lock granularity same as
>> cache line size but that's an implementation decision I guess.
>>
>> I do like the idea of detecting potential contention at the
>> start of LL/SC so you can do back off.  Right now the only way I
>> can detect contention is after the fact when the CAS fails and
>> I probably have the cache line exclusive at that point.  It's
>> pretty problematic.
>
> I wonder if the ability to determine why a "weak" CAS failed might help.
> They (weak) can fail for other reasons besides comparing comparands...
> Well, would be a little too low level for a general atomic op in
> C/C++11?

One can detect that the CAS-line is no longer exclusive as a form of
weak failure, rather than waiting for the data to show up and fail
strongly on the compare.
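
For readers following the weak/strong distinction, here is a minimal C++11 sketch of the two flavors as the language exposes them. The helper names `add_with_strong_cas` and `add_with_weak_cas` and the `failures` counter are hypothetical, made up for illustration; only `std::atomic` and its `compare_exchange_strong` / `compare_exchange_weak` members come from the standard library. The point it tries to show: both calls return a bare `bool`, so software cannot tell a compare mismatch from a spurious (e.g. lost exclusivity) failure of the weak form.

```cpp
#include <atomic>

// Hypothetical helper (not from the thread): add `delta` using strong CAS.
// compare_exchange_strong fails only when the stored value differs from
// `expected`, so every failure counted here is a genuine compare mismatch.
int add_with_strong_cas(std::atomic<int>& target, int delta, int& failures) {
    int expected = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_strong(expected, expected + delta,
                                           std::memory_order_acq_rel,
                                           std::memory_order_relaxed)) {
        ++failures;  // `expected` has been reloaded with the observed value
    }
    return expected;  // value seen immediately before our update
}

// Hypothetical helper: same update using weak CAS.
// compare_exchange_weak may also fail spuriously even when the values match
// (for instance when an LL/SC reservation is lost), and the bool result
// does not say which kind of failure occurred.
int add_with_weak_cas(std::atomic<int>& target, int delta, int& failures) {
    int expected = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(expected, expected + delta,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed)) {
        ++failures;  // mismatch or spurious: indistinguishable at this level
    }
    return expected;
}
```

Calling either helper on a `std::atomic<int>` holding 5 with `delta = 3` leaves 8 in the atomic and returns 5. A back-off policy of the kind jseigh describes would have to hang off the `++failures` line, which is exactly where the cause of the failure is unknown.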