From: Terje Mathisen
Newsgroups: comp.arch
Subject: Re: arm ldxr/stxr vs cas
Date: Tue, 10 Sep 2024 11:22:05 +0200
Organization: A noiseless patient Spider
References: <07d60bd0a63b903820013ae60792fb7a@www.novabbs.org>
 <898cf44224e9790b74a0269eddff095a@www.novabbs.org>

jseigh wrote:
> On 9/9/24 03:14, Terje Mathisen wrote:
>> jseigh wrote:
>>>
>>> I'm not so sure about making the memory lock granularity the same as
>>> the cache line size, but that's an implementation decision I guess.
>>
>> Just make sure you never have multiple locks residing inside the same
>> cache line!
>
> This is the terminology ARM uses when describing their LL/SC
> implementation. It is not the best choice of terminology.
>>
>>>
>>> I do like the idea of detecting potential contention at the
>>> start of LL/SC so you can do backoff. Right now the only way I
>>> can detect contention is after the fact, when the CAS fails and
>>> I probably have the cache line exclusive at that point. It's
>>> pretty problematic.
>>
>> I do prefer LOCK XADD instead of CAS (CmpXchg*), because the return
>> value will also tell you which queue entry to pick/work on.
>>
>> It will not be optimal when really contended, but at least one
>> participant will make forward progress, and typically several of them.
>
> I'm not aware of any lock-free queue algorithms that use
> atomic_fetch_add that are actually lock-free, error-free,
> and/or don't have an ABA problem. I'm not saying there
> aren't, just that I'm not aware of them.

I'm not sure either: I have written test code that should be general
purpose, but I have never actually used it in any production system.

The one time I used this for critical work (a maximum-throughput ntpd
server) I had a single writer and N-1 readers, so I decided to create
one queue per reader (cpu core), making them much easier to get
right. :-)

Each queue was a power of two in length, and the write and read indices
were full 32-bit unsigned variables that I masked before accessing the
work list, so I never needed to worry about wraparound at the end of
the queue.

The writer simply allocated work to each queue in round-robin fashion.

Terje

-- 
- "almost all programming can be viewed as an exercise in caching"
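
A minimal sketch of one such per-reader ring, under the assumptions
described above (power-of-two capacity, free-running 32-bit write/read
indices that are only masked when indexing the array). The names
(spsc_ring, ring_push, ring_pop) and the use of C11 atomics are for
illustration only; this is not the actual ntpd code.

    /* Single-producer/single-consumer ring: power-of-two capacity,
     * free-running 32-bit indices masked on access, so wraparound at
     * the end of the queue needs no special handling. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define RING_SIZE 1024u                /* must be a power of two */
    #define RING_MASK (RING_SIZE - 1u)

    typedef struct {
        void *slots[RING_SIZE];
        _Atomic uint32_t wr;               /* written only by the producer */
        _Atomic uint32_t rd;               /* written only by the consumer */
    } spsc_ring;

    /* Producer side: returns false if the ring is full. */
    static bool ring_push(spsc_ring *q, void *item)
    {
        uint32_t wr = atomic_load_explicit(&q->wr, memory_order_relaxed);
        uint32_t rd = atomic_load_explicit(&q->rd, memory_order_acquire);

        if (wr - rd == RING_SIZE)          /* unsigned difference of the
                                              free-running counters */
            return false;                  /* full */

        q->slots[wr & RING_MASK] = item;   /* mask only when touching the array */
        atomic_store_explicit(&q->wr, wr + 1u, memory_order_release);
        return true;
    }

    /* Consumer side: returns NULL if the ring is empty. */
    static void *ring_pop(spsc_ring *q)
    {
        uint32_t rd = atomic_load_explicit(&q->rd, memory_order_relaxed);
        uint32_t wr = atomic_load_explicit(&q->wr, memory_order_acquire);

        if (wr == rd)
            return NULL;                   /* empty */

        void *item = q->slots[rd & RING_MASK];
        atomic_store_explicit(&q->rd, rd + 1u, memory_order_release);
        return item;
    }

The single writer then just cycles ring_push() over the N-1 rings in
round-robin order; because each ring has exactly one writer and one
reader, no CAS, XADD, or ABA handling is needed on either side.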