From: Terje Mathisen
Newsgroups: comp.arch
Subject: Re: arm ldxr/stxr vs cas
Date: Tue, 10 Sep 2024 11:22:05 +0200
Organization: A noiseless patient Spider
References: <07d60bd0a63b903820013ae60792fb7a@www.novabbs.org>
 <898cf44224e9790b74a0269eddff095a@www.novabbs.org>

jseigh wrote:
> On 9/9/24 03:14, Terje Mathisen wrote:
>> jseigh wrote:
>>>
>>> I'm not so sure about making the memory lock granularity the same as
>>> the cache line size, but that's an implementation decision I guess.
>>
>> Just make sure you never have multiple locks residing inside the same
>> cache line!
>
> This is the terminology ARM uses when describing their LL/SC
> implementation. It is not the best choice of terminology.
>>
>>>
>>> I do like the idea of detecting potential contention at the
>>> start of LL/SC so you can do backoff. Right now the only way I
>>> can detect contention is after the fact, when the CAS fails and
>>> I probably have the cache line exclusive at that point. It's
>>> pretty problematic.
>>
>> I do prefer LOCK XADD instead of CAS (CmpXchg*), because the return
>> value will also tell you which queue entry to pick/work on.
>>
>> It will not be optimal when really contended, but at least one
>> participant will make forward progress, and typically several of them.
>
> I'm not aware of any lock-free queue algorithms that use
> atomic_fetch_add that are actually lock-free, error-free,
> and/or don't have an ABA problem. I'm not saying there
> aren't, just that I'm not aware of them.

I'm not sure either: I have written test code that should be general
purpose, but I have never actually used it in any production system.

The one time I used this for critical work (a maximum-throughput ntpd
server) I had a single writer and N-1 readers, so I decided to create
one queue per reader (cpu core), making them much easier to get
right. :-)

Each queue was a power of two in length, and the write and read indices
were full 32-bit unsigned variables that I masked before accessing the
work list, so I never needed to worry about wraparound at the end of
the queue.

The writer simply allocated work to each queue in round-robin fashion.

Terje

-- 
- "almost all programming can be viewed as an exercise in caching"
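
A minimal sketch of one such per-reader ring, under the assumptions
described above (power-of-two capacity, free-running 32-bit write/read
indices that are only masked when indexing the array). The names
(spsc_ring, ring_push, ring_pop) and the use of C11 atomics are for
illustration only; this is not the actual ntpd code.

    /* Single-producer/single-consumer ring: power-of-two capacity,
     * free-running 32-bit indices masked on access, so wraparound at
     * the end of the queue needs no special handling. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define RING_SIZE 1024u                /* must be a power of two */
    #define RING_MASK (RING_SIZE - 1u)

    typedef struct {
        void *slots[RING_SIZE];
        _Atomic uint32_t wr;               /* written only by the producer */
        _Atomic uint32_t rd;               /* written only by the consumer */
    } spsc_ring;

    /* Producer side: returns false if the ring is full. */
    static bool ring_push(spsc_ring *q, void *item)
    {
        uint32_t wr = atomic_load_explicit(&q->wr, memory_order_relaxed);
        uint32_t rd = atomic_load_explicit(&q->rd, memory_order_acquire);

        if (wr - rd == RING_SIZE)          /* unsigned difference of the
                                              free-running counters */
            return false;                  /* full */

        q->slots[wr & RING_MASK] = item;   /* mask only when touching the array */
        atomic_store_explicit(&q->wr, wr + 1u, memory_order_release);
        return true;
    }

    /* Consumer side: returns NULL if the ring is empty. */
    static void *ring_pop(spsc_ring *q)
    {
        uint32_t rd = atomic_load_explicit(&q->rd, memory_order_relaxed);
        uint32_t wr = atomic_load_explicit(&q->wr, memory_order_acquire);

        if (wr == rd)
            return NULL;                   /* empty */

        void *item = q->slots[rd & RING_MASK];
        atomic_store_explicit(&q->rd, rd + 1u, memory_order_release);
        return item;
    }

The single writer then just cycles ring_push() over the N-1 rings in
round-robin order; because each ring has exactly one writer and one
reader, no CAS, XADD, or ABA handling is needed on either side.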