Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: jseigh Newsgroups: comp.arch Subject: Re: arm ldxr/stxr vs cas Date: Tue, 10 Sep 2024 10:40:05 -0400 Organization: A noiseless patient Spider Lines: 75 Message-ID: References: <07d60bd0a63b903820013ae60792fb7a@www.novabbs.org> <898cf44224e9790b74a0269eddff095a@www.novabbs.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Injection-Date: Tue, 10 Sep 2024 16:40:06 +0200 (CEST) Injection-Info: dont-email.me; posting-host="962e6e0f08d33e3da6320defbc27b398"; logging-data="3166338"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/fiTC+bfy3NJpg5LGJNbJE" User-Agent: Mozilla Thunderbird Cancel-Lock: sha1:9LXj4aBBeCSQKVpprXV8mjY8LxY= Content-Language: en-US In-Reply-To: Bytes: 3963 On 9/9/24 17:09, Chris M. Thomasson wrote: > On 9/9/2024 7:29 AM, jseigh wrote: >> On 9/9/24 03:14, Terje Mathisen wrote: >>> jseigh wrote: >>>> >>>> I'm not so sure about making the memory lock granularity same as >>>> cache line size but that's an implementation decision I guess. >>> >>> Just make sure you never have multiple locks residing inside the same >>> cache line! >> >> This the the terminology ARM uses when describing their LL/SC >> implementation.  It is not the best choice in terminology. >>> >>>> >>>> I do like the idea of detecting potential contention at the >>>> start of LL/SC so you can do back off.  Right now the only way I >>>> can detect contention is after the fact when the CAS fails and >>>> I probably have the cache line exclusive at that point.  It's >>>> pretty problematic. >>> >>> I do prefer LOCK XADD instead of CAS (CmpXchg*), because the return >>> value will also tell you which queue entry to pick/work on. >>> >>> It will not be optimal when really contended, but at least one >>> participant will make forward progress, and typically several of them. >> >> >> I'm not aware of any lock-free queue algorithms that use >> atomic_fetch_add that are actually lock-free, error free, >> and/or don't have an ABA problem.  I'm not saying there >> aren't, just that I'm not aware of them. > > Here is an interesting one I did. A tweak from another algorithm. > Basically a bakery algorithm: > > > ______________________________________________ > struct cell { uint32_t ver; double state; }; > > uint32_t head = 0; > uint32_t tail = 0; > cell cells[N]; // N must be a power of 2 > > void init() { >     for (uint32_t i = 0; i < N; ++i) cells[i].ver = i; > } > > void producer(double state) { >     uint32_t ver = XADD(&head, 1); >     cell& c = cells[ver & (N - 1)]; >     while (LOAD(&c.ver) != ver) backoff(); >     c.state = state; >     STORE(&c.ver, ver + 1); > } > > double consumer() { >     uint32_t ver = XADD(&tail, 1); >     cell& c = cells[ver & (N - 1)]; >     while (LOAD(&c.ver) != ver + 1) backoff(); >     double state = c.state; >     STORE(&c.ver, ver + N); >     return state; > } > ______________________________________________ One of the things I have in my implementation is that an enqueue operation can detect if wrap occurred if it got a interrupted by a time slice end and log how far behind it got. I'm seeing about 200,000 enqueue operations in those cases. That would have been a huge performance hit if my queue wasn't lock-free. Joe Seigh