From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.arch
Subject: Re: arm ldxr/stxr vs cas
Date: Wed, 11 Sep 2024 00:42:27 -0700
Message-ID: <vbrhl3$3fvib$1@dont-email.me>
References: <vb4sit$2u7e2$1@dont-email.me>
 <07d60bd0a63b903820013ae60792fb7a@www.novabbs.org>
 <vbc4u3$aj5s$1@dont-email.me>
 <898cf44224e9790b74a0269eddff095a@www.novabbs.org>
 <vbd4k1$fpn6$1@dont-email.me> <vbd91c$g5j0$1@dont-email.me>
 <vbm790$2atfb$2@dont-email.me> <vbr81o$3ekr7$1@dont-email.me>
 <vbrf73$3fb6u$2@dont-email.me>
In-Reply-To: <vbrf73$3fb6u$2@dont-email.me>

On 9/11/2024 12:00 AM, Chris M. Thomasson wrote:
> On 9/10/2024 9:15 PM, Paul A. Clayton wrote:
>> On 9/9/24 3:14 AM, Terje Mathisen wrote:
>>> jseigh wrote:
>>>>
>>>> I'm not so sure about making the memory lock granularity the
>>>> same as the cache line size, but that's an implementation
>>>> decision I guess.
>>>
>>> Just make sure you never have multiple locks residing inside the
>>> same cache line!
>>
>> Never?
>>
>> I suspect at least theoretically conditions could exist where
>> having more than one lock within a cache line would be beneficial.
>>
>> If lock B is always acquired after lock A, then sharing a cache
>> line might (I think) improve performance. One would lose
>> prefetched capacity for the data protected by lock A and lock B.
>> This assumes simple locks (e.g., not reader-writer locks).
>>
>> It seems to me that the pingpong problem may be less important
>> than spatial locality, depending on the contention for the cache
>> line and the cache-hierarchy locality of the contention
>> (pingponging from a shared level of cache would be less
>> expensive).
>>
>> If work behind highly active locks is preferentially or forcefully
>> localized, pingponging would be less of a problem, it seems.
>> Instead of an arbitrary core acquiring a lock's cache line and
>> doing some work, the core could send a message to the natural
>> owner of the cache line to do the work.
>>
>> If communication between cores were low latency and simple
>> messages used little bandwidth, one might also conceive of having
>> a lock manager that tracks the lock state and sends a granted or
>> not-granted message back. This assumes that the memory location
>> of the lock itself is separate from the data guarded by the lock.
>>
>> Being able to grab a snapshot of some data briefly without
>> requiring a (longer-term) ownership change might be useful even
>> beyond lock probing (where a conventional MESI would change the
>> M-state cache to S, forcing a request for ownership when the lock
>> is released). I recall some paper proposed expiring cache line
>> ownership to reduce coherence overhead.
>>
>> Within a multiple-cache-line atomic operation/memory transaction,
>> I _think_ if the write set is owned, the read set could be grabbed
>> as such snapshots. I.e., I think any remote write to the read set
>> could be "after" the atomic/transaction commits. (Such might be
>> too difficult to get right while still providing any benefit.)
>>
>> (Weird side-thought: I wonder if a conservative filter might be
>> useful for locking, particularly for writer locks. On the one
>> hand, such would increase the pingpong in the filter when writer
>> locks are set/cleared; on the other hand, reader locks could use
>> a remote increment within the filter check atomic to avoid slight
>> cache pollution.)
>
> Generally one wants the mutex state to be completely isolated:
> padded up to at least an L2 cache line or, if using LL/SC, perhaps
> even a reservation granule... Not only properly padded, but also
> correctly aligned on an L2 cache line or reservation-granule
> boundary. This helps prevent false sharing and makes life a little
> better for the underlying architecture...
>
> You also don't want mutex traffic to interfere with the critical
> section, or locked region if you will...
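Roughly what I mean, as a C++ sketch. The 64-byte fallback granule is
an assumption on my part (std::hardware_destructive_interference_size
is the portable-ish way to ask for it); on an LL/SC arch the
reservation granule can be a lot larger, so treat the constant as a
tunable, not gospel:

#include <atomic>
#include <cstddef>
#include <new> // std::hardware_destructive_interference_size (C++17)

#if defined(__cpp_lib_hardware_interference_size)
inline constexpr std::size_t g_granule =
    std::hardware_destructive_interference_size;
#else
inline constexpr std::size_t g_granule = 64; // assumed granule size
#endif

// The lock word lives alone in its granule, padded _and_ aligned, so
// neighboring data (and neighboring locks!) cannot false-share with
// it.
struct alignas(g_granule) isolated_spinlock
{
    std::atomic<bool> m_locked{false};

    void lock()
    {
        while (m_locked.exchange(true, std::memory_order_acquire))
            while (m_locked.load(std::memory_order_relaxed)) {} // spin
    }

    void unlock()
    {
        m_locked.store(false, std::memory_order_release);
    }
};

// alignas on the struct forces sizeof up to a full granule:
static_assert(sizeof(isolated_spinlock) == g_granule, "lost padding?");

An array of isolated_spinlock then puts every lock word on its own
line, which is exactly the "never share" rule above.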
Another little trick... Sometimes, when we over-allocate and align on
a large enough boundary, we can steal some bits of the pointers...
They can be used for fun things indeed... ;^)
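For instance, a minimal sketch (the names and the 64-byte boundary
are just my assumptions; any alignment of 2^N frees the low N bits):

#include <cassert>
#include <cstdint>
#include <cstdlib> // std::aligned_alloc, std::free (C++17)

inline constexpr std::uintptr_t g_align = 64; // assumed boundary
inline constexpr std::uintptr_t g_tag_mask = g_align - 1; // 6 low bits

// Pack a suitably aligned pointer and a small tag into one word.
inline std::uintptr_t pack(void* p, unsigned tag)
{
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & g_tag_mask) == 0); // must really be aligned!
    assert(tag <= g_tag_mask);
    return bits | tag;
}

inline void* unpack_ptr(std::uintptr_t w)
{
    return reinterpret_cast<void*>(w & ~g_tag_mask);
}

inline unsigned unpack_tag(std::uintptr_t w)
{
    return static_cast<unsigned>(w & g_tag_mask);
}

int main()
{
    // Over-allocate on the boundary; note that the size passed to
    // std::aligned_alloc must be a multiple of the alignment.
    void* p = std::aligned_alloc(g_align, g_align);
    std::uintptr_t w = pack(p, 5); // stash a 5 in the stolen bits
    assert(unpack_ptr(w) == p && unpack_tag(w) == 5);
    std::free(p);
    return 0;
}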