From: "Paul A. Clayton" <paaronclayton@gmail.com>
Newsgroups: comp.arch
Subject: Re: arm ldxr/stxr vs cas
Date: Wed, 11 Sep 2024 00:15:35 -0400
Organization: A noiseless patient Spider
Lines: 56
Message-ID: <vbr81o$3ekr7$1@dont-email.me>
References: <vb4sit$2u7e2$1@dont-email.me>
 <07d60bd0a63b903820013ae60792fb7a@www.novabbs.org>
 <vbc4u3$aj5s$1@dont-email.me>
 <898cf44224e9790b74a0269eddff095a@www.novabbs.org>
 <vbd4k1$fpn6$1@dont-email.me> <vbd91c$g5j0$1@dont-email.me>
 <vbm790$2atfb$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.0
In-Reply-To: <vbm790$2atfb$2@dont-email.me>

On 9/9/24 3:14 AM, Terje Mathisen wrote:
> jseigh wrote:
>>
>> I'm not so sure about making the memory lock granularity same as
>> cache line size but that's an implementation decision I guess.
>
> Just make sure you never have multiple locks residing inside the
> same cache line!

Never? I suspect that, at least theoretically, conditions could exist
where having more than one lock within a cache line would be
beneficial. If lock B is always acquired after lock A, then sharing a
cache line might (I think) improve performance; one would lose
prefetched capacity for the data protected by lock A and lock B. This
assumes simple locks (e.g., not readers-writer locks). (A rough C
sketch of the two layouts appears at the end of this post.)

It seems to me that the pingpong problem may be less important than
spatial locality, depending on the contention for the cache line and
the cache-hierarchy locality of that contention (pingponging from a
shared level of cache would be less expensive).

If the work behind highly active locks is preferentially or
forcefully localized, pingponging would be less of a problem, it
seems. Instead of an arbitrary core acquiring a lock's cache line and
doing some work, the core could send a message to the natural owner
of the cache line asking it to do the work.

If communication between cores were low latency and simple messages
used little bandwidth, one might also conceive of a lock manager that
tracks the lock state and sends a granted or not-granted message
back. This assumes that the memory location of the lock itself is
separate from the data guarded by the lock.

Being able to grab a snapshot of some data briefly, without requiring
a (longer-term) ownership change, might be useful even beyond lock
probing (where a conventional MESI protocol would change the M-state
cache to S, forcing a request for ownership when the lock is
released). I recall a paper that proposed expiring cache line
ownership to reduce coherence overhead.

Within a multiple-cache-line atomic operation/memory transaction, I
_think_ that if the write set is owned, the read set could be grabbed
as such snapshots. I.e., I think any remote write to the read set
could be ordered "after" the atomic/transaction commits. (Such might
be too difficult to get right while still providing any benefit.)
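To make the first point concrete, here is a minimal C11 sketch
(untested; CACHE_LINE, the struct names, and the spinlock helpers are
all my own illustration) contrasting the usual one-lock-per-line
padding with deliberately co-locating two locks that are only ever
acquired in A-then-B order, so that acquiring A effectively prefetches
B:

  #include <stdalign.h>
  #include <stdatomic.h>

  #define CACHE_LINE 64  /* assumed line size */

  /* Conventional layout: each lock gets its own line so that
     unrelated lock traffic never falsely shares a line. */
  struct padded_lock {
      alignas(CACHE_LINE) atomic_flag flag;
  };

  /* Speculative layout from above: A and B share one line because
     B is only ever taken while holding A, so B's location sees no
     independent contention. */
  struct nested_locks {
      alignas(CACHE_LINE) atomic_flag a;  /* outer lock */
      atomic_flag b;                      /* inner lock, same line */
  };

  static void lock(atomic_flag *f)
  {
      while (atomic_flag_test_and_set_explicit(f,
                                               memory_order_acquire))
          ;  /* spin; a real lock would back off or park */
  }

  static void unlock(atomic_flag *f)
  {
      atomic_flag_clear_explicit(f, memory_order_release);
  }

Any win depends on the invariant actually holding: every core that
touches B's line must already hold A, so B never pingpongs on its own.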
(Weird side-thought: I wonder if a conservative filter might be
useful for locking, particularly for writer locks. On the one hand,
such a filter would increase pingponging in the filter when writer
locks are set and cleared; on the other hand, reader locks could use
a remote increment within the filter-check atomic to avoid slight
cache pollution.)
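One shape the side-thought might take (again untested, and the table
size, hash, and function names are mine): a hashed table of reader
counters. A reader announces itself with a single atomic increment --
which a remote-atomic implementation could execute at the line's home
node without migrating the line -- and a writer, after taking its
lock, waits for the lock's slot to drain. Hash collisions can only
cause spurious waiting, never a missed reader, hence "conservative":

  #include <stdatomic.h>
  #include <stdint.h>

  #define FILTER_SLOTS 256
  static atomic_uint filter[FILTER_SLOTS];

  static unsigned slot(const void *lock_addr)
  {
      return (unsigned)(((uintptr_t)lock_addr >> 4) % FILTER_SLOTS);
  }

  static void reader_enter(const void *lock_addr)
  {
      atomic_fetch_add_explicit(&filter[slot(lock_addr)], 1,
                                memory_order_acquire);
      /* A complete design would recheck for writer intent here and
         back out on conflict; omitted in this sketch. */
  }

  static void reader_exit(const void *lock_addr)
  {
      atomic_fetch_sub_explicit(&filter[slot(lock_addr)], 1,
                                memory_order_release);
  }

  /* Writer side: call after publishing write intent (e.g., after
     setting the writer lock) so that new readers stall or retry. */
  static void writer_wait_for_readers(const void *lock_addr)
  {
      while (atomic_load_explicit(&filter[slot(lock_addr)],
                                  memory_order_acquire) != 0)
          ;  /* spin */
  }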