From: jseigh
Newsgroups: comp.lang.c++
Subject: Re: smrproxy v2
Date: Mon, 28 Oct 2024 21:17:55 -0400

On 10/28/24 17:57, Chris M. Thomasson wrote:
> On 10/28/2024 4:45 AM, jseigh wrote:
>> On 10/28/24 00:02, Chris M. Thomasson wrote:
>>> On 10/27/2024 5:35 PM, jseigh wrote:
>>>> On 10/27/24 18:32, Chris M. Thomasson wrote:
>>
>>>>
>>>> The membar version?  That's a store/load membar so it is expensive.
>>>
>>> I was wondering in your c++ version if you had to use any seq_cst
>>> barriers. I think acquire/release should be good enough. Now, when I
>>> say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and
>>> things like that.
>>>
>>> I take it that your pure C++ version has no atomic RMW, right? Just
>>> loads and stores?
>>
>> While a lock action has acquire memory order semantics, if the
>> implementation has internal stores, you have to ensure those stores
>> are complete before any access from the critical section.
>> So you may need a store/load memory barrier.
>
> Wrt acquiring a lock the only class of mutex logic that comes to mind
> that requires an explicit storeload style membar is Peterson's, and some
> others along those lines, so to speak. This is for the store and load
> version.
> Now, RMW on x86 basically implies a StoreLoad wrt the LOCK
> prefix, XCHG aside for it has an implied LOCK prefix. For instance the
> original SMR algo requires a storeload as is on x86/x64. MFENCE or LOCK
> prefix.
>
> Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
> fetch-add. It needs explicit membars (no #StoreLoad) on SPARC in RMO
> mode. On x86, the LOCK prefix handles that wrt the RMW's themselves.
> This is a lot different than using stores and loads. The original SMR
> and Peterson's algo needs that "store followed by a load to a different
> location" action to hold true, aka, storeload...
>
> Now, I don't think that a data-dependent load can act like a storeload.
> I thought that they act sort of like an acquire, aka #LoadStore |
> #LoadLoad wrt SPARC. SPARC in RMO mode honors data-dependencies. Now,
> the DEC Alpha is a different story... ;^)
>

fwiw, here's the lock and unlock logic from the smrproxy rewrite

    inline void lock()
    {
        epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
        _ref_epoch.store(_epoch, std::memory_order_relaxed);
        // compiler-only fence: no hardware barrier instruction is emitted
        std::atomic_signal_fence(std::memory_order_acquire);
    }

    inline void unlock()
    {
        _ref_epoch.store(0, std::memory_order_release);
    }

epoch_t is interesting.  It's uint64_t but handles wrapped
compares, i.e. for an epoch_t x1 and uint64_t n

    x1 < (x1 + n)

for any value of x1 and any value of n with 0 < n < 2**63;
e.g.

    0xfffffffffffffff0 < 0x0000000000000001

The rewrite is almost complete except for some thread_local
stuff.  I think I might break off there.  Most of the additional
work is writing the test code.  I'm considering rewriting it
in Rust.

Joe Seigh
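
For anyone following along, the wrapped compare described for epoch_t could be sketched roughly as below. This is not the smrproxy code: the struct layout, the member name `value`, and both operators are hypothetical stand-ins, since the post only says that epoch_t is a uint64_t with wrap-aware comparison. The sketch treats a as "less than" b when the modular distance from a to b is nonzero and below 2**63.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the post's epoch_t; the real definition is not shown.
struct epoch_t {
    std::uint64_t value;
};

// Wrapped "less than": a precedes b when the distance (b - a) mod 2**64
// lies strictly between 0 and 2**63. Only unsigned arithmetic is used,
// so there is no signed overflow or implementation-defined conversion.
constexpr bool operator<(epoch_t a, epoch_t b) noexcept {
    return (b.value - a.value) != 0 &&
           (b.value - a.value) < (std::uint64_t{1} << 63);
}

// Advancing an epoch wraps modulo 2**64, as unsigned addition does.
constexpr epoch_t operator+(epoch_t a, std::uint64_t n) noexcept {
    return epoch_t{a.value + n};
}

// The example from the post: an epoch near the top of the range
// compares as older than one that has wrapped past zero.
static_assert(epoch_t{0xfffffffffffffff0ULL} < epoch_t{0x0000000000000001ULL},
              "wrapped compare should order across the 2**64 boundary");
static_assert(!(epoch_t{0x0000000000000001ULL} < epoch_t{0xfffffffffffffff0ULL}),
              "the reverse comparison should be false");
```

With this definition an epoch never compares less than itself, and two epochs exactly 2**63 apart are unordered in both directions, so a design along these lines assumes live epochs stay within half the range of each other.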