From: "Chris M. Thomasson"
Newsgroups: comp.lang.c++
Subject: Re: smrproxy v2
Date: Tue, 29 Oct 2024 15:05:47 -0700

On 10/28/2024 9:41 PM, Chris M. Thomasson wrote:
> On 10/28/2024 6:17 PM, jseigh wrote:
>> On 10/28/24 17:57, Chris M. Thomasson wrote:
>>> On 10/28/2024 4:45 AM, jseigh wrote:
>>>> On 10/28/24 00:02, Chris M. Thomasson wrote:
>>>>> On 10/27/2024 5:35 PM, jseigh wrote:
>>>>>> On 10/27/24 18:32, Chris M. Thomasson wrote:
>>>>
>>>>>>
>>>>>> The membar version?  That's a store/load membar so it is expensive.
>>>>>
>>>>> I was wondering in your c++ version if you had to use any seq_cst
>>>>> barriers. I think acquire/release should be good enough. Now, when
>>>>> I say C++, I mean pure C++, no calls to FlushProcessWriteBuffers
>>>>> and things like that.
>>>>>
>>>>> I take it that your pure C++ version has no atomic RMW, right? Just
>>>>> loads and stores?
>>>>
>>>> While a lock action has acquire memory order semantics, if the
>>>> implementation has internal stores, you have to those stores
>>>> are complete before any access from the critical section.
>>>> So you may need a store/load memory barrier.
>>>
>>> Wrt acquiring a lock the only class of mutex logic that comes to mind
>>> that requires an explicit storeload style membar is Petersons, and
>>> some others along those lines, so to speak. This is for the store and
>>> load version. Now, RMW on x86 basically implies a StoreLoad wrt the
>>> LOCK prefix, XCHG aside for it has an implied LOCK prefix. For
>>> instance the original SMR algo requires a storeload as is on x86/x64.
>>> MFENCE or LOCK prefix.
>>>
>>> Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic
>>> fetch-add. It needs an explicit membars (no #StoreLoad) on SPARC in
>>> RMO mode. On x86, the LOCK prefix handles that wrt the RMW's
>>> themselves. This is a lot different than using stores and loads. The
>>> original SMR and Peterson's algo needs that "store followed by a load
>>> to a different location" action to hold true, aka, storeload...
>>>
>>> Now, I don't think that a data-dependant load can act like a
>>> storeload. I thought that they act sort of like an acquire, aka
>>> #LoadStore | #LoadLoad wrt SPARC. SPARC in RMO mode honors data-
>>> dependencies. Now, the DEC Alpha is a different story... ;^)
>>>
>>
>> fwiw, here's the lock and unlock logic from smrproxy rewrite
>>
>>      inline void lock()
>>      {
>>          epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
>>          _ref_epoch.store(_epoch, std::memory_order_relaxed);
>>          std::atomic_signal_fence(std::memory_order_acquire);
>>      }
>>
>>      inline void unlock()
>>      {
>>          _ref_epoch.store(0, std::memory_order_release);
>>      }
>>
>> epoch_t is interesting.  It's uint64_t but handles wrapped
>> compares, ie. for an epoch_t x1 and uint64_t n
>
> [...]
>
> Humm... I am not sure if it would work with just the release. The
> polling thread would read from these per thread epochs, _ref_epoch,
> using an acquire barrier? Still. Not sure if that would work. Need to
> put my thinking cap on. ;^)

Ahhh, if you are using an async membar in your upcoming C++ version,
then it would be fine. No problem. A compiler fence ala
atomic_signal_fence, and the explicit release, well, it will work. I
don't see why it would not work.

For some reason, I thought you were going to not use an async membar in
your C++ version. Sorry. However, it still would be fun to test
against... ;^)
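
Fwiw, a rough sketch of how the two sides might fit together when an async membar is in play. This is not jseigh's actual smrproxy code. The names reader_slot, oldest_referenced, epoch_before and the heavy_membar hook are all hypothetical, the shared shadow_epoch parameter is an assumption, and the wrap-safe compare is just the usual serial-number idiom rather than the real epoch_t. The hook is where a process-wide barrier would go, something along the lines of FlushProcessWriteBuffers on Windows or membarrier(2) on Linux.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Assumption: a plain 64-bit counter where 0 means "not in a read section".
using epoch_t = std::uint64_t;

// Wrap-safe "a is older than b". A common serial-number idiom, used here as
// a stand-in; the real epoch_t may compare differently.
inline bool epoch_before(epoch_t a, epoch_t b)
{
    return static_cast<std::int64_t>(a - b) < 0;
}

// Hypothetical per-thread slot mirroring the quoted lock/unlock logic.
struct reader_slot
{
    std::atomic<epoch_t> ref_epoch{0};

    void lock(const std::atomic<epoch_t>& shadow_epoch)
    {
        epoch_t e = shadow_epoch.load(std::memory_order_relaxed);
        ref_epoch.store(e, std::memory_order_relaxed);
        // Compiler-only fence, as in the quoted code. The hardware
        // store/load ordering is left to the poller's heavy barrier.
        std::atomic_signal_fence(std::memory_order_acquire);
    }

    void unlock()
    {
        ref_epoch.store(0, std::memory_order_release);
    }
};

// Poller side. heavy_membar is a hypothetical hook for the async
// (process-wide) barrier; it runs before the per-thread epochs are read.
// Returns the oldest epoch still referenced, or current if there is none.
template <typename HeavyMembar>
epoch_t oldest_referenced(const std::vector<reader_slot*>& slots,
                          epoch_t current,
                          HeavyMembar heavy_membar)
{
    heavy_membar();
    epoch_t oldest = current;
    for (const reader_slot* s : slots) {
        epoch_t e = s->ref_epoch.load(std::memory_order_acquire);
        if (e != 0 && epoch_before(e, oldest))
            oldest = e;
    }
    return oldest;
}
```

The readers stay free of any hardware fence or RMW, and the cost moves to the poller, which only pays for the heavy barrier once per scan. Passing a no-op lambda for heavy_membar is only good for single-threaded poking around; with real reader threads the hook has to be a genuine process-wide barrier.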