From: jseigh <jseigh_es00@xemaps.com>
Newsgroups: comp.lang.c++
Subject: Re: smrproxy v2
Date: Mon, 28 Oct 2024 21:17:55 -0400
Organization: A noiseless patient Spider
Lines: 73
Message-ID: <vfpd43$186t4$1@dont-email.me>
References: <vequrc$2o7qc$1@dont-email.me> <verr04$2stfq$1@dont-email.me>
 <verubk$2t9bs$1@dont-email.me> <ves78h$2ugvm$2@dont-email.me>
 <vetj1f$39iuv$1@dont-email.me> <vfh4dh$3bnuq$1@dont-email.me>
 <vfh7mg$3c2hs$1@dont-email.me> <vfm4iq$ill4$1@dont-email.me>
 <vfmesn$k6mn$1@dont-email.me> <vfmf21$kavl$1@dont-email.me>
 <vfmm9a$lob3$1@dont-email.me> <vfn2di$r8ca$1@dont-email.me>
 <vfntgb$vete$1@dont-email.me> <vfp1c3$16d9f$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:7gKZAwbG75J8e08wzbUPzWa1KSI=
Content-Language: en-US
In-Reply-To: <vfp1c3$16d9f$1@dont-email.me>
Bytes: 4345

On 10/28/24 17:57, Chris M. Thomasson wrote:
> On 10/28/2024 4:45 AM, jseigh wrote:
>> On 10/28/24 00:02, Chris M. Thomasson wrote:
>>> On 10/27/2024 5:35 PM, jseigh wrote:
>>>> On 10/27/24 18:32, Chris M. Thomasson wrote:
>>
>>>>
>>>> The membar version?  That's a store/load membar so it is expensive.
>>>
>>> I was wondering in your c++ version if you had to use any seq_cst 
>>> barriers. I think acquire/release should be good enough. Now, when I 
>>> say C++, I mean pure C++, no calls to FlushProcessWriteBuffers and 
>>> things like that.
>>>
>>> I take it that your pure C++ version has no atomic RMW, right? Just 
>>> loads and stores?
>>
>> While a lock action has acquire memory order semantics, if the
>> implementation has internal stores, you have to ensure those stores
>> are complete before any access from the critical section.
>> So you may need a store/load memory barrier.
> 
> Wrt acquiring a lock the only class of mutex logic that comes to mind 
> that requires an explicit storeload style membar is Peterson's, and some 
> others along those lines, so to speak. This is for the store and load 
> version. Now, RMW on x86 basically implies a StoreLoad wrt the LOCK 
> prefix, XCHG aside for it has an implied LOCK prefix. For instance the 
> original SMR algo requires a storeload as is on x86/x64. MFENCE or LOCK 
> prefix.
> 
> Fwiw, my experimental pure C++ proxy works fine with XADD, or atomic 
> fetch-add. It needs explicit membars (no #StoreLoad) on SPARC in RMO 
> mode. On x86, the LOCK prefix handles that wrt the RMW's themselves. 
> This is a lot different than using stores and loads. The original SMR 
> and Peterson's algo need that "store followed by a load to a different 
> location" action to hold true, aka, storeload...
> 
> Now, I don't think that a data-dependent load can act like a storeload. 
> I thought that they act sort of like an acquire, aka #LoadStore | 
> #LoadLoad wrt SPARC. SPARC in RMO mode honors data-dependencies. Now, 
> the DEC Alpha is a different story... ;^)
> 

fwiw, here's the lock and unlock logic from the smrproxy rewrite:

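     inline void lock()
     {
         // Sample the current epoch and publish it as this reader's reference epoch.
         epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
         _ref_epoch.store(_epoch, std::memory_order_relaxed);
         // Compiler-only fence: emits no hardware membar.
         std::atomic_signal_fence(std::memory_order_acquire);
     }

     inline void unlock()
     {
         // Clear the reference epoch; release orders the critical
         // section's accesses before this store.
         _ref_epoch.store(0, std::memory_order_release);
     }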
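
To try the fragment above outside of smrproxy, here is a minimal
compilable sketch.  Everything around the two functions is an
assumption for illustration only: the name reader_sketch, epoch_t as
a plain uint64_t, shadow_epoch as a reference to an atomic owned
elsewhere, and the ref_epoch() accessor.  The reclaiming side is not
modeled at all.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Assumption: plain integer stand-in; the real epoch_t has wrapped compares.
using epoch_t = std::uint64_t;

// Hypothetical wrapper so lock()/unlock() have something to live in.
struct reader_sketch {
    std::atomic<epoch_t>& shadow_epoch;   // assumed: epoch copy owned elsewhere
    std::atomic<epoch_t>  _ref_epoch{0};  // 0 while no critical section is open

    explicit reader_sketch(std::atomic<epoch_t>& shadow) : shadow_epoch(shadow) {}

    void lock()
    {
        epoch_t _epoch = shadow_epoch.load(std::memory_order_relaxed);
        _ref_epoch.store(_epoch, std::memory_order_relaxed);
        // Constrains the compiler only; no hardware barrier is emitted.
        std::atomic_signal_fence(std::memory_order_acquire);
    }

    void unlock()
    {
        _ref_epoch.store(0, std::memory_order_release);
    }

    // Hypothetical accessor, only here so the sketch can be inspected.
    epoch_t ref_epoch() const
    {
        return _ref_epoch.load(std::memory_order_acquire);
    }
};

// lock()/unlock() satisfy BasicLockable, so std::lock_guard works as a guard.
inline epoch_t epoch_seen_inside(reader_sketch& r)
{
    std::lock_guard<reader_sketch> guard(r);
    return r.ref_epoch();
}
```

Since the signal fence emits no instruction, this sketch on its own
gives no store/load ordering against another thread; whatever the
reclaiming side does to make up for that is not shown here.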

epoch_t is interesting.  It's a uint64_t underneath, but its
compares handle wraparound, i.e. for an epoch_t x1 and a uint64_t n

	x1 < (x1 + n)

holds for any value of x1 and any value of n from 1 to 2**63,
e.g.

    0xfffffffffffffff0 < 0x0000000000000001
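
One common way to get a compare like that is to look at the sign of
the 64-bit difference.  This is a sketch of that technique, not
necessarily how epoch_t does it; the name epoch_sketch is made up for
the example, and it assumes the usual two's complement conversion
from uint64_t to int64_t (guaranteed from C++20 on).

```cpp
#include <cstdint>

// Hypothetical stand-in for epoch_t; the real type may be implemented differently.
struct epoch_sketch {
    std::uint64_t value;

    // a < b iff the difference a - b, taken modulo 2**64 and
    // reinterpreted as signed, is negative.
    friend constexpr bool operator<(epoch_sketch a, epoch_sketch b)
    {
        return static_cast<std::int64_t>(a.value - b.value) < 0;
    }
};

// The example from the text: a value just before the wrap is "less than"
// a value just after it.
static_assert(epoch_sketch{0xfffffffffffffff0ull} < epoch_sketch{0x0000000000000001ull},
              "wrapped compare");
// An epoch is never less than itself.
static_assert(!(epoch_sketch{1} < epoch_sketch{1}), "irreflexive");
```

With this particular sketch, n = 2**63 is the edge: the difference is
the same in both directions, so both x1 < x1 + n and x1 + n < x1 come
out true there.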


The rewrite is almost complete except for some thread_local
stuff.  I think I might break off there.  Most of the
additional work is writing the test code.  I'm considering
rewriting it in Rust.

Joe Seigh