Path: ...!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: "Chris M. Thomasson" Newsgroups: comp.lang.c++ Subject: Re: smrproxy v2 Date: Sun, 24 Nov 2024 16:47:06 -0800 Organization: A noiseless patient Spider Lines: 77 Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Injection-Date: Mon, 25 Nov 2024 01:47:06 +0100 (CET) Injection-Info: dont-email.me; posting-host="752c11c3033f6dd442669f85f21b84b4"; logging-data="2593065"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/pmoVuKuGLXLrdz2J+nbeHYN02454ddQs=" User-Agent: Mozilla Thunderbird Cancel-Lock: sha1:kkljDszwqDUVyvx5QbVITkw6g9U= Content-Language: en-US In-Reply-To: Bytes: 4836 On 11/24/2024 4:09 PM, jseigh wrote: > On 11/24/24 18:19, Chris M. Thomasson wrote: >> On 11/24/2024 12:14 PM, jseigh wrote: >>> On 11/23/24 11:10, jseigh wrote: >>>> On 11/21/24 15:17, jseigh wrote: >>>>> On 10/17/24 08:10, jseigh wrote: >>>>>> I replaced the hazard pointer logic in smrproxy.  It's now wait-free >>>>>> instead of mostly wait-free.  The reader lock logic after loading >>>>>> the address of the reader lock object into a register is now 2 >>>>>> instructions a load followed by a store.  The unlock is same >>>>>> as before, just a store. >>>>>> >>>>>> It's way faster now. >>>>>> >>>>>> It's on the feature/003 branch as a POC.   I'm working on porting >>>>>> it to c++ and don't want to waste any more time on c version. >>>>>> >>>>>> No idea of it's a new algorithm.  I suspect that since I use >>>>>> the term epoch that it will be claimed that it's ebr, epoch >>>>>> based reclamation, and that all ebr algorithms are equivalent. >>>>>> Though I suppose you could argue it's qsbr if I point out what >>>>>> the quiescent states are. >>>>>> >>>>> >>>>> I got a port to c++ working now. There are 5 proxy implementations >>>>> 1) smrproxy v2 >>>>> 2) arcproxy - reference counted proxy >>>>> 3) rwlock based proxy >>>>> 4) mutex based proxy >>>>> 5) an unsafe proxy with no locking >>>>> >>>>> The testcase is templated so you can use any of the >>>>> 5 proxy implementations without rewriting for each proxy >>>>> type.  You can do apple to apple comparisons.  I >>>>> realize that's the complete antithesis of current >>>>> programming practices but there you have it.  :) >>>>> >>>>> A bit of clean up and performance tuning now. >>>>> >>>> >>>> Ok, smrproxy lock/unlock is down to 0.6 nanoseconds now, >>>> about what the C version was. >>>> >>> >>> I've been using cpu time to measure performance. That's ok >>> for lock-free/wait-free locking.  For normal mutexes and >>> shared locks, it doesn't measure wait time so those didn't >>> look as bad as they really were.  You can add logic >>> to measure how long it takes to acquire a lock but that >>> adds significant overhead. >> >> I remember back in the day when I was comparing and contrasting >> various lock/wait-free algorithms with their 100% lock-based counter >> parts. Some of the lock-based tests too so long that I just terminated >> the damn program. Iirc, a lock-free test would take around 5 minutes. >> The lock- based test would be around 30+ minutes. This was way back on >> c.p.t. > > I set the iteration count as a parameter.  Mutex can be particularly > slow with a lot of reader threads.  I usually see about 1000 - 10000 > times slower than smrproxy.   rwlocks aren't as bad, about 200 x > slower. > > Mutex, rwlock, and arcproxy use interlocked instructions so you > can get a really wide performance range based on cache geometry > and processor sets you run on. Actually, I remember way back where a scenario that had a lot of writes would start to mess with a deferred reclamation wrt a polling thread type of “scheme”. Too many deferred nodes would start to "pile up". Basically, the single polling thread was having trouble keeping up with all of them. The interlocked versions seemed to perform sort of "better" in a sense during periods that had a lot of frequent “writes”. Of course clever use of node caches helps heavy write periods. Anyway, some of the tests just used a mutex for writes, others used lock-free and would generate high loads of them that would push and pop nodes and defer them to the poll thread to test how much load it (poll thread) could take.