Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.arch
Subject: Re: portable proxy collector test...
Date: Mon, 9 Dec 2024 12:47:39 -0800
Organization: A noiseless patient Spider
Lines: 95
Message-ID: <vj7l1b$im6l$1@dont-email.me>
References: <viquuj$16v40$1@dont-email.me> <vivkqq$2hpdg$1@dont-email.me> <vivpcg$2in59$2@dont-email.me> <vivuml$2k486$1@dont-email.me> <vj5a7q$2b9v$3@dont-email.me> <vj6npm$d944$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 09 Dec 2024 21:47:44 +0100 (CET)
Injection-Info: dont-email.me; posting-host="4ab86222a7b6c89d2f3815566551fac9"; logging-data="612565"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/pZP8lnC773Npl7ie4g0EABZoflee5F0I="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:HyqHaDjdS0AgEHEsJuVBAkAu2NA=
Content-Language: en-US
In-Reply-To: <vj6npm$d944$1@dont-email.me>
Bytes: 4741

On 12/9/2024 4:28 AM, jseigh wrote:
> On 12/8/24 18:31, Chris M. Thomasson wrote:
>> On 12/6/2024 2:43 PM, jseigh wrote:
>>> On 12/6/24 16:12, Chris M. Thomasson wrote:
>>>> On 12/6/2024 11:55 AM, Brett wrote:
>>>>> Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
>>>>>> I am wondering if anybody can try to compile and run this C++11
>>>>>> code of mine for a portable word-based proxy collector, a sort
>>>>>> of poor mans RCU, on an ARM based system? I don't have access to
>>>>>> one. I am interested in the resulting output.
>>>>>
>>>>> https://godbolt.org
>>>>>
>>>>>> https://pastebin.com/raw/CYZ78gVj
>>>>>> (raw text link, no ads... :^)
>>>> [...]
>>>>
>>>> It seems that all of the atomics are LDREX/STREX wrt fetch_add/sub .
>>>> Even with relaxed memory order. Are the LDREX/STREX similar to the
>>>> LOCK prefix on an x86/64?
>>>>
>>>> https://godbolt.org/z/EPGYWve71
>>>>
>>>> It has loops for this in the ASM code. Adding a loop in there can
>>>> change things from wait-free to lock-free. Humm...
>>>>
>>> Which compiler did you choose. armv8? Try ARM64.
>>>
>>> The newer arms have new atomics
>>> cas and atomic fetch ops.
>>>
>>> LDREX/STREX is the older load store reserved.
>>> On the newer stuff it's ldxr/stxr not ldrex/strex.
>>
>> Well, for a ARM64 gcc 14.2.0 a relaxed fetch_add I get
>> __aarch64_ldadd8_acq_rel in the asm.
>>
>> https://godbolt.org/z/YzPdM8j33
>>
>> acq_rel barrier for a relaxed membar? Well, that makes me go grrrrrr!
>>
>> It has to be akin to the LOCK prefix over on x86. I want it relaxed
>> damn it! ;^)
>
> Apart of the memory ordering, if you are using atomic_fetch_add you
> are going to get an interlocked instruction which is probably
> overkill and has more overhead than you want. Atomic ops
> assume other cpus might be trying atomic rmw ops on other
> cpus which is not the case for userspace rcu. You want
> an atomic relaxed load, and atomic relaxed store of the
> incrmented value. It will be faster.

I can keep the debug statistics on a per-thread basis. Instead of using
a global counter, each thread has a per thread counter. Then those are
all summed up at the end of the program to gain the real counts. I
forgot what that was called. Split counters? It's a well known
technique.

These statistics are only there to give me a feel as to what is going
on. They can be completely removed for a release build, so to speak.

Fwiw my proxy wrt this particular test needs fetch_add for the way it
acquires and releases a collector object:
______________________
collector& acquire()
{
    // increment the master count _and_ obtain current collector.
    std::uint32_t current =
        m_current.fetch_add(ct_ref_inc, std::memory_order_acquire);

    // decode the collector index.
    return m_collectors[current & ct_proxy_mask];
}

void release(collector& c)
{
    // decrement the collector.
    std::uint32_t count =
        c.m_count.fetch_sub(ct_ref_inc, std::memory_order_release);

    // check for the completion of the quiescence process.
    if ((count & ct_ref_mask) == ct_ref_complete)
    {
        // odd reference count and drop-to-zero condition detected!
        g_debug_release_collect.fetch_add(1, std::memory_order_relaxed);

        prv_quiesce_complete(c);
    }
}
______________________

Damn! I need to find more time to work on this. ;^o
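
Fwiw, a minimal sketch of the per-thread (split) counter idea for the debug statistics, using the relaxed load plus relaxed store increment suggested in the quoted reply. Every name in it (split_counter_slot, split_counter_sum, split_counter_demo) is hypothetical and not taken from the pastebin code, and the 64-byte padding is an assumed cache line size:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical names throughout; this is not the pastebin code.

// One statistics slot per thread. The padding keeps the counters of
// neighbouring slots 64 bytes apart (assumed cache line size), so two
// counters never land on the same line.
struct split_counter_slot
{
    std::atomic<std::uint64_t> m_value{0};
    char m_pad[64 - sizeof(std::atomic<std::uint64_t>)];

    // Only the owning thread may call this. With a single writer, a
    // relaxed load followed by a relaxed store is sufficient; no atomic
    // read-modify-write instruction is needed.
    void increment()
    {
        std::uint64_t v = m_value.load(std::memory_order_relaxed);
        m_value.store(v + 1, std::memory_order_relaxed);
    }
};

// Sum all slots. The total is only exact once the writers have
// finished, for example after their threads have been joined.
std::uint64_t split_counter_sum(std::vector<split_counter_slot> const& slots)
{
    std::uint64_t total = 0;
    for (split_counter_slot const& s : slots)
    {
        total += s.m_value.load(std::memory_order_relaxed);
    }
    return total;
}

// Spawn thread_count threads, each bumping its own slot `iterations`
// times, then return the summed count.
std::uint64_t split_counter_demo(std::size_t thread_count,
                                 std::uint64_t iterations)
{
    assert(thread_count > 0);

    std::vector<split_counter_slot> slots(thread_count);
    std::vector<std::thread> threads;
    threads.reserve(thread_count);

    for (std::size_t i = 0; i < thread_count; ++i)
    {
        threads.emplace_back([&slots, i, iterations]() {
            for (std::uint64_t n = 0; n < iterations; ++n)
            {
                slots[i].increment();
            }
        });
    }

    // join() orders each thread's stores before the summation below.
    for (std::thread& t : threads)
    {
        t.join();
    }

    return split_counter_sum(slots);
}
```

Calling split_counter_demo(4, 100000) should give 400000, because the sum is only taken after every thread has been joined. This covers the debug statistics only; acquire() and release() still need a real fetch_add/fetch_sub, since several threads modify the same word there.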