From: "Chris M. Thomasson"
Newsgroups: comp.arch
Subject: Re: Is Intel exceptionally unsuccessful as an architecture designer?
Date: Sun, 22 Sep 2024 12:55:45 -0700

On 9/22/2024 12:37 PM, Paul A. Clayton wrote:
> On 9/21/24 4:45 PM, MitchAlsup1 wrote:
>> On Sat, 21 Sep 2024 20:26:13 +0000, Chris M. Thomasson wrote:
>>
>>> On 9/21/2024 6:54 AM, Scott Lurndal wrote:
>>>> mitchalsup@aol.com (MitchAlsup1) writes:
>>>> https://www.marvell.com/products/cxl.html
>>>
>>> What about a weak coherency where a programmer has to use the correct
>>> membars to get the coherency required for their specific needs? Along
>>> the lines of UltraSPARC in RMO mode?
>>
>> In my case, I suffered through enough of these to implement a
>> memory hierarchy free from the need of any MemBars yet provide
>> the performance of relaxed memory order, except when
>> certain kinds of addresses are touched {MMI/O, configuration
>> space, ATOMIC accesses,...} In these cases, the core becomes
>> {sequentially consistent, or strongly ordered} depending on the
>> touched address.
>
> If I understand correctly, atomic accesses (Enhanced
> Synchronization Facility) effectively use a complete memory barrier;
> software could effectively provide a memory barrier "instruction"
> by performing an otherwise pointless atomic/ESF operation.
>
> Are there no cases where an atomic operation is desired but
> sequential consistency is not required? Or is this a tradeoff of
> frequency/criticality and the expected overhead of the implicit
> memory barrier? (Memory barriers may be similar to context
> switches, not needing to be as expensive as they are in most
> implementations.)
[...]

Fwiw, the SPARC has different flavors of membar:

#LoadLoad
#StoreStore
#LoadStore
#StoreLoad

(plus some others, e.g., #MemIssue, #Lookaside and #Sync. Anyway...)

Afaict, x86/x64 (TSO) gives you acquire/release for free:

acquire = #LoadStore | #LoadLoad
release = #LoadStore | #StoreStore

Notice there is no #StoreLoad? MFENCE is the one that is akin to a #StoreLoad. This is why one can release a spinlock on x86/x64 with a simple store: it already has implied release semantics...

An algorithm called SMR (aka hazard pointers), in its original form, requires a #StoreLoad-style membar on x86. This can be an MFENCE or a LOCK-prefixed RMW; XCHG with a memory operand implies the LOCK prefix. Also, I think Peterson's algorithm needs a #StoreLoad barrier even on x86/x64.

Getting rid of that #StoreLoad makes things run MUCH faster... :^)
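A minimal sketch of the Peterson case in C++11 atomics, to make the #StoreLoad point concrete. `PetersonLock` and `run_two_threads` are hypothetical names, not from the thread. `lock()` stores to `flag_[me]` and `turn_` and then loads `flag_[other]`, a different location, which is exactly the store-then-load ordering TSO does not provide on its own; `unlock()` is a release store, the "simple store" case. The x86 instructions named in the comments are what mainstream compilers typically emit; the C++ standard only guarantees the orderings, not the instructions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical two-thread Peterson lock; ids must be 0 or 1.
class PetersonLock {
public:
    void lock(int me) {
        assert(me == 0 || me == 1);
        const int other = 1 - me;
        // Default (seq_cst) stores: announce interest, then give away the turn.
        flag_[me].store(true);
        turn_.store(other);
        // The loads below read locations we did not just store to. Ordering
        // our stores before these loads is the #StoreLoad case that TSO does
        // not give for free. On x86/x64, compilers typically emit the seq_cst
        // store above as XCHG (implicitly LOCKed) or as MOV + MFENCE.
        while (flag_[other].load() && turn_.load() == other) {
            std::this_thread::yield();  // don't burn a whole time slice spinning
        }
    }

    void unlock(int me) {
        assert(me == 0 || me == 1);
        // Release store: on x86/x64 this is typically a plain MOV.
        flag_[me].store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> flag_[2]{{false}, {false}};
    std::atomic<int> turn_{0};
};

// Two threads each bump a plain (non-atomic) counter under the lock.
// With correct mutual exclusion the result is exactly 2 * iterations.
long run_two_threads(int iterations) {
    PetersonLock lk;
    long counter = 0;
    auto worker = [&](int id) {
        for (int i = 0; i < iterations; ++i) {
            lk.lock(id);
            ++counter;
            lk.unlock(id);
        }
    };
    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

Weakening the `lock()` operations to plain acquire/release removes the #StoreLoad: both threads can then read the other's flag as false and enter together, which is why Peterson needs the barrier even on x86/x64.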