From: "Chris M. Thomasson"
Newsgroups: comp.arch
Subject: Re: Is Intel exceptionally unsuccessful as an architecture designer?
Date: Mon, 23 Sep 2024 15:19:37 -0700

On 9/23/2024 2:58 PM, MitchAlsup1 wrote:
> On Mon, 23 Sep 2024 21:35:53 +0000, Chris M. Thomasson wrote:
>
>> On 9/23/2024 1:59 PM, MitchAlsup1 wrote:
>>> On Mon, 23 Sep 2024 7:53:36 +0000, Michael S wrote:
>>>
>>>> On Mon, 23 Sep 2024 01:34:55 +0000
>>>> mitchalsup@aol.com (MitchAlsup1) wrote:
>>>>
>>>>> On Mon, 23 Sep 2024 0:53:35 +0000, jseigh wrote:
>>>>>
>>>>>> On 9/22/2024 5:39 PM, MitchAlsup1 wrote:
>>>>>
>>>>>> Speaking of memory models, remember when x86 didn't have
>>>>>> a formal memory model. They didn't put one in until
>>>>>> after itanium.
>>>>>> Before that it was a sort of processor
>>>>>> consistency type 2 which was a real impedance mismatch
>>>>>> with what most concurrent software used as a memory model.
>>>>>
>>>>> When only 1 x86 would fit on a die, it really did not matter
>>>>> much. I was at AMD when they were designing their memory
>>>>> model.
>>>>>
>>>>>> Joe Seigh
>>>>
>>>>
>>>> Why # of CPU cores on die is of particular importance?
>>>
>>> Prior to multi-CPUs on a die; 99% of all x86 systems were
>>> mono-CPU systems, and the necessity of having a well known
>>> memory model was more vague. Although there were servers
>>> with multiple CPUs in them they represented "an afternoon
>>> in the FAB" compared to the PC oriented x86s.
>>>
>>> That is "we did not see the problem until it hit us in
>>> the face." Once it did, we understood what we had to do:
>>> presto memory model.
>>>
>>> Also note: this was just after the execution pipeline went
>>> Great Big Out of Order, and thus made the lack of order
>>> problems much more visible to applications. {Pentium Pro}
>>
>> Iirc, been a while, I think there was a problem on one of the Pentiums,
>> might be the pro, where it had an issue with releasing a spinlock with a
>> normal store. I am most likely misremembering, but it is sparking some
>> strange memories. Way back on c.p.t, Alex Terekhov (hope I did not
>> butcher the spelling of his name), anyway, wrote about it, I think...
>> Way back. early 2000's I think.
>
> Many ATOMIC sequences start or end without any note on the memory
> reference that it bounds an ATOMIC event. CAS has this problem
> on the value to ultimately be compared (the start), T&S has this
> problem on ST that unlocks the lock (the end). It is like using
> indentation as the only means of signaling block structure in
> your language of choice.

_Strong_ CAS in C++ terms, ala cmpxchg, will only fail if the comparands
are different. This can be implemented with LL/SC for sure.
Scott mentioned something about a bus lock after a certain number of
failures... (side note) Weak CAS can fail even if the comparands are
identical to each other, ala LL/SC.

This reminds me of LL/SC. The ABA problem can be worked around and/or
eliminated without using LL/SC. I remember reading papers about LL/SC
getting around ABA, but then read about how they can have their own can
of worms.

Pessimistic vs optimistic sync... Wait / Lock / Obstruction free
things... ;^)

Fwiw, getting rid of the StoreLoad membar in algorithms like SMR is
great. There is a way to do this in existing systems. So, no hardware
changes required, and it makes the system run fast.

Think of allowing a rogue thread to pound a CAS with random data wrt the
comparand, trying to get it to fail... Of course this can be modifying a
reservation granule wrt LL/SC side of things, right?

Pessimistic (CAS) vs Optimistic (LL/SC)?

>
> Both are bad practice in making HW that can perform these things
> efficiently. But notice that LL-SC does not have this problem.
> Neither does ESM.
>
>>>> According to my understanding, what matters is # of CPU cores with
>>>> coherent access to the same memory+IO.
>>>> For x86, 4 cores (CPUs) were relatively common since 1996. There
>>>> existed few odd 8-core systems too, still back in the last century.
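
Fwiw, here is a minimal sketch of the strong vs weak CAS remark above,
using C++11 std::atomic. The function names (try_claim,
fetch_add_via_weak_cas, strong_vs_weak_demo) are made up for
illustration and are not from anything in this thread. How a compiler
lowers these calls (cmpxchg, an LL/SC loop, or something else) depends
on the target, so treat the comments about hardware as assumptions.

```cpp
#include <atomic>
#include <cassert>

// Strong CAS: fails only if the value in `flag` differs from `expected`.
// On failure, `expected` is overwritten with the value actually seen.
inline bool try_claim(std::atomic<int>& flag) {
    int expected = 0;
    return flag.compare_exchange_strong(expected, 1,
                                        std::memory_order_acquire,
                                        std::memory_order_relaxed);
}

// Weak CAS is allowed to fail spuriously even when the comparands match
// (for example, a lost reservation on an LL/SC machine), so it normally
// lives inside a retry loop.
inline int fetch_add_via_weak_cas(std::atomic<int>& counter, int delta) {
    int observed = counter.load(std::memory_order_relaxed);
    while (!counter.compare_exchange_weak(observed, observed + delta,
                                          std::memory_order_acq_rel,
                                          std::memory_order_relaxed)) {
        // `observed` was refreshed by the failed CAS; just try again.
    }
    return observed;  // value seen immediately before our update
}

// Single-threaded usage, only to show the call shapes.
inline void strong_vs_weak_demo() {
    std::atomic<int> flag{0};
    bool first = try_claim(flag);    // 0 -> 1, comparands match
    bool second = try_claim(flag);   // flag is 1, comparands differ
    assert(first && !second);

    std::atomic<int> counter{40};
    int before = fetch_add_via_weak_cas(counter, 2);
    assert(before == 40 && counter.load() == 42);
}
```

Note the retry loop is what hides the spurious failures of the weak
form. A single unlooped compare_exchange_weak can report failure with
identical comparands, which is exactly the LL/SC flavored behavior
mentioned above.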