Path: ...!news.roellig-ltd.de!open-news-network.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.arch
Subject: Re: Is Intel exceptionally unsuccessful as an architecture designer?
Date: Mon, 23 Sep 2024 15:19:37 -0700
Organization: A noiseless patient Spider
Lines: 88
Message-ID: <vcsphq$2sh9d$1@dont-email.me>
References: <memo.20240913205156.19028s@jgd.cix.co.uk>
 <vcda96$3p3a7$2@dont-email.me>
 <21028ed32d20f0eea9a754fafdb64e45@www.novabbs.org>
 <RECGO.45463$xO0f.22925@fx48.iad> <20240918190027.00003e4e@yahoo.com>
 <vcfp2q$8glq$5@dont-email.me> <jwv34lumjz7.fsf-monnier+comp.arch@gnu.org>
 <vckpkg$18k7r$2@dont-email.me> <vckqus$18j12$2@dont-email.me>
 <920c561c4e39e91d3730b6aab103459b@www.novabbs.org>
 <vcl6i6$1ad9e$1@dont-email.me>
 <d3b9fc944f708546e4fbe5909c748ba3@www.novabbs.org>
 <%dAHO.54667$S9Vb.39628@fx45.iad> <vcna56$1nlod$2@dont-email.me>
 <a7708487530552a53732070fe08d9458@www.novabbs.org>
 <vcprkv$2asrd$1@dont-email.me>
 <e2c993172c11a221c4dcb9973f9cdb86@www.novabbs.org>
 <vcqe6f$2d8oa$1@dont-email.me>
 <4f84910a01d7db353eedadd7c471d7d3@www.novabbs.org>
 <20240923105336.0000119b@yahoo.com>
 <6577e60bd63883d1a7bd51c717531f38@www.novabbs.org>
 <vcsmvq$2s1qd$2@dont-email.me>
 <23d9473740db6c0ecc7e1d4a2179c75e@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 24 Sep 2024 00:19:39 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="cc0aa948cfee330c0e613beeb38c6255";
	logging-data="3032365"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18CSk2/ohl2OdqbgHI6rnbYmQglLpI0sbY="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:N8yjVCFqoWM+aLTrYPM0J9p6eZo=
Content-Language: en-US
In-Reply-To: <23d9473740db6c0ecc7e1d4a2179c75e@www.novabbs.org>
Bytes: 5985

On 9/23/2024 2:58 PM, MitchAlsup1 wrote:
> On Mon, 23 Sep 2024 21:35:53 +0000, Chris M. Thomasson wrote:
> 
>> On 9/23/2024 1:59 PM, MitchAlsup1 wrote:
>>> On Mon, 23 Sep 2024 7:53:36 +0000, Michael S wrote:
>>>
>>>> On Mon, 23 Sep 2024 01:34:55 +0000
>>>> mitchalsup@aol.com (MitchAlsup1) wrote:
>>>>
>>>>> On Mon, 23 Sep 2024 0:53:35 +0000, jseigh wrote:
>>>>>
>>>>>> On 9/22/2024 5:39 PM, MitchAlsup1 wrote:
>>>>>
>>>>>> Speaking of memory models, remember when x86 didn't have
>>>>>> a formal memory model.  They didn't put one in until
>>>>>> after itanium.  Before that it was a sort of processor
>>>>>> consistency type 2 which was a real impedance mismatch
>>>>>> with what most concurrent software used as a memory model.
>>>>>
>>>>> When only 1 x86 would fit on a die, it really did not matter
>>>>> much. I was at AMD when they were designing their memory
>>>>> model.
>>>>>
>>>>>> Joe Seigh
>>>>
>>>>
>>>> Why # of CPU cores on die is of particular importance?
>>>
>>> Prior to multi-CPUs on a die; 99% of all x86 systems were
>>> mono-CPU systems, and the necessity of having a well known
>>> memory model was more vague. Although there were servers
>>> with multiple CPUs in them they represented "an afternoon
>>> in the FAB" compared to the PC oriented x86s.
>>>
>>> That is "we did not see the problem until it hit us in
>>> the face." Once it did, we understood what we had to do:
>>> presto memory model.
>>>
>>> Also note: this was just after the execution pipeline went
>>> Great Big Out of Order, and thus made the lack of order
>>> problems much more visible to applications. {Pentium Pro}
>>
>> Iirc, been a while, I think there was a problem on one of the Pentiums,
>> might be the pro, where it had an issue with releasing a spinlock with a
>> normal store. I am most likely misremembering, but it is sparking some
>> strange memories. Way back on c.p.t, Alex Terekhov (hope I did not
>> butcher the spelling of his name), anyway, wrote about it, I think...
>> Way back, early 2000s I think.
> 
> Many ATOMIC sequences start or end without any note on the memory
> reference that it bounds an ATOMIC event. CAS has this problem
> on the value to ultimately be compared (the start), T&S has this
> problem on ST that unlocks the lock (the end). It is like using
> indentation as the only means of signaling block structure in
> your language of choice.

_Strong_ CAS in C++ terms, à la cmpxchg, will only fail if the comparands 
are different. This can be implemented with LL/SC for sure. Scott 
mentioned something about a bus lock after a certain number of 
failures... (side note) Weak CAS can fail even if the comparands are 
identical to each other, à la LL/SC. Speaking of LL/SC: the ABA 
problem can be worked around and/or eliminated without using LL/SC. I 
remember reading papers about LL/SC getting around ABA, but then read 
about how it can have its own can of worms. Pessimistic vs 
optimistic sync... Wait / Lock / Obstruction free things... ;^)
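
Fwiw, here is a minimal sketch of the strong vs weak distinction in C++ terms, using std::atomic. The helper names try_set_strong and try_set_weak are made up for illustration. The weak version retries only while the observed value still equals the comparand, which is roughly how a strong CAS could be layered on a primitive that may fail spuriously:

```cpp
#include <atomic>
#include <cassert>

// Strong CAS: reports failure only when the current value differs
// from `expected` (the cmpxchg-style behavior).
bool try_set_strong(std::atomic<int>& a, int expected, int desired) {
    return a.compare_exchange_strong(expected, desired);
}

// Weak CAS may fail spuriously (think LL/SC losing its reservation),
// so loop and give up only on a genuine mismatch.
bool try_set_weak(std::atomic<int>& a, int expected, int desired) {
    int observed = expected;
    while (!a.compare_exchange_weak(observed, desired)) {
        // On failure, `observed` holds the value that was actually seen.
        if (observed != expected) {
            return false;  // comparands really differ
        }
        // Spurious failure: value still matches, so retry.
    }
    return true;
}
```

Both helpers return the same answer for the same inputs; the only difference is that the weak one has to carry its own retry loop.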

Fwiw, getting rid of the StoreLoad membar in algorithms like SMR is 
great. There is a way to do this in existing systems. So no hardware 
changes are required, and it makes the system run faster.
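
For context, a rough sketch of where that StoreLoad membar sits in a conventional SMR (hazard pointer) reader. This only shows the barrier in question, not any particular technique for removing it. Node, hazard_slot, protect and release are hypothetical names, and a single reader slot is assumed to keep it short:

```cpp
#include <atomic>
#include <cassert>

struct Node { int value; };

// One hazard slot for a single reader thread (assumption for brevity).
std::atomic<Node*> hazard_slot{nullptr};

// Publish a hazard pointer for whatever `src` currently points at.
Node* protect(const std::atomic<Node*>& src) {
    Node* p = src.load(std::memory_order_relaxed);
    for (;;) {
        hazard_slot.store(p, std::memory_order_relaxed);
        // The StoreLoad membar: the hazard store has to be ordered
        // before the re-load of `src` below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        Node* q = src.load(std::memory_order_acquire);
        if (q == p) {
            return p;  // pointer did not change; p is now protected
        }
        p = q;  // pointer changed underneath us; try again
    }
}

// Drop the protection once the reader is done with the node.
void release() {
    hazard_slot.store(nullptr, std::memory_order_release);
}
```

The seq_cst fence is paid on every reader-side acquire, which is why getting it off the fast path matters.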

Think of allowing a rogue thread to pound a CAS with random data wrt the 
comparand, trying to get it to fail... Of course, on the LL/SC side of 
things this could be a thread modifying the reservation granule, right? 
Pessimistic (CAS) vs Optimistic (LL/SC)?




> 
> Both are bad practice in making HW that can perform these things
> efficiently. But notice that LL-SC does not have this problem.
> Neither does ESM.
> 
>>>> According to my understanding, what matters is # of CPU cores with
>>>> coherent access to the same memory+IO.
>>>> For x86, 4 cores (CPUs) were relatively common since 1996. There
>>>> existed few odd 8-core systems too, still back in the last century.