Path: ...!feeds.phibee-telecom.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.arch
Subject: Re: Is Intel exceptionally unsuccessful as an architecture designer?
Date: Sat, 21 Sep 2024 21:12:16 -0700
Organization: A noiseless patient Spider
Lines: 131
Message-ID: <vco5f0$21dsm$4@dont-email.me>
References: <memo.20240913205156.19028s@jgd.cix.co.uk>
 <vcd3ds$3o6ae$2@dont-email.me>
 <2935676af968e40e7cad204d40cafdcf@www.novabbs.org>
 <vcd7pr$3op6a$3@dont-email.me>
 <a20365f1bdcad769edd9e1f840edb2fe@www.novabbs.org>
 <vcda96$3p3a7$2@dont-email.me>
 <21028ed32d20f0eea9a754fafdb64e45@www.novabbs.org>
 <RECGO.45463$xO0f.22925@fx48.iad> <20240918190027.00003e4e@yahoo.com>
 <vcfp2q$8glq$5@dont-email.me> <jwv34lumjz7.fsf-monnier+comp.arch@gnu.org>
 <vckpkg$18k7r$2@dont-email.me> <vckqus$18j12$2@dont-email.me>
 <920c561c4e39e91d3730b6aab103459b@www.novabbs.org>
 <vcl6i6$1ad9e$1@dont-email.me>
 <d3b9fc944f708546e4fbe5909c748ba3@www.novabbs.org>
 <vclb16$1etc7$1@dont-email.me> <vcmssa$1lpa4$1@dont-email.me>
 <vcna2k$1nlod$1@dont-email.me> <vco0ik$20e64$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 22 Sep 2024 06:12:17 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="0029950ff4e92ba21a7d99fa35b943c5";
	logging-data="2144150"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19SUzMGI7ngrqW+N9s98SH/iK3mbigEBo0="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:9hONt0w8hJYL/DMugQsb7MdGQDo=
Content-Language: en-US
In-Reply-To: <vco0ik$20e64$1@dont-email.me>
Bytes: 6655

On 9/21/2024 7:48 PM, Brett wrote:
> Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
>> On 9/21/2024 9:39 AM, Brett wrote:
>>> Chris M. Thomasson <chris.m.thomasson.1@gmail.com> wrote:
>>>> On 9/20/2024 6:48 PM, MitchAlsup1 wrote:
>>>>> On Sat, 21 Sep 2024 1:12:38 +0000, Brett wrote:
>>>>>
>>>>>> MitchAlsup1 <mitchalsup@aol.com> wrote:
>>>>>>> On Fri, 20 Sep 2024 21:54:36 +0000, Chris M. Thomasson wrote:
>>>>>>>
>>>>>>>> On 9/20/2024 2:32 PM, Lawrence D'Oliveiro wrote:
>>>>>>>>> On Fri, 20 Sep 2024 11:21:52 -0400, Stefan Monnier wrote:
>>>>>>>>>
>>>>>>>>>>> The basic issue is:
>>>>>>>>>>> * CPU+motherboard RAM -- usually upgradeable
>>>>>>>>>>> * Addon coprocessor RAM -- usually not upgradeable
>>>>>>>>>>
>>>>>>>>>> Maybe the RAM of the "addon coprocessor" is not upgradeable, but the
>>>>>>>>>> addon board itself can be replaced with another one (one with more
>>>>>>>>>> RAM).
>>>>>>>>>
>>>>>>>>> Yes, but that’s a lot more expensive.
>>>>>>>>
>>>>>>>> I had this crazy idea of putting cpus right on the ram. So, if you add
>>>>>>>> more memory to your system you automatically get more cpu's... Think
>>>>>>>> NUMA for a moment... ;^)
>>>>>>>
>>>>>>> Can software use the extra CPUs ?
>>>>>>>
>>>>>>> Also note: DRAMs are made on a P-Channel process (leakage) with only a few
>>>>>>> layers of metal while CPUs are based on an N-Channel process (speed) with
>>>>>>> many layers of metal.
>>>>>>
>>>>>> Didn’t you work on the MC68000 which had one layer of metal?
>>>>>
>>>>> Yes, but it was the 68020 and had polysilicide which we used as
>>>>> a second layer of metal.
>>>>>
>>>>> Mc88100 had 2 layers of metal and silicide.
>>>>>
>>>>> The number of metal layers went about::
>>>>> 1978: 1
>>>>> 1980: 1+silicide
>>>>> 1982: 2+silicide
>>>>> 1988: 3+silicide
>>>>> 1990: 4+silicide
>>>>> 1995: 6
>>>>> ..
>>>>>
>>>>>> This could be fine if you are going for the AI market of slow AI cpu
>>>>>> with huge memory and bandwidth.
>>>>>>
>>>>>> The AI market is bigger than the general server market as seen in
>>>>>> NVidia’s sales.
>>>>>>
>>>>>>> Bus interconnects are not setup to take a CPU cache miss from one
>>>>>>> DRAM to a different DRAM on behalf of its contained CPU(s).
>>>>>>> {Chicken and egg problem}
>>>>>
>>>>> Thus a problem with the CPU on DRAM approach.
>>>>
>>>> It would be HIGHLY local wrt its processing units and its memory for
>>>> they would all be one.
>>>>
>>>> The programming for it would not be all that easy... It would be like a
>>>> NUMA where a program can divide itself up and run parts of itself on
>>>> each slot (aka memory-cpu hybrid unit card if you will). If a program
>>>> can be embarrassingly parallel, well that would be great! The Cell
>>>> processor comes to mind. But it failed. Shit.
>>>
>>> Cell was in the PlayStation which Sony sold a huge number of and made
>>> billions of dollars, so successful, not failed.
>>
>> Touche! :^)
>>
>> However, iirc, not all the games for it even used the SPE's. Instead
>> they used the PPC. I guess that might have been due to the "complexity"
>> of the programming? Not sure.
> 
> ALL games used the SPE’s, the PPC was not fast enough for a AAA game.
> SPE is more powerful and flexible than a vertex shader on the graphics
> chip.

I'm still not sure that 100% of the games used the SPE's, AAA games aside 
for a moment...


>>> I programmed for Cell, it was actually a nice architecture for what it did.
>>
>> Iirc, you had to use DMA to communicate with the SPE's?
> 
> You have to build DMA lists for the graphics chip anyway, the SPE’s are
> just more of the same. Today the vertex shaders are on the graphics chip,
> instead of SPE, same difference.

True. I got to play around with a Cell a long time ago. I wrote about it 
way back on this group a little bit.
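
For anybody following along, the DMA list idea can be sketched in plain C. 
This is NOT the Cell SDK API. The names dma_elem, dma_gather and 
LOCAL_STORE_SIZE are made up for illustration, and memcpy just stands in 
for the DMA engine. The point is only that the small local memory gets fed 
from a list of (address, size) entries instead of reaching into main 
memory directly:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tiny local store; a real SPE local store was 256 KB. */
#define LOCAL_STORE_SIZE 256

/* One entry of a made-up DMA list: where to fetch from and how much. */
struct dma_elem {
    const void *src;   /* address in "main memory" */
    size_t      size;  /* bytes to transfer */
};

/*
 * Gather every list entry into the local buffer, back to back.
 * memcpy stands in for the DMA engine here.
 * Returns the number of bytes copied, or 0 if an entry would overflow
 * the local store (the buffer may be partially filled in that case).
 */
size_t dma_gather(unsigned char *local, const struct dma_elem *list, size_t n)
{
    size_t off = 0;

    assert(local != NULL && list != NULL);

    for (size_t i = 0; i < n; i++) {
        if (list[i].size > LOCAL_STORE_SIZE - off)
            return 0;  /* entry does not fit in what is left */
        memcpy(local + off, list[i].src, list[i].size);
        off += list[i].size;
    }
    return off;
}
```

The caller builds the list once, hands it over, and then works only on 
the local buffer. That is roughly the shape of the thing, assuming I 
remember it right.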


>>> If you think programming for AI is easy, I have news for you…
>>>
>>> Those NVidia AI chips are at the brain damaged level for programming.
>>
>> No shit? I was thinking along the lines of compute shaders in the GPU?
>>
>>
>>> 10’s of billions of dollars are invested in this market.
>>>
>>>> A system with a mother board that has slots for several GPUS (think
>>>> crossfire) and slots for memory+CPU units. The kicker is that adding
>>>> more memory gives you more cpus...
>>>>
>>>> How crazy is this? Well, on a scale from:
>>>>
>>>> Retarded to Moronic?
>>>>
>>>> Pretty bad? Shit...
>>>>
>>>> Shit man, remember all of the slots in the old Apple IIgs's?
>>>>
>>>> ;^o
>>>>
>>>>
>>>>>
>>>>>> Such a dram would be on the PCIE busses, and the main CPU’s would barely
>>>>>> touch that ram, and the AI only searches locally.
>>>>>
>>>>> Better make it PCIe+CXL so the downstream CPU is cache coherent.