From: Thomas Koenig <tkoenig@netcologne.de>
Newsgroups: comp.arch
Subject: Re: Continuations
Date: Thu, 18 Jul 2024 06:00:46 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 68
Message-ID: <v7ab2e$29lv1$1@dont-email.me>
References: <v6tbki$3g9rg$1@dont-email.me>
 <47689j5gbdg2runh3t7oq2thodmfkalno6@4ax.com> <v71vqu$gomv$9@dont-email.me>
 <116d9j5651mtjmq4bkjaheuf0pgpu6p0m8@4ax.com>
 <f8c6c5b5863ecfc1ad45bb415f0d2b49@www.novabbs.org>
 <7u7e9j5dthm94vb2vdsugngjf1cafhu2i4@4ax.com>
 <0f7b4deb1761f4c485d1dc3b21eb7cb3@www.novabbs.org>
 <v78soj$1tn73$1@dont-email.me>
 <4bbc6af7baab612635eef0de4847ba5b@www.novabbs.org>
 <v792kn$1v70t$1@dont-email.me>
 <ef12aa647464a3ebe3bd208c13a3c40c@www.novabbs.org>

MitchAlsup1 <mitchalsup@aol.com> wrote:
> On Wed, 17 Jul 2024 18:30:47 +0000, Stephen Fuld wrote:
>
>> MitchAlsup1 wrote:
>>
>>> On Wed, 17 Jul 2024 16:50:27 +0000, Thomas Koenig wrote:
>>>
>>>>MitchAlsup1 <mitchalsup@aol.com> wrote:
>>>>
>>>> > What I am talking about is to improve their performance until a
>>>> > sin() takes about the same number of cycles of FDIV, not 10× more.
>>>>
>>>> Maybe time for a little story.
>>>>
>>>> Some unspecified time ago, a colleague did CFD calculations which
>>>> included fluid flow (including turbulence modelling and diffusion)
>>>> and quite a few chemical reactions together.  So, he evaluated a
>>>> huge number of Arrhenius equations,
>>>>
>>>> k = A * exp(-E_a/(R*T))
>>>>
>>>> and because some of the reactions he looked at were highly
>>>> exothermic or endothermic, he needed tiny relaxation factors (aka
>>>> small steps).  His calculation spent most of the time evaluating
>>>> the Arrhenius equation above many, many, many, many times.
>>>>
>>>> A single calculation took months, and he didn't use weak hardware.
>>>>
>>>> A fully pipelined evaluation of, let's say, four parallel exp and
>>>> four parallel fdiv instructions would have reduced his calculation
>>>> time by orders of magnitude, and allowed him to explore the design
>>>> space instead of just scratching the surface.
>>>>
>>>> (By the way, if I had found a reasonable way to incorporate the
>>>> Arrhenius equation into your ISA, I would have done so already :-)
>>>
>>>     FMUL     Rt,RR,RT
>>>     FDIV     Rt,-RE,Rt
>>>     EXP      Rt,Rt
>>>     FMUL     Rk,RA,Rt
>>>
>>> Does not look "all that bad" to me.
>>
>> So for your GbOoO CPU, how many of the various FP operations, and the
>> EXP instruction can be done in parallel?
>
> FMUL is   4 cycles of latency fully pipelined
> FDIV is ~20 cycles of latency not   pipelined
> EXP  is ~16 cycles of latency not   pipelined

Ah, OK.

>
> They are all performed in the FMAC unit and here the instructions are
> serially dependent.

A loop containing the calculation could be unrolled, but without
a large effect, since FDIV and EXP are not pipelined.
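
A rough back-of-the-envelope sketch of why unrolling buys so little,
using only the latencies quoted above.  The occupancy model is my
assumption, not a description of the actual FMAC unit: a pipelined op
blocks the unit for one issue cycle, a non-pipelined op for its full
latency, and the non-unrolled loop does not overlap iterations at all.
The function names are made up for illustration.

```python
# Latencies (cycles) and pipelining as quoted in this thread.
LATENCY = {"FMUL": 4, "FDIV": 20, "EXP": 16}
PIPELINED = {"FMUL": True, "FDIV": False, "EXP": False}

# The serially dependent sequence for k = A * exp(-E_a/(R*T)).
SEQUENCE = ["FMUL", "FDIV", "EXP", "FMUL"]


def chain_latency(seq):
    """Latency of one evaluation: each op waits for the previous result."""
    return sum(LATENCY[op] for op in seq)


def unit_occupancy(seq):
    """Cycles one evaluation blocks the single FMAC unit (assumed model):
    1 cycle for a pipelined op, the full latency for a non-pipelined op."""
    return sum(1 if PIPELINED[op] else LATENCY[op] for op in seq)


def total_cycles(n, unrolled):
    """Rough cycle count for n independent evaluations."""
    if not unrolled:
        # Assumption: no overlap between iterations.
        return n * chain_latency(SEQUENCE)
    # Unrolled: the first evaluation pays the full chain latency, the
    # rest are bounded by how long each one occupies the unit.
    return chain_latency(SEQUENCE) + (n - 1) * unit_occupancy(SEQUENCE)


if __name__ == "__main__":
    n = 1000
    print("chain latency :", chain_latency(SEQUENCE))
    print("unit occupancy:", unit_occupancy(SEQUENCE))
    print("speedup       :", total_cycles(n, False) / total_cycles(n, True))
```

Under these assumptions the chain is 44 cycles and each evaluation
still blocks the unit for 38 of them, so unrolling gains well under
20 percent.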

>
> So, 44 cycles of latency, a 1-wide machine and a 6-wide machine would
> see the same latency; that is, GBOoO is not a differentiator.

What about the SIMD width underlying the VVM implementation?
All SIMD implementations I know of allow performing floating point
ops in parallel.  Is it planned that My 66000 can also do that?
(If not, that would be a big disadvantage for scientific/technical
work).
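
To make the point concrete, here is a minimal sketch of the workload.
Each rate constant depends only on its own temperature, so the
evaluations are independent and are exactly what parallel lanes are
good for.  The lane count and the cost model in simd_cycles() are
hypothetical and say nothing about what My 66000 actually does; the
44 cycles per evaluation is the serial-chain latency quoted in this
thread.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def arrhenius(a, e_a, t):
    """Rate constant k = A * exp(-E_a / (R*T))."""
    return a * math.exp(-e_a / (R * t))


def arrhenius_batch(a, e_a, temps):
    """Evaluate k for many temperatures.  No element depends on any
    other, so each one could go to a separate lane."""
    return [arrhenius(a, e_a, t) for t in temps]


def simd_cycles(n, lanes, cycles_per_eval=44):
    """Idealized cost model (assumption): every group of `lanes`
    evaluations costs one serial chain of cycles_per_eval cycles."""
    return math.ceil(n / lanes) * cycles_per_eval


if __name__ == "__main__":
    # Illustrative values only, not taken from the original calculation.
    temps = [300.0 + 10.0 * i for i in range(8)]
    ks = arrhenius_batch(1.0e13, 8.0e4, temps)
    print(ks)
    for lanes in (1, 4, 8):
        print(lanes, "lanes:", simd_cycles(1_000_000, lanes), "cycles")
```

In this idealized model the cycle count scales as 1/lanes, which is
the kind of gain that neither a wider OoO window nor loop unrolling
gives for this serial chain.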