Article <2bdee5008840a5584e9de557a5dfd88d@www.novabbs.org>

Path: ...!weretis.net!feeder9.news.weretis.net!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Continuations
Date: Thu, 18 Jul 2024 16:40:11 +0000
Organization: Rocksolid Light
Message-ID: <2bdee5008840a5584e9de557a5dfd88d@www.novabbs.org>
References: <v6tbki$3g9rg$1@dont-email.me> <47689j5gbdg2runh3t7oq2thodmfkalno6@4ax.com> <v71vqu$gomv$9@dont-email.me> <116d9j5651mtjmq4bkjaheuf0pgpu6p0m8@4ax.com> <f8c6c5b5863ecfc1ad45bb415f0d2b49@www.novabbs.org> <7u7e9j5dthm94vb2vdsugngjf1cafhu2i4@4ax.com> <0f7b4deb1761f4c485d1dc3b21eb7cb3@www.novabbs.org> <v78soj$1tn73$1@dont-email.me> <4bbc6af7baab612635eef0de4847ba5b@www.novabbs.org> <v792kn$1v70t$1@dont-email.me> <ef12aa647464a3ebe3bd208c13a3c40c@www.novabbs.org> <tD7mO.11270$Z2s2.1953@fx05.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="3730630"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$nJSlntlFWColI8uu5dPVp.P7Y92qPg3.i5z1T.gLGTQQix95lVmCW
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Bytes: 4357
Lines: 71

On Thu, 18 Jul 2024 12:10:45 +0000, EricP wrote:

> MitchAlsup1 wrote:
>> On Wed, 17 Jul 2024 18:30:47 +0000, Stephen Fuld wrote:
>>
>>> MitchAlsup1 wrote:
>>>
>>>> On Wed, 17 Jul 2024 16:50:27 +0000, Thomas Koenig wrote:
>>>>
>>>>> MitchAlsup1 <mitchalsup@aol.com> schrieb:
>>>>>
>>>>> > What I am talking about is to improve their performance until a
>>>>> > sin() takes about the same number of cycles of FDIV, not 10× more.
>>>>>
>>>>> Maybe time for a little story.
>>>>>
>>>>> Some unspecified time ago, a colleague did CFD calculations which
>>>>> included fluid flow (including turbulence modelling and diffusion)
>>>>> and quite a few chemical reactions together.  So, he evaluated a
>>>>> huge number of Arrhenius equations,
>>>>>
>>>>> k = A * exp(-E_a/(R*T))
>>>>>
>>>>> and because some of the reactions he looked at were highly
>>>>> exothermic or endothermic, he needed tiny relaxation factors (aka
>>>>> small steps).  His calculation spent most of the time evaluating
>>>>> the Arrhenius equation above many, many, many, many times.
>>>>>
>>>>> A single calculation took months, and he didn't use weak hardware.
>>>>>
>>>>> A fully pipelined evaluation of, let's say, four parallel exp and
>>>>> four parallel fdiv instructions would have reduced his calculation
>>>>> time by orders of magnitude, and allowed him to explore the design
>>>>> space instead of just scratching the surface.
>>>>>
>>>>> (By the way, if I had found a reasonable way to incorporate the
>>>>> Arrhenius equation into your ISA, I would have done so already :-)
>>>>
>>>>     FMUL     Rt,RR,RT
>>>>     FDIV     Rt,-RE,Rt
>>>>     EXP      Rt,Rt
>>>>     FMUL     Rk,RA,Rt
>>>>
>>>> Does not look "all that bad" to me.
>>>
>>> So for your GBOoO CPU, how many of the various FP operations, and the
>>> EXP instruction can be done in parallel?
>>
>> FMUL is   4 cycles of latency fully pipelined
>> FDIV is ~20 cycles of latency not   pipelined
>> EXP  is ~16 cycles of latency not   pipelined
>>
>> They are all performed in the FMAC unit and here the instructions are
>> serially dependent.
>>
>> So, 44 cycles of latency, a 1-wide machine and a 6-wide machine would
>> see the same latency; that is, GBOoO is not a differentiator.
>
> If the FP multiplier is a 4-stage pipeline, and FDIV is iterating using
> the multiplier, can the pipeline get a mix of multiple operations going
> at once? FDIV for both Newton–Raphson and Goldschmidt iterates serially
> so each can only use one of the 4 pipeline slots.

Over the 20 cycles the multiplier is doing Goldschmidt iterations, there
are only 3 slots where a different instruction could sneak through.

Note: the multiplier used in Goldschmidt iterations is busy every cycle,
alternating between two multiplies: the first drives the denominator
towards 1.0, the second drives the numerator towards the quotient.

That is, it's a 4-cycle pipeline unit from the outside, but a 2-cycle
pipeline unit from within the function unit.
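
For readers who have not seen Goldschmidt division written out, here is
a minimal software sketch of the two multiplies per iteration described
above. It is an illustration only, not the hardware algorithm: the
function name, the fixed count of 6 iterations, and the use of 2 - d as
the correction factor with no table-lookup seed are all assumptions made
for the example. Real units typically start from a seeded reciprocal
estimate and need fewer iterations.

```python
import math


def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d with Goldschmidt iterations (illustrative sketch).

    Assumptions, not taken from any real FPU: no table-lookup seed, and a
    fixed iteration count chosen so that a denominator mantissa in
    [0.5, 1.0) converges to 1.0 within double precision.
    """
    if d == 0.0:
        raise ZeroDivisionError("division by zero")

    # Fold the sign into the numerator so the denominator is positive.
    if d < 0.0:
        n, d = -n, -d

    # Scale the denominator into [0.5, 1.0): d = m * 2**e.
    m, e = math.frexp(d)

    for _ in range(iterations):
        f = 2.0 - m   # correction factor
        m = m * f     # first multiply: denominator driven towards 1.0
        n = n * f     # second multiply: numerator driven towards the quotient

    # Undo the scaling: n / d = (n / m) * 2**-e.
    return math.ldexp(n, -e)
```

The two multiplies inside the loop do not depend on each other, only on
the previous iteration's denominator, which is why a pipelined
multiplier can take them back to back. The error in the denominator
squares on each pass, so six passes are enough here even when the
starting mantissa is as far from 1.0 as 0.5.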