Path: ...!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Continuations
Date: Wed, 17 Jul 2024 19:22:52 +0000
Organization: Rocksolid Light
Message-ID: <ef12aa647464a3ebe3bd208c13a3c40c@www.novabbs.org>
References: <v6tbki$3g9rg$1@dont-email.me> <47689j5gbdg2runh3t7oq2thodmfkalno6@4ax.com> <v71vqu$gomv$9@dont-email.me> <116d9j5651mtjmq4bkjaheuf0pgpu6p0m8@4ax.com> <f8c6c5b5863ecfc1ad45bb415f0d2b49@www.novabbs.org> <7u7e9j5dthm94vb2vdsugngjf1cafhu2i4@4ax.com> <0f7b4deb1761f4c485d1dc3b21eb7cb3@www.novabbs.org> <v78soj$1tn73$1@dont-email.me> <4bbc6af7baab612635eef0de4847ba5b@www.novabbs.org> <v792kn$1v70t$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="3631347"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Site: $2y$10$gyIFnW.fM6YBOD5CmcabV.uzEQ3OLGDdtdGWQfwByqRJ5fUf90RFu
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Bytes: 3369
Lines: 53

On Wed, 17 Jul 2024 18:30:47 +0000, Stephen Fuld wrote:

> MitchAlsup1 wrote:
>
>> On Wed, 17 Jul 2024 16:50:27 +0000, Thomas Koenig wrote:
>>
>>>MitchAlsup1 <mitchalsup@aol.com> schrieb:
>>>
>>> > What I am talking about is to improve their performance until a
>>> > sin() takes about the same number of cycles as FDIV, not 10× more.
>>>
>>> Maybe time for a little story.
>>>
>>> Some unspecified time ago, a colleague did CFD calculations which
>>> included fluid flow (including turbulence modelling and diffusion)
>>> and quite a few chemical reactions together.  So, he evaluated a
>>> huge number of Arrhenius equations,
>>>
>>> k = A * exp(-E_a/(R*T))
>>>
>>> and because some of the reactions he looked at were highly
>>> exothermic or endothermic, he needed tiny relaxation factors (aka
>>> small steps).  His calculation spent most of the time evaluating
>>> the Arrhenius equation above many, many, many, many times.
>>>
>>> A single calculation took months, and he didn't use weak hardware.
>>>
>>> A fully pipelined evaluation of, let's say, four parallel exp and
>>> four parallel fdiv instructions would have reduced his calculation
>>> time by orders of magnitude, and allowed him to explore the design
>>> space instead of just scratching the surface.
>>>
>>> (By the way, if I had found a reasonable way to incorporate the
>>> Arrhenius equation into your ISA, I would have done so already :-)
>>
>>     FMUL     Rt,RR,RT
>>     FDIV     Rt,-RE,Rt
>>     EXP      Rt,Rt
>>     FMUL     Rk,RA,Rt
>>
>> Does not look "all that bad" to me.
>
> So for your GbOoO CPU, how many of the various FP operations, and the
> EXP instruction can be done in parallel?

FMUL is   4 cycles of latency fully pipelined
FDIV is ~20 cycles of latency not   pipelined
EXP  is ~16 cycles of latency not   pipelined

They are all performed in the FMAC unit and here the instructions are
serially dependent.

So, roughly 44 cycles of latency (4 + ~20 + ~16 + 4). A 1-wide machine
and a 6-wide machine would see the same latency; that is, GBOoO is not
a differentiator.
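
To make the arithmetic concrete, here is a minimal sketch (not from the
original post) that walks the four-instruction sequence as a dataflow
chain and adds up the quoted latencies. It assumes the input registers
RR, RT, RE and RA are ready at cycle 0, that issue width is unlimited,
and that there are no structural hazards; the function name and the
data layout are made up for illustration.

```python
# Latencies in cycles as quoted in the post (FDIV and EXP are approximate).
LATENCY = {"FMUL": 4, "FDIV": 20, "EXP": 16}

# The Arrhenius sequence as (opcode, destination, sources).
# A leading '-' on a source marks a negated operand, as in the post.
# Note that Rt (temporary) and RT (temperature) are distinct registers.
PROGRAM = [
    ("FMUL", "Rt", ["RR", "RT"]),
    ("FDIV", "Rt", ["-RE", "Rt"]),
    ("EXP",  "Rt", ["Rt"]),
    ("FMUL", "Rk", ["RA", "Rt"]),
]

def critical_path(program, latency):
    """Hypothetical helper: dataflow-limited latency of a straight-line sequence."""
    ready = {}   # register -> cycle at which its latest value is available
    finish = 0
    for op, dest, srcs in program:
        # An instruction can start once all of its sources are available;
        # registers never written by the sequence are assumed ready at cycle 0.
        start = max(ready.get(s.lstrip("-"), 0) for s in srcs)
        ready[dest] = start + latency[op]
        finish = max(finish, ready[dest])
    return finish

print(critical_path(PROGRAM, LATENCY))
```

Because each instruction consumes the result of the one before it, the
total is just the sum of the individual latencies, and issue width never
enters the calculation. That is the sense in which width does not help a
single evaluation.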