Path: ...!weretis.net!feeder9.news.weretis.net!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Continuations
Date: Thu, 18 Jul 2024 16:40:11 +0000
Organization: Rocksolid Light
Message-ID: <2bdee5008840a5584e9de557a5dfd88d@www.novabbs.org>
References: <v6tbki$3g9rg$1@dont-email.me> <47689j5gbdg2runh3t7oq2thodmfkalno6@4ax.com>
 <v71vqu$gomv$9@dont-email.me> <116d9j5651mtjmq4bkjaheuf0pgpu6p0m8@4ax.com>
 <f8c6c5b5863ecfc1ad45bb415f0d2b49@www.novabbs.org> <7u7e9j5dthm94vb2vdsugngjf1cafhu2i4@4ax.com>
 <0f7b4deb1761f4c485d1dc3b21eb7cb3@www.novabbs.org> <v78soj$1tn73$1@dont-email.me>
 <4bbc6af7baab612635eef0de4847ba5b@www.novabbs.org> <v792kn$1v70t$1@dont-email.me>
 <ef12aa647464a3ebe3bd208c13a3c40c@www.novabbs.org> <tD7mO.11270$Z2s2.1953@fx05.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org; logging-data="3730630"; mail-complaints-to="usenet@i2pn2.org"; posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$nJSlntlFWColI8uu5dPVp.P7Y92qPg3.i5z1T.gLGTQQix95lVmCW
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Bytes: 4357
Lines: 71

On Thu, 18 Jul 2024 12:10:45 +0000, EricP wrote:

> MitchAlsup1 wrote:
>> On Wed, 17 Jul 2024 18:30:47 +0000, Stephen Fuld wrote:
>>
>>> MitchAlsup1 wrote:
>>>
>>>> On Wed, 17 Jul 2024 16:50:27 +0000, Thomas Koenig wrote:
>>>>
>>>>> MitchAlsup1 <mitchalsup@aol.com> wrote:
>>>>>
>>>>> > What I am talking about is to improve their performance until a
>>>>> > sin() takes about the same number of cycles of FDIV, not 10× more.
>>>>>
>>>>> Maybe time for a little story.
>>>>>
>>>>> Some unspecified time ago, a colleague did CFD calculations which
>>>>> included fluid flow (including turbulence modelling and diffusion)
>>>>> and quite a few chemical reactions together. So, he evaluated a
>>>>> huge number of Arrhenius equations,
>>>>>
>>>>> k = A * exp(-E_a/(R*T))
>>>>>
>>>>> and because some of the reactions he looked at were highly
>>>>> exothermic or endothermic, he needed tiny relaxation factors (aka
>>>>> small steps). His calculation spent most of the time evaluating
>>>>> the Arrhenius equation above many, many, many, many times.
>>>>>
>>>>> A single calculation took months, and he didn't use weak hardware.
>>>>>
>>>>> A fully pipelined evaluation of, let's say, four parallel exp and
>>>>> four parallel fdiv instructions would have reduced his calculation
>>>>> time by orders of magnitude, and allowed him to explore the design
>>>>> space instead of just scratching the surface.
>>>>>
>>>>> (By the way, if I had found a reasonable way to incorporate the
>>>>> Arrhenius equation into your ISA, I would have done so already :-)
>>>>
>>>> FMUL Rt,RR,RT
>>>> FDIV Rt,-RE,Rt
>>>> EXP Rt,Rt
>>>> FMUL Rk,RA,Rt
>>>>
>>>> Does not look "all that bad" to me.
>>>
>>> So for your GbOoO CPU, how many of the various FP operations, and the
>>> EXP instruction can be done in parallel?
>>
>> FMUL is 4 cycles of latency fully pipelined
>> FDIV is ~20 cycles of latency not pipelined
>> EXP is ~16 cycles of latency not pipelined
>>
>> They are all performed in the FMAC unit and here the instructions are
>> serially dependent.
>>
>> So, 44 cycles of latency, a 1-wide machine and a 6-wide machine would
>> see the same latency; that is, GBOoO is not a differentiator.
>
> If the FP multiplier is a 4-stage pipeline, and FDIV is iterating using
> the multiplier, can the pipeline get a mix of multiple operations going
> at once? FDIV for both Newton–Raphson and Goldschmidt iterates serially
> so each can only use one of the 4 pipeline slots.

Over the 20 cycles the multiplier is doing Goldschmidt iterations, there
are only 3 slots where a different instruction could sneak through.

Note: during Goldschmidt iterations the multiplier is in use every cycle:
first for the multiply that drives the denominator towards 1.0, then for
the multiply that drives the numerator towards the quotient. That is, it's
a 4-cycle pipelined unit from the outside, but a 2-cycle pipelined unit
from within the function unit.
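
For readers who have not seen Goldschmidt division written out, here is a minimal software sketch of the two multiplies per iteration mentioned in the note: one drives the denominator towards 1.0, the other drives the numerator towards the quotient. It only illustrates the arithmetic and is not a model of the FMAC unit or its timing. The function name, the `frexp` pre-scaling, the `2 - d` correction factor, and the iteration count are assumptions of this sketch. Real hardware would typically start from a table-lookup reciprocal estimate and need fewer iterations.

```python
import math


def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d for d > 0 with Goldschmidt iteration (illustrative sketch)."""
    if d <= 0.0:
        raise ValueError("this sketch only handles positive denominators")

    # Pre-scale so the denominator lies in [0.5, 1.0):
    # frexp returns d = m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(d)
    n = math.ldexp(n, -e)  # scale the numerator by the same power of two
    d = m

    for _ in range(iterations):
        f = 2.0 - d  # correction factor for this step
        d = d * f    # multiply 1: denominator is driven towards 1.0
        n = n * f    # multiply 2: numerator is driven towards the quotient

    return n


if __name__ == "__main__":
    print(goldschmidt_divide(1.0, 3.0))
```

The two multiplies inside the loop do not depend on each other, only on the factor `f`, which is why they can be issued into the multiplier pipeline on back-to-back cycles. Each new `f`, however, depends on the previous denominator result, so successive iterations stay serial.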