From: Thomas Koenig
Newsgroups: comp.arch
Subject: Re: Continuations
Date: Thu, 18 Jul 2024 06:00:46 -0000 (UTC)

MitchAlsup1 wrote:
> On Wed, 17 Jul 2024 18:30:47 +0000, Stephen Fuld wrote:
>
>> MitchAlsup1 wrote:
>>
>>> On Wed, 17 Jul 2024 16:50:27 +0000, Thomas Koenig wrote:
>>>
>>>> MitchAlsup1 wrote:
>>>>
>>>> > What I am talking about is to improve their performance until a
>>>> > sin() takes about the same number of cycles of FDIV, not 10× more.
>>>>
>>>> Maybe time for a little story.
>>>>
>>>> Some unspecified time ago, a colleague did CFD calculations which
>>>> included fluid flow (including turbulence modelling and diffusion)
>>>> and quite a few chemical reactions together. So, he evaluated a
>>>> huge number of Arrhenius equations,
>>>>
>>>> k = A * exp(-E_a/(R*T))
>>>>
>>>> and because some of the reactions he looked at were highly
>>>> exothermic or endothermic, he needed tiny relaxation factors (aka
>>>> small steps). His calculation spent most of the time evaluating
>>>> the Arrhenius equation above many, many, many, many times.
>>>>
>>>> A single calculation took months, and he didn't use weak hardware.
>>>>
>>>> A fully pipelined evaluation of, let's say, four parallel exp and
>>>> four parallel fdiv instructions would have reduced his calculation
>>>> time by orders of magnitude, and allowed him to explore the design
>>>> space instead of just scratching the surface.
>>>>
>>>> (By the way, if I had found a reasonable way to incorporate the
>>>> Arrhenius equation into your ISA, I would have done so already :-)
>>>
>>> FMUL Rt,RR,RT
>>> FDIV Rt,-RE,Rt
>>> EXP Rt,Rt
>>> FMUL Rk,RA,Rt
>>>
>>> Does not look "all that bad" to me.
>>
>> So for your GbOoO CPU, how many of the various FP operations, and the
>> EXP instruction can be done in parallel?
>
> FMUL is 4 cycles of latency fully pipelined
> FDIV is ~20 cycles of latency not pipelined
> EXP is ~16 cycles of latency not pipelined

Ah, OK.

>
> They are all performed in the FMAC unit and here the instructions are
> serially dependent.

A loop containing the calculation could be unrolled, but without
a large effect.

>
> So, 44 cycles of latency, a 1-wide machine and a 6-wide machine would
> see the same latency; that is, GBOoO is not a differentiator.

What about the SIMD width underlying the VVM implementation? All
SIMD implementations I know of allow performing floating point ops
in parallel. Is it planned that My 66000 can also do that? (If not,
that would be a big disadvantage for scientific/technical work.)
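To put rough numbers on the above, here is a minimal back-of-the-envelope sketch, not a model of any real implementation. It writes the Arrhenius equation as the same four serially dependent steps as the FMUL/FDIV/EXP/FMUL sequence, sums the quoted latencies (4 + 20 + 16 + 4 = 44 cycles), and estimates how much overlapping independent evaluations (unrolling, or OoO across loop iterations) could gain. Assumptions, all mine: one shared FMAC unit; a pipelined op occupies it for 1 cycle and a non-pipelined op for its whole latency; the value of R; and the helper names and the `lanes` parameter, which is a hypothetical SIMD width and says nothing about what My 66000 actually does.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)

def arrhenius(A, Ea, T):
    """k = A * exp(-Ea / (R*T)), split into the same four serially
    dependent steps as the FMUL/FDIV/EXP/FMUL sequence in the thread."""
    t = R * T          # FMUL  Rt = R*T
    t = -Ea / t        # FDIV  Rt = -E_a/Rt
    t = math.exp(t)    # EXP   Rt = exp(Rt)
    return A * t       # FMUL  Rk = A*Rt

# (name, latency in cycles, fully pipelined?) using the figures quoted above.
# Assumption: one shared FMAC unit; a pipelined op occupies it for 1 cycle,
# a non-pipelined op for its whole latency.
OPS = [("FMUL", 4, True), ("FDIV", 20, False),
       ("EXP", 16, False), ("FMUL", 4, True)]

def chain_latency(ops):
    """Latency of one evaluation: each op waits for the previous result."""
    return sum(lat for _, lat, _ in ops)

def cycles_per_eval(ops, lanes=1, all_pipelined=False):
    """Steady-state unit occupancy per evaluation when many independent
    evaluations overlap. `lanes` is a hypothetical number of identical
    copies of the unit (e.g. a SIMD width); all_pipelined=True pretends
    every op can start a new operation each cycle."""
    busy = sum(1 if (p or all_pipelined) else lat for _, lat, p in ops)
    return busy / lanes

if __name__ == "__main__":
    lat = chain_latency(OPS)
    thr = cycles_per_eval(OPS)
    print(lat)                 # 44
    print(thr)                 # 38.0
    print(f"{lat / thr:.2f}")  # 1.16, upper bound on the overlap gain here
    print(cycles_per_eval(OPS, lanes=4, all_pipelined=True))  # 1.0
```

Under these assumptions the non-pipelined FDIV and EXP keep the unit busy for 38 of the 44 cycles, so overlapping independent evaluations buys at most about 16%, while a hypothetical four-lane, fully pipelined unit would drop the cost to about one cycle per evaluation.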