From: Michael S <already5chosen@yahoo.com>
Newsgroups: comp.arch
Subject: Re: Making Lemonade (Floating-point format changes)
Date: Sun, 19 May 2024 16:23:33 +0300
Organization: A noiseless patient Spider
Message-ID: <20240519162333.00006023@yahoo.com>
References: <abe04jhkngt2uun1e7ict8vmf1fq8p7rnm@4ax.com>
 <memo.20240512203459.16164W@jgd.cix.co.uk> <v1rab7$2vt3u$1@dont-email.me>
 <20240513151647.0000403f@yahoo.com> <v1to2h$3km86$1@dont-email.me>
 <20240514221659.00001094@yahoo.com> <v234nr$12p27$1@dont-email.me>
 <20240516001628.00001031@yahoo.com> <v2cn4l$3bpov$1@dont-email.me>

On Sun, 19 May 2024 11:17:41 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:

> So, I did some more measurements on the POWER9 machine, and it came
> to around 18 cycles per FMA.  Compared to the 13 cycles for the
> FMA instruction, this actually sounds reasonable.
>

I.e. your actual running frequency was 3700 MHz?

> The big problem appears to be that, in this particular
> implementation, multiplication is not pipelined, but done
> piecewise by addition.  This can be explained by the fact that
> this is mostly a decimal unit, with the 128-bit QP just added as
> an afterthought, and decimal multiplication does not happen all
> that often.
>
> A fully pipelined FMA unit capable of 128-bit arithmetic would be
> an entirely different beast, I would expect a throughput of 1 per
> cycle and a latency of (maybe) one cycle more than 64-bit FMA.

There exists a middle ground between non-pipelined and fully pipelined
multiplier/FMA units. In fact, more than one middle ground.

Here is the mid-middle ground that I can imagine, not being a real
hardware guy:
1 - take the pair of existing VSU multipliers. By now they can do
53x53=>106bit unsigned multiplication. Enhance them to 57x57=>114bit.
2 - during quad-precision FMA, split the 113x113 multiplication into 4
pieces and run them through the pair of multipliers, two at a time.
That would produce all parts of the 226-bit product at a rate of
1 product per 2 clocks.
3 - build adders just sufficient for the same throughput of 1 result
per 2 clocks.
Such a combined multiplier will have 2 clocks higher latency than the
DP multiplier.

After that we'll need matching alignment and addition/subtraction
blocks, but by doing them half-pipelined we can utilize the majority of
the existing dual-DP hardware and would need very little else, except
for control signals and probably a new feedback data path on the upper
side of the adder. All that could cost us another clock of latency over
DP FMA, but not necessarily so.

Bottom line: QP FMA with throughput of 1 result per 2 clocks and
latency of 8 or 9 clocks.
For POWER8, which has a less distributed VSU, such a modification would
be somewhat easier than for POWER9.

That's what I call a mid-middle ground.

Low-middle ground would be leaving the 53x53=>106bit multipliers
unmodified. The 113x113 multiplication is split into 9 pieces and a
product is delivered every 5 clocks.

High-middle ground is enhancing both VSU pipes and using them to
process two QP FMAs simultaneously, for combined throughput equivalent
to fully pipelined.

Another possible high-middle ground is, again, enhancing both VSU pipes
and using them together on a single QP FMA. That would be potentially
best for latency, but does not fit well into the philosophy of the
POWER9 design, which tries to minimize high-speed interaction between
various pipes.
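
The piece counts and issue rates above can be checked with a small software model. The sketch below is only an illustration under simplifying assumptions: each multiplier accepts one partial product per clock, and the adder tree, alignment, and rounding are ignored. The names `split`, `piecewise_product` and `clocks_per_product` are hypothetical helpers, not anything from POWER hardware or documentation.

```python
import math

QP_BITS = 113  # binary128 significand width, including the hidden bit


def split(value, width, limbs):
    """Split an unsigned integer into `limbs` chunks of `width` bits, LSB first."""
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(limbs)]


def piecewise_product(a, b, width):
    """Multiply two QP significands using only width x width partial products.

    Returns (product, number_of_partial_products).
    """
    limbs = math.ceil(QP_BITS / width)
    a_parts = split(a, width, limbs)
    b_parts = split(b, width, limbs)
    total = 0
    for i, x in enumerate(a_parts):
        for j, y in enumerate(b_parts):
            # Each x * y is one pass through a width x width multiplier;
            # the shift places it at its weight in the full product.
            total += (x * y) << ((i + j) * width)
    return total, limbs * limbs


def clocks_per_product(width, multipliers=2):
    """Clocks to issue all partial products, assuming one per multiplier per clock."""
    limbs = math.ceil(QP_BITS / width)
    return math.ceil(limbs * limbs / multipliers)


if __name__ == "__main__":
    a = (1 << QP_BITS) - 1  # largest 113-bit significand
    for width in (57, 53):
        product, pieces = piecewise_product(a, a, width)
        assert product == a * a
        print(width, pieces, clocks_per_product(width), product.bit_length())
```

Under this model, 57-bit multipliers need 4 partial products and 2 clocks per product, while unmodified 53-bit multipliers need 9 partial products and 5 clocks, which matches the mid-middle and low-middle figures in the post.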