Article <vcht3a$mbru$2@dont-email.me>

Path: ...!news.roellig-ltd.de!open-news-network.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Brett <ggtgp@yahoo.com>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Thu, 19 Sep 2024 19:12:42 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 50
Message-ID: <vcht3a$mbru$2@dont-email.me>
References: <vaqgtl$3526$1@dont-email.me>
 <p1cvdjpqjg65e6e3rtt4ua6hgm79cdfm2n@4ax.com>
 <2024Sep10.101932@mips.complang.tuwien.ac.at>
 <ygn8qvztf16.fsf@y.z>
 <2024Sep11.123824@mips.complang.tuwien.ac.at>
 <vbsoro$3ol1a$1@dont-email.me>
 <867cbhgozo.fsf@linuxsc.com>
 <20240912142948.00002757@yahoo.com>
 <vbuu5n$9tue$1@dont-email.me>
 <20240915001153.000029bf@yahoo.com>
 <vc6jbk$5v9f$1@paganini.bofh.team>
 <20240915154038.0000016e@yahoo.com>
 <vc70sl$285g2$4@dont-email.me>
 <vc73bl$28v0v$1@dont-email.me>
 <OvEFO.70694$EEm7.38286@fx16.iad>
 <32a15246310ea544570564a6ea100cab@www.novabbs.org>
 <vc7a6h$2afrl$2@dont-email.me>
 <50cd3ba7c0cbb587a55dd67ae46fc9ce@www.novabbs.org>
 <vc8qic$2od19$1@dont-email.me>
 <fCXFO.4617$9Rk4.4393@fx37.iad>
 <vcb730$3ci7o$1@dont-email.me>
 <7cBGO.169512$_o_3.43954@fx17.iad>
 <vcffub$77jk$1@dont-email.me>
 <7ff7dcd83e9e436a707dd2d5ed66d03e@www.novabbs.org>
 <hsXGO.12649$T4b4.5417@fx34.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 19 Sep 2024 21:12:43 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f5096f7998b9f29573d3e22405ab7db3";
	logging-data="733054"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18efn2ZvPreJ5KfKDYWZ68U"
User-Agent: NewsTap/5.5 (iPad)
Cancel-Lock: sha1:fe7gx9r3iIQXl/ARSp1eo28ho1s=
	sha1:hLp7fgwTH4JdbCv8VFCkY74d86Q=
Bytes: 4334

EricP <ThatWouldBeTelling@thevillage.com> wrote:
> MitchAlsup1 wrote:
>> On Wed, 18 Sep 2024 21:15:55 +0000, Brett wrote:
>> 
>>> EricP <ThatWouldBeTelling@thevillage.com> wrote:
>>>> Terje Mathisen wrote:
>>>>> EricP wrote:
>> 
>>> I always assumed that MULH just grabbed the part that would have been
>>> thrown away. And that is how at least one RISC-V core does it:
>>> 
>>> https://www.digikey.com/en/blog/how-the-risc-v-multiply-extension-adds-an-efficient-32-bit
>>> 
>>> 
>>> 
>>> They claim 5 cycles, should be six, five for the multiply and one more
>>> for the second result, unless the next instruction does not need a write
>>> port, and does not use the result. You can get a throughput of 5 cycles
>>> with smart coding, but that rarely happens without effort.
>> 
>> It is easy enough in the decoder to recognize a MUL followed by MULH
>> (and vice versa) as using the multiplier tree once and delivering 2
>> results. So the first result is 6 cycles, the second result on the 6th
>> cycle. {you ALMOST have to do this to avoid large wastes in power.}
> 
> Yes, but then you *require* a macro-op fuser to function efficiently.
> Probably... assuming it works.
> 
> OR one can give up the cherished 1-dest,2-source self imposed ISA design
> limitation and have a 32-bit instruction with four 5-bit registers,
> 2 source, 2 dest, leaving 12 bits for opcode and function code
> that you know will calculate multiply once, and can write back
> the result in 1 clock if it has two write ports (which it needs
> anyway if it wants any hope of catching up after a stall bubble).
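
To make the bit budget in the quoted format concrete, here is a rough sketch of such a layout. The field order and the names (op, rd_hi, rd_lo, rs1, rs2) are my own assumptions for illustration; this is not the encoding of any real ISA.

```python
# Hypothetical 32-bit layout: 12-bit opcode/function + four 5-bit registers.
# Field order and names are assumptions for illustration only.
FIELDS = (("op", 12), ("rd_hi", 5), ("rd_lo", 5), ("rs1", 5), ("rs2", 5))

def encode(**values):
    """Pack the named fields into one 32-bit word, first field in the top bits."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word

def decode(word):
    """Unpack a 32-bit word into a dict of field values."""
    out = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        out[name] = (word >> shift) & ((1 << width) - 1)
    return out
```

Four 5-bit register fields take 20 bits, so 12 remain for opcode and function code, which is the budget stated in the quote.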

You already have 2 source, 2 dest if you have load with address update.
A low-end CPU is going to have a shared INT/FPU pipeline, so you already have
the hardware to read three sources for multiply-accumulate (MAC). You might as
well do 3 source, 2 dest on the int side as well. And ARM does add with shift,
which is 3 sources, though one of them (the shift amount) is a constant if you
want one-cycle uncracked throughput in most designs.
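
A minimal model of what a 2 source, 2 dest multiply buys you: the full product is formed once and both halves come out of it. This is only an arithmetic sketch for 32-bit unsigned operands; the function names are made up here, and it says nothing about cycle counts or write ports.

```python
MASK32 = 0xFFFFFFFF

def mul_wide_u32(a, b):
    """One multiply, two results: (low 32 bits, high 32 bits) of the unsigned product."""
    product = (a & MASK32) * (b & MASK32)   # full 64-bit product, computed once
    return product & MASK32, product >> 32  # what MUL keeps, what MULHU keeps

def mul_lo(a, b):
    """Low half only, i.e. the part an ordinary MUL writes back."""
    return mul_wide_u32(a, b)[0]

def mulhu(a, b):
    """High half only, i.e. the part an ordinary MUL throws away."""
    return mul_wide_u32(a, b)[1]
```

Calling mul_lo and mulhu separately repeats the multiply, which is the case a fuser or a two-destination instruction is meant to avoid.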

> Also in the case of Alpha they only had unsigned MUL,MULH and
> for signed multiply it had to use branchy code (pre-CMOV) to
> do the signed correction subtracts, so fusion would be too complex.
> That design decision is as baffling as HP-PA originally leaving
> a MUL instruction out entirely because "it violated the 1-clock per
> instruction design philosophy". (HP quickly fixed it, but still...)
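
For reference, the signed correction mentioned in the quote works out as below. This is a sketch of the arithmetic identity only, written with 64-bit register values; it is not Alpha's actual code sequence, and the function names are made up.

```python
MASK64 = (1 << 64) - 1

def umulh64(a, b):
    """High 64 bits of the unsigned 128-bit product of two 64-bit register values."""
    return ((a & MASK64) * (b & MASK64)) >> 64

def smulh64_from_umulh(a, b):
    """Signed high half built from the unsigned one; a and b are raw 64-bit patterns."""
    a &= MASK64
    b &= MASK64
    hi = umulh64(a, b)
    if a >> 63:          # a is negative when read as signed: subtract b
        hi -= b
    if b >> 63:          # b is negative when read as signed: subtract a
        hi -= a
    return hi & MASK64   # wrap back to a 64-bit register value

def to_signed64(x):
    """Reinterpret a 64-bit pattern as a signed integer (helper for checking)."""
    return x - (1 << 64) if x >> 63 else x
```

The two conditional subtracts are the branchy part; each depends on the sign bit of one operand, which is what makes this awkward to fuse into a single operation.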