Article <35b8ff2e6baa54c7aa22ec4edf45c3f9@www.novabbs.org>

Path: ...!weretis.net!feeder9.news.weretis.net!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Thu, 19 Sep 2024 16:01:48 +0000
Organization: Rocksolid Light
Message-ID: <35b8ff2e6baa54c7aa22ec4edf45c3f9@www.novabbs.org>
References: <vaqgtl$3526$1@dont-email.me> <p1cvdjpqjg65e6e3rtt4ua6hgm79cdfm2n@4ax.com> <2024Sep10.101932@mips.complang.tuwien.ac.at> <ygn8qvztf16.fsf@y.z> <2024Sep11.123824@mips.complang.tuwien.ac.at> <vbsoro$3ol1a$1@dont-email.me> <867cbhgozo.fsf@linuxsc.com> <20240912142948.00002757@yahoo.com> <vbuu5n$9tue$1@dont-email.me> <20240915001153.000029bf@yahoo.com> <vc6jbk$5v9f$1@paganini.bofh.team> <20240915154038.0000016e@yahoo.com> <vc70sl$285g2$4@dont-email.me> <vc73bl$28v0v$1@dont-email.me> <OvEFO.70694$EEm7.38286@fx16.iad> <32a15246310ea544570564a6ea100cab@www.novabbs.org> <vc7a6h$2afrl$2@dont-email.me> <50cd3ba7c0cbb587a55dd67ae46fc9ce@www.novabbs.org> <vc8qic$2od19$1@dont-email.me> <fCXFO.4617$9Rk4.4393@fx37.iad> <vcb730$3ci7o$1@dont-email.me> <7cBGO.169512$_o_3.43954@fx17.iad> <vcffub$77jk$1@dont-email.me> <n7XGO.89096$15a6.87061@fx12.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="2662534"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
X-Rslight-Site: $2y$10$fgb3OB10o68XHQw9bjC9d.ORJBRzabTK/Gyxlb9zmhh8ozkdQOp92
X-Spam-Checker-Version: SpamAssassin 4.0.0
Bytes: 5916
Lines: 96

On Thu, 19 Sep 2024 15:07:11 +0000, EricP wrote:

> Brett wrote:
>> EricP <ThatWouldBeTelling@thevillage.com> wrote:
>>
>> They claim 5 cycles, should be six, five for the multiply and one more
>> for the second result, unless the next instruction does not need a
>> write port, and does not use the result. You can get a throughput of
>> 5 cycles with smart coding, but that rarely happens without effort.
>
> That article is ignoring multiplier pipelining.
> If the multiplier is pipelined with a latency of 5 and throughput of 1,
> then MULL takes 5 cycles and MULL,MULH takes 6.
>
> But those two multiplies still are tossing away 50% of their work.
> And if it does fuse them then the internal uArch cost is the same as if
> you had designed it optimally from the start, except now you have
> to pay for a fuser.
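
The cycle counts quoted above follow from simple pipeline arithmetic. A minimal sketch, assuming a fully pipelined multiplier and independent back-to-back operations (the function name and parameters are illustrative, not taken from any real tool):

```python
def completion_cycle(n_ops, latency=5, issue_interval=1):
    """Cycle at which the last of n_ops back-to-back multiplies finishes.

    Assumes the first operation issues at cycle 0 and a new one can
    issue every issue_interval cycles (1 for a fully pipelined unit).
    """
    if n_ops < 1:
        raise ValueError("need at least one operation")
    return latency + (n_ops - 1) * issue_interval


mull_only = completion_cycle(1)                          # MULL alone
mull_mulh = completion_cycle(2)                          # MULL followed by MULH
unpipelined_pair = completion_cycle(2, issue_interval=5)  # no pipelining
```

With a latency of 5 and one issue per cycle, the second multiply finishes one cycle after the first, which is where the 5 and 6 come from. Only an unpipelined multiplier would make the pair cost two full latencies.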

You failed to recognize the critical part of my comment on this::

When the IMUL function unit sees MULL and MULH back to back AND
when both operands are the same for both instructions, it KNOWS
that the second multiply produces the same full product as the
first, and thereby that the second multiply can be suppressed and
the first product used twice. {{In pure CMOS, if you drop the same
operands twice into the multiplier tree, the multiplier tree burns
no power in any event, just the operand delivery power.}}

You may call this fusion, but it is the very lowest level of it
and was not called such when first used.
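
The operand-match check described above can be sketched as a toy software model. This is only an illustration under stated assumptions: the class and its counter are hypothetical, operands are treated as unsigned 64-bit values for simplicity, and real hardware would do the comparison in the function unit rather than in a lookup like this.

```python
MASK64 = (1 << 64) - 1


class ImulUnit:
    """Toy model of an integer multiply unit (hypothetical, for illustration)."""

    def __init__(self):
        self.last_operands = None   # operands of the most recent multiply
        self.last_product = 0       # its full 128-bit unsigned product
        self.tree_activations = 0   # times the multiplier tree did new work

    def _full_product(self, a, b):
        operands = (a & MASK64, b & MASK64)
        if operands != self.last_operands:
            # New operands: run the multiplier tree and remember the result.
            self.last_operands = operands
            self.last_product = operands[0] * operands[1]
            self.tree_activations += 1
        # Either freshly computed or reused from the previous multiply.
        return self.last_product

    def mull(self, a, b):
        """Low 64 bits of the unsigned product."""
        return self._full_product(a, b) & MASK64

    def mulh(self, a, b):
        """High 64 bits of the unsigned product."""
        return self._full_product(a, b) >> 64
```

Issuing `mull(x, y)` followed by `mulh(x, y)` on one `ImulUnit` leaves `tree_activations` at 1: the second instruction only selects the other half of a product that already exists.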

> <sound of soap box being dragged out>
> This idea that macro-op fusion is some magic solution is bullshit.
Agreed
> 1) It's not free.
Far from it.
> 2) It only works where Decode can see *all* the required lookahead
>     instructions, which means you have to pay for an N-lane decoder
>     but only get 1 lane.
I think it is but a crutch for a misdesigned ISA
> 3) It's probabilistic as it depends on how the fetch buffers get loaded.
>     Eg if the fetch buffer contains a valid instruction but does not
>     have a next instruction, do you stall Decode to see if a fuser might
>     arrive or dispatch it anyway.
It can be worse than that
> 4) It gets exponentially expensive if you start doing multiple
>     instruction lanes because decode has to deal with all the
>     permutations of fusion possibilities.
All the more reason to have a better ISA
> 5) Any fused instructions leave (multiple) bubbles that should be
>     compacted out or there wasn't much point to doing the fusion.

One of the interesting things I have noticed with my ISA is that
when one has a properly designed higher level ISA, one gets rid
of so many of the "easy to schedule" instructions that one ends
up with 30 FMAC instructions in a row, with no other instruction
to occupy any of the other function units.

> In my opinion it is better to have an ISA that is optimal by design
> rather than being patched up by fusion later.

Indeed.

> Some of this inefficiency is caused by clinging to now 40 year old
> risc design *guidelines* (ie not even rules) that:
> - instructions have at most 1 dest and 2 source registers
Makes FMAC hard
> - register specifier fields are either source or dest, never both
I happen to be wishy-washy on this
> - instructions should take at most 1 clock (they never did)
This never worked for floating point anyway...and many consider
branches and memory references as not fitting that tenet either.

What is required is that each instruction can be decoded in a single
cycle and delivered to whichever function unit in one cycle.

> These self imposed design restrictions cause ISA designers to miss
> some possible more optimal solutions. The result is things like
> RISC-V's memory reference linkage structures taking 6 instructions
> to build a 64-bit PC-relative address. And I'm pretty sure we won't
> see any 6 instruction fusers for quite some time.

And it is just "so unnecessary".

I suspect that RISC-V will end up choosing AUIPC-LD-JMP instead,
losing the PIC nature of flow control.
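
For reference, the address arithmetic behind an AUIPC-plus-displacement pair can be sketched as follows. This is a minimal model, assuming RV64 and an offset that fits the roughly ±2 GiB reach of the pair. The helper names are illustrative and it models only the arithmetic, not a toolchain or the full six-instruction 64-bit sequence mentioned above.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def split_pcrel(offset):
    """Split a PC-relative offset into (hi20, lo12) immediates.

    lo12 is a signed 12-bit displacement, so hi20 is rounded by 0x800
    to compensate when lo12 comes out negative.
    """
    hi20 = ((offset + 0x800) >> 12) & 0xFFFFF
    lo12 = sign_extend(offset, 12)
    return hi20, lo12


def auipc(pc, hi20):
    """AUIPC: pc plus the sign-extended 32-bit value hi20 << 12."""
    return (pc + sign_extend(hi20 << 12, 32)) & MASK64


def target_address(pc, offset):
    """Address reached by AUIPC followed by a 12-bit displacement."""
    hi20, lo12 = split_pcrel(offset)
    return (auipc(pc, hi20) + lo12) & MASK64
```

The pair reaches any target whose offset fits in about 32 bits. Anything farther away needs either a longer instruction sequence or a load of the full address from memory, which is the LD in AUIPC-LD-JMP.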

Doing it right the first time is so much easier for everyone now
and down the line.
>
> <sound of soap box being dragged back to cupboard>