Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Paul A. Clayton" <paaronclayton@gmail.com>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Sun, 1 Sep 2024 17:02:16 -0400
Organization: A noiseless patient Spider
Lines: 34
Message-ID: <vb7stc$3fn7b$1@dont-email.me>
References: <vajo7i$2s028$1@dont-email.me>
 <memo.20240827205925.19028i@jgd.cix.co.uk> <valki8$35fk2$1@dont-email.me>
 <2644ef96e12b369c5fce9231bfc8030d@www.novabbs.org>
 <vam5qo$3bb7o$1@dont-email.me>
 <2f1a154a34f72709b0a23ac8e750b02b@www.novabbs.org>
 <vaoqcf$3r1u3$1@dont-email.me> <vavgq7$12u29$1@dont-email.me>
 <vb002r$156ge$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 03 Sep 2024 22:51:57 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a620a73ff5d72ac87d55127b0fd959d2";
	logging-data="3661035"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+GOZujd8kKtNJJYSscxkgwff3XZowBN2A="
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.0
Cancel-Lock: sha1:+Ow7qjIfmqMzy5NGiA+95MvCo/I=
In-Reply-To: <vb002r$156ge$1@dont-email.me>
Bytes: 3019

On 8/31/24 4:56 PM, BGB wrote:
[snip]
> I was mostly doing dual-issue with a 4R2W design.
> 
> Initially, 6R3W won out mostly because 4R2W disallows an indexed 
> store to be run in parallel with another op; but 6R3W did allow 
> this. 

Stores and MADD allow one register read to be delayed by at least
one cycle. If the following cycle had a free read port, that port
could be stolen to complete the store/MADD. This could be viewed
as cracking a three-source operation into a two-source operation
and a one-source operation that reads its source operand in a
following cycle, except that the second operation never uses a
result from the previous cycle.
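
As a rough illustration of the port stealing described above, here
is a toy Python model. Everything in it is an assumption made for
the sketch, not BGB's design: four read ports, in-order issue of one
bundle per cycle, each store or MADD marking one read as deferrable
by exactly one cycle, and a bubble cycle inserted whenever the next
bundle leaves no port free. The names Op and simulate are
hypothetical.

```python
from dataclasses import dataclass

READ_PORTS = 4  # assumption: a 4R2W register file


@dataclass
class Op:
    name: str
    reads: int           # register source operands in total
    deferrable: int = 0  # reads that may slip by one cycle (store data, MADD addend)


def simulate(bundles, ports=READ_PORTS):
    """Return the number of read ports used in each cycle."""
    cycles = []
    carried = 0  # deferred reads still owed from the previous bundle
    for bundle in bundles:
        mandatory = sum(op.reads - op.deferrable for op in bundle)
        optional = sum(op.deferrable for op in bundle)
        if mandatory > ports:
            raise ValueError("bundle needs more immediate reads than ports")
        if carried + mandatory > ports:
            # No free port to steal: spend a bubble cycle on the owed reads.
            cycles.append(carried)
            carried = 0
        used = carried + mandatory
        now = min(optional, ports - used)  # deferrable reads that fit right away
        cycles.append(used + now)
        carried = optional - now
    if carried:
        cycles.append(carried)  # drain reads owed by the last bundle
    return cycles


store_pair = [Op("store [rb+ri], rs", 3, 1), Op("add", 2)]
print(simulate([store_pair, [Op("addi", 1), Op("add", 2)]]))  # steal succeeds
print(simulate([store_pair, [Op("add", 2), Op("sub", 2)]]))   # bubble needed
```

In the first call the following bundle uses an immediate, so the
store's data read takes the spare port and no cycle is lost; in the
second call both following operations need two reads, so the model
inserts a bubble.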

In a VLIW, one could even imagine encoding the register name for
the delayed read in the next instruction word, provided the free
read port always came from an operation that used an immediate or
had fewer source operands. This would add complexity for
exceptions, branches, and even instruction cache misses. With a small
buffer, a VLIW could also borrow from a previous cycle; an
operation with one register source could include a "load into
buffer" operation. (I do not recall ever reading about register
fields that cross cycles or instruction words in any VLIW. While
it seems to fit the VLIW model of static resource management, it
breaks the "atomic" view of an instruction word and of the
operation components; even borrowing within an instruction
word seems not to have been considered.)

Relying on forwarding or stealing from a future surplus would
result in variable performance unless the opportunities were
guaranteed (at least for enough cases that performance glitches
would not be significant).
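
One way to picture "guaranteed" is a static check, of the sort a
compiler or assembler might run, that every deferred read has a
spare port waiting in the next cycle. The Python sketch below is
only that, under assumptions of my own: four read ports, each
bundle summarized as a pair (immediate reads, deferred reads),
deferred reads always slipping exactly one cycle, and an idle cycle
after the last bundle. The function name unguaranteed_steals is
hypothetical.

```python
READ_PORTS = 4  # assumption: register-file read ports available per cycle


def unguaranteed_steals(bundles, ports=READ_PORTS):
    """bundles: list of (immediate_reads, deferred_reads), one per cycle.

    Return the indices of bundles whose deferred reads would not fit
    into the spare read ports of the following cycle.
    """
    bad = []
    for i, (_, deferred) in enumerate(bundles):
        if deferred == 0:
            continue
        if i + 1 < len(bundles):
            next_immediate, _ = bundles[i + 1]
            spare = ports - next_immediate
        else:
            spare = ports  # assumed idle cycle after the last bundle
        if deferred > spare:
            bad.append(i)
    return bad


# Bundle 0 is covered by the spare port in bundle 1; bundle 1 is not
# covered, because bundle 2 needs all four ports itself.
print(unguaranteed_steals([(4, 1), (3, 1), (4, 0)]))
```

A schedule for which the check returns an empty list would have no
performance glitches from this source in the model; any index it
returns marks a place where the hardware would have to stall or the
scheduler would have to pad.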