Path: ...!news.misty.com!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Instruction Tracing
Date: Sun, 11 Aug 2024 21:09:02 +0000
Organization: Rocksolid Light
Message-ID: <84999afd1377326f1e5e96040c46b992@www.novabbs.org>
References: <v970s3$flpo$1@dont-email.me> <2024Aug10.121802@mips.complang.tuwien.ac.at> <v995pm$1cni$2@gal.iecc.com> <2024Aug11.164438@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="2241623"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$sG34souEdrkpEy/LAsaG3uRnXssmPQwLTTvFnjYTxeVWU3VnhgA/.
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Bytes: 3500
Lines: 51

On Sun, 11 Aug 2024 14:44:38 +0000, Anton Ertl wrote:

> John Levine <johnl@taugh.com> writes:
>>As far as the delayed branches and such, they made sense in the narrow
>>time window when it was too expensive to put a cache on a workstation
>>but that time came and went by the time the RT shipped.
>
> Delayed branches were put in the first commercial generation of RISCs
> (except ARM), which all shipped with caches (except ARM).  Delayed
> branches are a natural consequence of the 5-stage (or, in the 88100
> case, four-stage) pipeline.

Delayed branches are wonderful for the pipeline, but very much less so
for the architecture overall, as they make wide issue "all that much
harder". They were truly a pain in the ass on the Mc88120, a 6-wide
machine.

Neither nullification nor inverse nullification helped much, and both
hurt at wide issue, too. At least the Mc88100 had a bit to indicate
that the delay slot was not being used.
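
To make the "delay slot used / not used" bit concrete, here is a
minimal sketch of a toy interpreter. It is not the Mc88100 encoding or
any real ISA: the tuple format, the `slot` flag and the `run` function
are all hypothetical names chosen for illustration, and it assumes no
branch ever sits in a delay slot.

```python
# Toy model (hypothetical, not any real ISA). Instructions are tuples:
#   ("add", n)            -> acc += n
#   ("br", target, slot)  -> branch to index `target`; if `slot` is True,
#                            the next instruction (the delay slot)
#                            executes before control reaches `target`
#   ("halt",)             -> stop

def run(program):
    acc = 0
    pc = 0
    pending = None  # branch target waiting for the delay slot to finish
    trace = []      # indices of executed instructions, in order
    while True:
        op = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending is not None:
            # We are executing a delay slot: afterwards, go to the target.
            next_pc = pending
            pending = None
        if op[0] == "halt":
            break
        if op[0] == "add":
            acc += op[1]
        elif op[0] == "br":
            _, target, slot = op
            if slot:
                pending = target   # delayed: run one more instruction first
            else:
                next_pc = target   # not delayed: redirect immediately
        pc = next_pc
    return acc, trace

WITH_SLOT = [("br", 3, True), ("add", 1), ("add", 100), ("halt",)]
WITHOUT_SLOT = [("br", 3, False), ("add", 1), ("add", 100), ("halt",)]

print(run(WITH_SLOT))     # the add at index 1 executes
print(run(WITHOUT_SLOT))  # the add at index 1 is skipped
```

The `pending` state is the part that gets awkward at wide issue: the
branch and its slot may land in different issue groups, so the "one
more instruction, then redirect" bookkeeping has to be carried across
them.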

Looking back, I wish we had not been forced to do them--I think many
of the 1st generation architects wish similarly. Delayed branches
were supposed to bring a 16% gain in performance. After looking at
the utility rates (slightly less than 50% useful instructions, with
a fill rate slightly over 70%), they only brought 8%-ish.
{{A useful instruction is useful in both taken and non-taken paths.}}
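
As a back-of-the-envelope check on those numbers, the sketch below
assumes the realized gain scales linearly with the fraction of all
delay slots that hold a useful instruction. That linear model, and the
reading of "50% useful" as a fraction of all slots rather than of
filled slots, are assumptions made here for illustration; the function
and constant names are hypothetical.

```python
def realized_gain(ideal_gain, useful_fraction):
    """Scale the ideal gain by the fraction of delay slots doing useful work.

    Assumes a simple linear model; this is an illustration, not a
    measurement methodology.
    """
    return ideal_gain * useful_fraction

IDEAL_GAIN = 0.16        # promised gain if every slot were useful
FILL_RATE = 0.70         # slots holding something other than a NOP (approx.)
USEFUL_FRACTION = 0.50   # slots useful on both taken and non-taken paths (approx.)

gain = realized_gain(IDEAL_GAIN, USEFUL_FRACTION)
wasted_fill = FILL_RATE - USEFUL_FRACTION

print(f"realized gain ~ {gain:.0%}")
print(f"filled but not useful on both paths ~ {wasted_fill:.0%}")
```

Under these assumptions, roughly half of the promised 16% survives,
which lines up with the "8%-ish" figure; the gap between fill rate and
useful rate is the share of slots that were filled without helping on
both paths.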

> IIRC ARM used a 3-stage implementation for the ARM1/2, which may be a
> consequence of them rejecting delayed branches; and they did not have
> caches, so they could not have made use of the higher clock rate that
> a longer pipeline could have afforded.  So it seems that the connection
> between cache and delayed branches, if there is any, is the opposite
> of what you suggest.
>
> Delayed branches provided a speedup on these early 5-stage
> implementations.  They also provided a big headache for more
> sophisticated implementations, and therefore soon fell out of favour.

Much like virtual caches...

The only thing that has persisted is LDs being longer than 2 cycles.
Squashing {forward, ADD, SRAM, LDalign} into 2 cycles is proving
to be a frequency headache in the simpler RISC-V implementations
even now. With wires getting slower and gates getting faster, that
tradeoff is getting worse. Many of the Intel x86s use 4-cycle LDs.
{The cost of frequency is efficiency.}

> Power (IIRC) and Alpha don't have delayed branches.

None of the modern RISCs have them either.

> - anton