From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Sun, 26 May 2024 03:14:30 +0000
Organization: Rocksolid Light
Message-ID: <fb85aae33e177f015053da6c6470aa23@www.novabbs.org>
References: <uigus7$1pteb$1@dont-email.me> <ac55c75a923144f72d204c801ff7f984@www.novabbs.org> <20240303165533.00004104@yahoo.com> <2024Mar3.173345@mips.complang.tuwien.ac.at> <20240303203052.00007c61@yahoo.com> <2024Mar3.232237@mips.complang.tuwien.ac.at> <20240304171457.000067ea@yahoo.com> <2024Mar4.191835@mips.complang.tuwien.ac.at> <20240305001833.000027a9@yahoo.com> <0c2e37386287e8a0303191dc7b989c76@www.novabbs.org> <us5t1c$36voh$1@dont-email.me> <df173cbc4fb74394f9d03f285f9381f3@www.novabbs.org> <3hGFN.115182$m4d.77183@fx43.iad> <0cc87b9d559c4f79b9b2d7663fa3ccbf@www.novabbs.org> <us8l5d$6ae9$1@dont-email.me> <bbecdccd4319e935fd2a50f97664d6ea@www.novabbs.org> <usgid8$20vho$2@dont-email.me> <ebbef6dff0079e70dd333726d5c963bd@www.novabbs.org> <v038qn$bmtm$2@dont-email.me>

Paul A. Clayton wrote:

> On 3/8/24 11:14 PM, MitchAlsup1 wrote:


> Even with My 66000's variable length instructions, most (by
> frequency of occurrence) 32-bit immediates would be illegal
> instructions and more significant 32-bit words in 64-bit
> immediates would usually be illegal instructions, so one could
> probably have highly accurate speculative predecode-on-fill.

Since the variable-length decoder is only 32 gates (equivalent in
size to three 1-bit flip-flops), one can simply attach such a decoder
to every word of storage in the instruction buffer (IB), and arrange
a tree of "if I get picked, here are my follow-on instructions".
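A minimal sketch of "one decoder per IB word", assuming a made-up toy encoding (the real My 66000 format is not described in this post): the low 2 bits of a word that starts an instruction give the number of immediate words that follow it. `length_decode` and `follow_on_pointers` are hypothetical names. Each word computes, independently of the others, where the next instruction would begin if that word turned out to be an instruction start.

```python
# Hypothetical toy encoding, NOT the real My 66000 format:
# if a word starts an instruction, its low 2 bits give the number of
# immediate words that follow it (0..3), so lengths are 1..4 words.

def length_decode(word: int) -> int:
    """Length in 32-bit words of the instruction IF it starts at this word."""
    return 1 + (word & 0x3)


def follow_on_pointers(ib: list[int]) -> list[int]:
    """One small decoder per IB word; in hardware these all run in parallel.

    nxt[i] is the index where the next instruction would begin if word i
    "gets picked" as an instruction start. A value >= len(ib) means the
    next instruction is not in the buffer yet.
    """
    return [i + length_decode(w) for i, w in enumerate(ib)]


if __name__ == "__main__":
    print(follow_on_pointers([0x1, 0xDEAD, 0x0, 0x2]))
```

Nothing here depends on knowing which words are real instruction starts; that choice is deferred to the selection tree.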

Now, once one has a unary (one-hot) pointer into the IB, one gets 2
instructions in 1 gate of delay, 4 in 2 gates, 8 in 3 gates, ... until
one gets eaten alive by wire delay.
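The doubling can be modeled as pointer jumping: each tree level lets every already-known instruction start produce the start lying 2^k instructions further on, so k levels yield 2^k starts. This is a sequential software model of what would be parallel logic, using the same hypothetical toy encoding as before (low 2 bits = number of immediate words); `instruction_starts` is an illustrative name, and the index `len(ib)` is used as a sentinel for "not in the buffer yet".

```python
def length_decode(word: int) -> int:
    # Hypothetical toy encoding, NOT the real My 66000 format:
    # low 2 bits = number of immediate words after the opcode word.
    return 1 + (word & 0x3)


def instruction_starts(ib: list[int], start: int, levels: int) -> list[int]:
    """Starts of the first 2**levels instructions, beginning at `start`."""
    n = len(ib)
    # Per-word follow-on pointer, clamped so n means "beyond the buffer";
    # the extra trailing entry makes the sentinel point to itself.
    jump = [min(i + length_decode(w), n) for i, w in enumerate(ib)] + [n]
    starts = [start]
    for _ in range(levels):
        # One tree level: each known start yields the start 2**k instructions later.
        starts = starts + [jump[s] for s in starts]
        # Compose the pointer with itself: it now skips 2**(k+1) instructions.
        jump = [jump[j] for j in jump]
    return starts


if __name__ == "__main__":
    ib = [0x1, 0xDEAD, 0x0, 0x2, 0x5, 0x7, 0x0, 0x0]
    print(instruction_starts(ib, start=0, levels=3))
```

With 3 levels the sketch returns 8 entries; entries equal to `len(ib)` are slots whose instructions have not been fetched yet.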

Thus, if length decoding is easy, predecoding (into some kind of
table) is unnecessary.

> If branch prediction fetch ahead used instruction addresses
> (rather than cache block addresses), a valid target prediction
> would provide accurate predecode for the following instructions
> and constrain the possible decodings for preceding instructions.

> Mistakes in predecode that mistook an immediate 32-bit word for an
> opcode-containing word might not be particularly painful.

Not painful at all when these are masked out by the actual decode
selection tree.
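Why a wrong predecode mark is harmless once selection happens can be sketched as follows, again under the hypothetical toy encoding, plus a made-up rule that a word whose top 6 bits equal `0b110000` is a branch. Every word is predecoded as if it were an instruction start, so an immediate word can carry a bogus "branch" mark; only marks on words reached from the real start are used. The `while` loop is a sequential stand-in for the selection tree.

```python
def length_decode(word: int) -> int:
    # Hypothetical toy encoding: low 2 bits = number of immediate words.
    return 1 + (word & 0x3)


def is_branch(word: int) -> bool:
    # Hypothetical opcode rule: bits [31:26] == 0b110000 means "branch".
    return (word >> 26) == 0b110000


def predecode(ib: list[int]) -> list[bool]:
    # Speculative per-word predecode: every word is treated as a possible
    # instruction start, so immediate words can produce false marks.
    return [is_branch(w) for w in ib]


def selected_branches(ib: list[int], start: int) -> list[int]:
    """Branch positions on the actual instruction chain from `start`."""
    marks = predecode(ib)
    i, hits = start, []
    while i < len(ib):
        if marks[i]:          # only marks on real instruction starts survive
            hits.append(i)
        i += length_decode(ib[i])
    return hits


if __name__ == "__main__":
    # Word 1 is an immediate that happens to look like a branch.
    ib = [0xC0000001, 0xC0000000, 0x0, 0xC0000000]
    print(predecode(ib), selected_branches(ib, 0))
```

The false mark on word 1 is simply never selected, which is the masking described above.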

> Mistakenly "finding" a branch in predecode might not be that
> painful even if predicted taken — similar to a false BTB hit
> corrected in decode. Wrongly "finding" an optimizable load
> instruction might waste resources and introduce a minor glitch in
> decode (where the "instruction" has to be retranslated into an
> immediate component).

> It *feels* attractive to me to have predecode fill a BTB-like
> structure to reduce redundant data storage. Filling the "BTB" with
> less critical instruction data when there are few (immediate-
> based) branches seems less hurtful than losing some taken branch
> targets, though a parallel ordinary BTB (redundant storage) might
> compensate. The BTB-like structure might hold more diverse
> information that could benefit from early availability; e.g.,
> loads from something like a "Knapsack Cache". (Even loads from a
> more variable base might be sped by having a future file of two or
> three such base addresses — or even just the least significant
> bits — which could be accessed more quickly and earlier than the
> general register file. Bases that are changed frequently with
> dynamic values [not immediate addition] would rarely update the
> future file fast enough to be useful. I think some x86
> implementations did something similar by adding segment base and
> displacement early in the pipeline.) More generally, it seems that
> the instruction stream could be parsed and stored into components
> with different tradeoffs in latency, capacity, etc.
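The quoted future-file idea can be sketched as a tiny table of base-register values that is kept current only when a base is set by a constant or by an immediate addition to an already-tracked base; any other write invalidates the entry, since such a value would rarely arrive in time. The three-entry capacity (from "two or three"), oldest-first replacement, and the class and method names are all assumptions for illustration, not a description of any x86 implementation.

```python
class BaseFutureFile:
    """Hypothetical future file of a few base registers for early address generation."""

    def __init__(self, capacity: int = 3):
        self.capacity = capacity
        self.values: dict[int, int] = {}  # reg -> known value; insertion order = age

    def _install(self, reg: int, value: int) -> None:
        self.values.pop(reg, None)
        if len(self.values) >= self.capacity:
            del self.values[next(iter(self.values))]  # evict the oldest entry
        self.values[reg] = value

    def write_const(self, reg: int, value: int) -> None:
        # e.g. an address constant loaded from an immediate
        self._install(reg, value)

    def write_add_imm(self, reg: int, src: int, imm: int) -> None:
        # reg = src + imm is trackable only if src is already known early
        if src in self.values:
            self._install(reg, self.values[src] + imm)
        else:
            self.values.pop(reg, None)

    def write_dynamic(self, reg: int) -> None:
        # value computed late in the pipeline: stop predicting this base
        self.values.pop(reg, None)

    def early_address(self, base: int, disp: int):
        # None means "fall back to normal address generation"
        value = self.values.get(base)
        return None if value is None else value + disp
```

A dynamically written base simply drops out of the table, matching the observation that such bases would rarely be useful.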

> I do not know if such "aggressive" predecode would be worthwhile
> nor what in-memory format would best manage the tradeoffs of
> density, parallelism, criticality, etc. or what "L1 cache" format
> would be best (with added specialization/utilization tradeoffs).

It is a trade-off: in a GBOoO (great big out-of-order) design, adding a
pipe stage costs around 2% (around 5% in an LBIO, little bitty in-order,
design), so the predictor has to buy more than 2% to "make the cut". It
definitely would not make the cut in the LBIO design; it may or may not
make the cut in a GBOoO design. What we can say is that the GBOoO design
has to have some kind of branch prediction, without going so far as to
assign it a name or a class.
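The break-even rule can be written down as simple arithmetic, using the approximate per-stage costs quoted above (about 2% for GBOoO, about 5% for LBIO). Treating gains and costs as additive percentages is a first-order assumption, and the function names are hypothetical.

```python
# Approximate cost of one extra pipe stage, from the figures above.
STAGE_COST_PCT = {"GBOoO": 2.0, "LBIO": 5.0}


def net_gain_pct(gain_pct: float, design: str) -> float:
    """First-order (additive) net speedup, in percent, of adding the stage."""
    return gain_pct - STAGE_COST_PCT[design]


def makes_the_cut(gain_pct: float, design: str) -> bool:
    return net_gain_pct(gain_pct, design) > 0.0


if __name__ == "__main__":
    for design in STAGE_COST_PCT:
        print(design, net_gain_pct(3.0, design), makes_the_cut(3.0, design))
```

For example, a mechanism that buys 3% nets roughly +1% in a GBOoO design but roughly -2% in an LBIO design.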