From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: 88xxx or PPC
Date: Sun, 26 May 2024 03:14:30 +0000
Organization: Rocksolid Light
Message-ID: <fb85aae33e177f015053da6c6470aa23@www.novabbs.org>

Paul A. Clayton wrote:

> On 3/8/24 11:14 PM, MitchAlsup1 wrote:
> Even with My 66000's variable length instructions, most (by
> frequency of occurrence) 32-bit immediates would be illegal
> instructions and more significant 32-bit words in 64-bit
> immediates would usually be illegal instructions, so one could
> probably have highly accurate speculative predecode-on-fill.

Since the variable length decoder is only 32 gates (equivalent in size to
three 1-bit flip-flops), one can simply attach said decoder to every word
of storage in the instruction buffer, and arrange a tree of "If I get
picked, here are my follow-on instructions". Now, once one has a unary
pointer into the IB, one gets 2 instructions in 1 gate of delay, 4 in
2 gates, 8 in 3 gates, ... until you get eaten alive with wire delay.

Thus, if length decoding is easy, predecoding (into some kind of table) is
unnecessary.

> If branch prediction fetch ahead used instruction addresses
> (rather than cache block addresses), a valid target prediction
> would provide accurate predecode for the following instructions
> and constrain the possible decodings for preceding instructions.
> Mistakes in predecode that mistook an immediate 32-bit word for an
> opcode-containing word might not be particularly painful.

Not when these are masked out by the actual decode selection tree.

> Mistakenly "finding" a branch in predecode might not be that
> painful even if predicted taken — similar to a false BTB hit
> corrected in decode. Wrongly "finding" an optimizable load
> instruction might waste resources and introduce a minor glitch in
> decode (where the "instruction" has to be retranslated into an
> immediate component).

> It *feels* attractive to me to have predecode fill a BTB-like
> structure to reduce redundant data storage. Filling the "BTB" with
> less critical instruction data when there are few (immediate-
> based) branches seems less hurtful than losing some taken branch
> targets, though a parallel ordinary BTB (redundant storage) might
> compensate. The BTB-like structure might hold more diverse
> information that could benefit from early availability; e.g.,
> loads from something like a "Knapsack Cache".
> (Even loads from a
> more variable base might be sped by having a future file of two or
> three such base addresses — or even just the least significant
> bits — which could be accessed more quickly and earlier than the
> general register file. Bases that are changed frequently with
> dynamic values [not immediate addition] would rarely update the
> future file fast enough to be useful. I think some x86
> implementations did something similar by adding segment base and
> displacement early in the pipeline.) More generally, it seems that
> the instruction stream could be parsed and stored into components
> with different tradeoffs in latency, capacity, etc.

> I do not know if such "aggressive" predecode would be worthwhile
> nor what in-memory format would best manage the tradeoffs of
> density, parallelism, criticality, etc. or what "L1 cache" format
> would be best (with added specialization/utilization tradeoffs).

It is a trade-off:: in a GBOoO design, adding a pipe stage costs around 2%
(in an LBIO design around 5%), so the predictor has to buy more than 2% to
"make the cut". It definitely would not make the cut in the LBIO design; it
may or may not make the cut in a GBOoO design. What we can say is that the
GBOoO design has to have some kind of branch prediction, and not go so far
as to assign it a name or a class.
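
The "if I get picked, here are my follow-on instructions" tree described earlier in this post can be modelled in software as pointer doubling. What follows is only a rough sketch under stated assumptions, not the My 66000 hardware: the function name `instruction_starts` and the example length list are hypothetical, and each entry of `lengths` stands for the output of a per-word length decoder that treats its word as if it were the start of an instruction. Each loop iteration stands in for one level of the selection tree, so the number of known instruction starts doubles per level (2, 4, 8, ...).

```python
def instruction_starts(lengths, start, count):
    """Find up to `count` instruction start positions by pointer doubling.

    lengths[i] is the speculative length (in words, >= 1) that a
    hypothetical per-word decoder reports for word i, assuming word i
    begins an instruction. `start` is the known first instruction.
    Returns (starts, levels), where `levels` is the number of doubling
    steps, standing in for tree levels in the hardware description.
    """
    n = len(lengths)
    # hop[i]: word index of the next instruction if word i is a start.
    # Index n is a sentinel meaning "past the end of the buffer".
    hop = [min(i + lengths[i], n) for i in range(n)] + [n]

    starts = [start]
    levels = 0
    while len(starts) < count:
        # Every known start selects its follow-on: the list doubles.
        starts = starts + [hop[s] for s in starts]
        # Compose the hop table with itself so it now skips twice as far.
        hop = [hop[hop[i]] for i in range(n + 1)]
        levels += 1

    # Drop entries that ran off the end of the buffer.
    return [s for s in starts[:count] if s < n], levels


# Word 2 is the immediate of the 2-word instruction at word 1. Its
# speculative length is computed like any other but is never selected.
lengths = [1, 2, 1, 3, 1, 1, 1, 2, 1, 1]
print(instruction_starts(lengths, 0, 4))  # ([0, 1, 3, 6], 2)
print(instruction_starts(lengths, 0, 8))  # ([0, 1, 3, 6, 7, 9], 3)
```

In this model a word that is really an immediate still gets a speculative length, but no selected start ever points at it, which is the sense in which a wrong speculative decode is masked out by the selection tree. The sketch says nothing about wire delay, which the post names as the practical limit.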