From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Instruction Tracing
Date: Sun, 11 Aug 2024 21:09:02 +0000
Organization: Rocksolid Light
Message-ID: <84999afd1377326f1e5e96040c46b992@www.novabbs.org>
References: <v970s3$flpo$1@dont-email.me> <2024Aug10.121802@mips.complang.tuwien.ac.at> <v995pm$1cni$2@gal.iecc.com> <2024Aug11.164438@mips.complang.tuwien.ac.at>

On Sun, 11 Aug 2024 14:44:38 +0000, Anton Ertl wrote:

> John Levine <johnl@taugh.com> writes:
>>As far as the delayed branches and such, they made sense in the narrow
>>time window when it was too expensive to put a cache on a workstation
>>but that time came and went by the time the RT shipped.
>
> Delayed branches were put in the first commercial generation of RISCs
> (except ARM), which all shipped with caches (except ARM). Delayed
> branches are a natural consequence of the 5-stage (or, in the 88100
> case, four-stage) pipeline.

Delayed branches are wonderful for the pipeline, very much less so for
the architecture overall, as they make wide issue "all that much harder".
It was truly a pain in the ass on Mc88120, a 6-wide machine.

Neither nullification nor inverse nullification helped much, and both
hurt at wide issue, too. At least Mc88100 had a bit to indicate the
delay slot was not being used.
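
For readers who have not met a delay slot, here is a minimal sketch of the semantics under discussion. It is a toy model, not any real ISA: the tuple instruction format and the names `run` and `PROGRAM` are made up for illustration, every branch is treated as unconditionally taken, and a branch sitting in a delay slot is not handled.

```python
def run(program, delayed):
    """Execute a toy program and return the labels of executed instructions.

    Each instruction is ("op", label) or ("br", label, target_index).
    With delayed=True, the instruction right after a taken branch (the
    delay slot) executes before control reaches the branch target.
    """
    executed = []
    pc = 0
    while pc < len(program):
        insn = program[pc]
        executed.append(insn[1])
        if insn[0] == "br":
            if delayed and pc + 1 < len(program):
                # The slot instruction was fetched behind the branch,
                # so it is executed rather than thrown away.
                executed.append(program[pc + 1][1])
            pc = insn[2]
        else:
            pc += 1
    return executed


# Hypothetical five-instruction program: the branch at index 1 jumps to index 4.
PROGRAM = [
    ("op", "add"),
    ("br", "branch", 4),
    ("op", "slot"),
    ("op", "skipped"),
    ("op", "target"),
]

print(run(PROGRAM, delayed=False))
print(run(PROGRAM, delayed=True))
```

Without delayed branches the instruction labelled `slot` is skipped; with them it always runs, which is why the compiler has to find something harmless or useful to put there.
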

Looking back, I wish we had not been forced to do them--I think many
of the 1st generation architects wish similarly.

Delayed branches were supposed to bring a 16% gain in performance.
After looking at the utility rates (slightly less than 50% useful
instructions, with a fill rate slightly over 70%), they only
brought 8%-ish.

{{A useful instruction is useful in both taken and non-taken paths.}}

> IIRC ARM used a 3-stage implementation for the ARM1/2, which may be a
> consequence of them rejecting delayed branches; and they did not have
> caches, so they could not have made use of the higher clock rate that
> a longer pipeline could have afforded. So it seems that the connection
> between cache and delayed branches, if there is any, is the opposite
> of what you suggest.
>
> Delayed branches provided a speedup on these early 5-stage
> implementations. They also provided a big headache for more
> sophisticated implementations, and therefore soon fell out of favour.

Much like virtual caches...

The only thing that has persisted is LDs being longer than 2 cycles.
Squashing {forward, ADD, SRAM, LDalign} into 2 cycles is proving to
be a frequency headache in the simpler RISC-V implementations even
now. With wires getting slower and gates getting faster, that
trade-off is getting worse. Many of the Intel x86s use 4-cycle LDs.
{the cost of frequency is efficiency}

> Power (IIRC) and Alpha don't have delayed branches.

None of the modern RISCs have them either.

> - anton
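
The delay-slot arithmetic in the reply above (16% expected, 8%-ish realized) can be checked with a rough sketch. It assumes the realized gain scales linearly with the fraction of delay slots doing useful work, which the post does not state outright. The function name `realized_gain` and the rounded constants are illustrative only; the post gives "slightly less than 50%" and "slightly over 70%", not exact figures.

```python
def realized_gain(ideal_gain, useful_fraction):
    """Scale the ideal speedup by the fraction of delay slots doing useful work.

    Assumes a simple linear model (an assumption of this sketch).
    """
    return ideal_gain * useful_fraction


IDEAL_GAIN = 0.16  # expected gain quoted in the post
USEFUL = 0.50      # "slightly less than 50%" useful, rounded for illustration
FILL = 0.70        # "slightly over 70%" fill rate, rounded for illustration

# Reading 1: the useful rate is measured over all delay slots.
over_all_slots = realized_gain(IDEAL_GAIN, USEFUL)

# Reading 2: the useful rate is measured over filled slots only.
over_filled_slots = realized_gain(IDEAL_GAIN, FILL * USEFUL)

print(f"useful over all slots:    {over_all_slots:.1%}")
print(f"useful over filled slots: {over_filled_slots:.1%}")
```

Under these assumptions the first reading gives 8.0% and the second 5.6%, so the "8%-ish" figure fits best if "useful" is counted over all delay slots rather than over the filled ones.
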