From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Microarchitectural support for counting
Date: Thu, 2 Jan 2025 19:45:36 +0000
Organization: Rocksolid Light
Message-ID: <88f842b71e49ef45e13df3b2081e7f7d@www.novabbs.org>
References: <2024Oct3.160055@mips.complang.tuwien.ac.at> <vdmrk6$3rksr$1@dont-email.me> <LyELO.69485$2nv5.62232@fx39.iad> <TdWLO.282116$FzW1.158190@fx14.iad> <963a276fd8d43e4212477cefae7f6e46@www.novabbs.org> <8IcMO.249144$v8v2.147178@fx18.iad> <vkhgkn$2g9gm$1@dont-email.me> <7bffca4c284d329c60d8e93c7382c30f@www.novabbs.org> <vl1nr6$2diq8$1@dont-email.me> <b8760a721775ebc2e1f232c2edae4be9@www.novabbs.org> <6CBdP.20426$nlJ1.2963@fx41.iad>

On Thu, 2 Jan 2025 19:14:50 +0000, EricP wrote:

> MitchAlsup1 wrote:
>> On Tue, 31 Dec 2024 2:02:05 +0000, Paul A. Clayton wrote:
>>> On 12/25/24 1:30 PM, MitchAlsup1 wrote:
>>>>
>>>> Sooner or later an ISR has to actually deal with the MMI/O
>>>> control registers associated with the <ahem> interrupt.
>>>
>>> Yes, but multithreading could hide some of those latencies in
>>> terms of throughput.
>>
>> EricP is the master proponent of finishing the instructions in the
>> execution window that are finishable. I, merely, have no problem
>> in allowing the pipe to complete or take a flush based on the kind
>> of pipeline being engineered.
>>
>> With 300-odd instructions in the window this thesis has merit;
>> with a 5-stage pipeline 1-wide, it does not have merit but is
>> not devoid of merit either.
>
> It is also possible that the speculation barriers I describe below
> will limit the benefits that pipelining exceptions and interrupts
> might be able to see.
>
> The issue is that both exception handlers and interrupts usually read
> and write Privileged Control Registers (PCR) and/or MMIO device
> registers very early in the handler. Most MMIO device registers and
> CPU PCRs cannot be speculatively read, as that may cause a state
> transition.
> Of course, stores are never performed speculatively and can only be
> initiated at commit/retire.

This becomes a question of "who knows what when".

At the point of interrupt recognition (it has been raised, and I am
going to take that interrupt) the pipeline has instructions retiring
from the execution window, instructions being performed, and
instructions waiting for "things to happen".

After interrupt recognition, you are inserting instructions into the
execution window--but these are not speculative--they are known not
to be under any speculation--they WILL execute to completion,
regardless of whether speculative instructions from before
recognition are performed or flushed. This property holds until the
ISR executes a predicted branch.

So, it is possible to stream right into an ISR--but few pipelines do.

> The normal memory coherence rules assume that loads are to memory-like
> locations that do not state-transition on reads, and that therefore
> memory loads can be harmlessly replayed if needed.
> While memory stores are not performed speculatively, an implementation
> might speculatively prefetch a cache line as soon as a store is queued
> and cause cache lines to ping-pong.
>
> But loads to many MMIO devices and PCRs effectively require a
> speculation barrier in front of them to prevent replays.
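To make the replay hazard concrete, here is a minimal Python model. It assumes a hypothetical "read-to-clear" status register (a read returns the pending bits and then clears them, a common but not universal device convention); the class and function names are illustrative only. A load issued speculatively, squashed, and then replayed is harmless against a memory-like location, but against the read-to-clear register the first (discarded) read has already consumed the state.

```python
class Memory:
    """Memory-like location: reads have no side effects."""

    def __init__(self, value):
        self.value = value

    def load(self):
        return self.value


class ReadToClearStatus:
    """Hypothetical MMIO-like register: a read returns the pending bits, then clears them."""

    def __init__(self, pending):
        self.pending = pending

    def load(self):
        value, self.pending = self.pending, 0  # the state transition caused by the read
        return value


def speculative_load_then_replay(location):
    """Issue a load speculatively, squash it (e.g. after a mispredicted branch or
    an exception flush), then replay it once it is non-speculative.
    Returns the value the committed (replayed) load observes."""
    _squashed = location.load()  # result is discarded by the flush...
    return location.load()       # ...but any side effect of the first read remains


print(speculative_load_then_replay(Memory(0b101)))             # 5: replay is harmless
print(speculative_load_then_replay(ReadToClearStatus(0b101)))  # 0: pending bits were lost
```

This is a software analogy, not a pipeline model: it only shows why "loads can be harmlessly replayed" is a property of memory-like locations and not of side-effecting registers.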
My 66000 architecture specifies that accesses to MMI/O space are
performed as if the core were performing sequentially consistent
memory references, obviating the need for an SPCB instruction there.

There is only 1 instruction used to read/write control registers. It
reads the operand registers and the control register at the beginning
of execution, but does not write the control register until
retirement, obviating the need for an SPCB instruction there as well.

Also note: core[i] can access core[j] control registers, but this
access takes place in MMI/O space (and is sequentially consistent).

> An SPCB (Speculation Barrier) instruction could block speculation.
> It stalls execution until all older conditional branches are resolved
> and all older instructions that might throw an exception have
> determined they won't do so.
>
> The core could have an internal lookup table telling it which PCRs can
> be read speculatively because there are no side effects to doing so.
> Those PCRs would not require an SPCB to guard them.
>
> For MMIO device registers I think having an explicit SPCB instruction
> might be better than putting a "no-speculate" flag on the PTE for the
> device register address, as that flag would be difficult to propagate
> backwards from address translation to all the parts of the core that
> we might have to sync with.

I am curious. Is "unCacheable and MMI/O space" insufficient to figure
out "Hey, it's non-speculative" too ??

> This all means that there may be very little opportunity for
> speculative execution of their handlers, no matter how much hardware
> one tosses at them.

Good point, often unseen or unstated.
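The issue policy discussed above can be sketched as a small decision function. This is a hedged Python model under stated assumptions: the field names (`unresolved_branch`, `may_still_except`, `uncacheable`, `mmio`, `pcr_id`) and the contents of `SPECULATION_SAFE_PCRS` are hypothetical, not taken from My 66000 or any real core. It combines EricP's SPCB condition, his lookup table of side-effect-free PCRs, and Mitch's suggestion of inferring "non-speculative" from the uncacheable + MMI/O attributes. Note that it simply assumes those attributes are already available at issue, which sidesteps EricP's actual concern: the hardware cost of propagating them back from address translation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OlderInsn:
    """An older in-flight instruction, as seen by a younger load (hypothetical fields)."""
    unresolved_branch: bool = False  # conditional branch whose direction is not yet known
    may_still_except: bool = False   # has not yet determined that it won't raise an exception


@dataclass
class Load:
    """A load about to issue; attribute names are illustrative only."""
    uncacheable: bool = False
    mmio: bool = False
    pcr_id: Optional[int] = None     # set when the load reads a privileged control register


# Hypothetical lookup table: PCRs whose reads have no side effects.
SPECULATION_SAFE_PCRS = {0x10, 0x11}


def needs_barrier(load: Load) -> bool:
    """Decide whether this load must wait until it is non-speculative."""
    if load.pcr_id is not None:
        # EricP's internal table: side-effect-free PCRs need no guard.
        return load.pcr_id not in SPECULATION_SAFE_PCRS
    # Mitch's question: treat uncacheable MMI/O accesses as non-speculative.
    return load.uncacheable and load.mmio


def spcb_satisfied(older: List[OlderInsn]) -> bool:
    """SPCB condition: every older conditional branch is resolved and every older
    instruction has determined it won't raise an exception."""
    return not any(o.unresolved_branch or o.may_still_except for o in older)


def may_issue(load: Load, older: List[OlderInsn]) -> bool:
    """A load issues now if it needs no barrier, or if the barrier is already satisfied."""
    return not needs_barrier(load) or spcb_satisfied(older)


window = [OlderInsn(unresolved_branch=True)]
print(may_issue(Load(), window))                             # True: ordinary memory load
print(may_issue(Load(uncacheable=True, mmio=True), window))  # False: waits for the branch
print(may_issue(Load(pcr_id=0x10), window))                  # True: table marks it safe
```

In this framing an explicit SPCB instruction and an attribute-derived barrier enforce the same wait condition; they differ only in where the "must not speculate" information comes from.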