Article <88f842b71e49ef45e13df3b2081e7f7d@www.novabbs.org>

Path: ...!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Microarchitectural support for counting
Date: Thu, 2 Jan 2025 19:45:36 +0000
Organization: Rocksolid Light
Message-ID: <88f842b71e49ef45e13df3b2081e7f7d@www.novabbs.org>
References: <2024Oct3.160055@mips.complang.tuwien.ac.at> <vdmrk6$3rksr$1@dont-email.me> <LyELO.69485$2nv5.62232@fx39.iad> <TdWLO.282116$FzW1.158190@fx14.iad> <963a276fd8d43e4212477cefae7f6e46@www.novabbs.org> <8IcMO.249144$v8v2.147178@fx18.iad> <vkhgkn$2g9gm$1@dont-email.me> <7bffca4c284d329c60d8e93c7382c30f@www.novabbs.org> <vl1nr6$2diq8$1@dont-email.me> <b8760a721775ebc2e1f232c2edae4be9@www.novabbs.org> <6CBdP.20426$nlJ1.2963@fx41.iad>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="1700161"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="o5SwNDfMfYu6Mv4wwLiW6e/jbA93UAdzFodw5PEa6eU";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: cb29269328a20fe5719ed6a1c397e21f651bda71
X-Rslight-Site: $2y$10$FalCNxab4Ha0hlCDdNpW/uzctJ/gLC8zKC413U/zsniFFqBlFxXNm
X-Spam-Checker-Version: SpamAssassin 4.0.0
Bytes: 5663
Lines: 98

On Thu, 2 Jan 2025 19:14:50 +0000, EricP wrote:

> MitchAlsup1 wrote:
>> On Tue, 31 Dec 2024 2:02:05 +0000, Paul A. Clayton wrote:
>>> On 12/25/24 1:30 PM, MitchAlsup1 wrote:
>>>>
>>>> Sooner or later an ISR has to actually deal with the MMI/O
>>>> control registers associated with the <ahem> interrupt.
>>>
>>> Yes, but multithreading could hide some of those latencies in
>>> terms of throughput.
>>
>> EricP is the master proponent of finishing the instructions in the
>> execution window that are finishable. I, merely, have no problem
>> in allowing the pipe to complete or take a flush based on the kind
>> of pipeline being engineered.
>>
>> With 300-odd instructions in the window, this thesis has merit; with a
>> 1-wide 5-stage pipeline, it has less merit, but it is not devoid of
>> merit either.
>
> It is also possible that the speculation barriers I describe below
> will limit the benefits that pipelining exceptions and interrupts
> might be able to see.
>
> The issue is that both exception handlers and interrupt handlers usually
> read and write Privileged Control Registers (PCRs) and/or MMIO device
> registers very early in the handler. Most MMIO device registers and CPU
> PCRs cannot be speculatively read, as that may cause a state transition.
> Of course, stores are never speculated and can only be initiated
> at commit/retire.
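
To make the "stores only at commit/retire" point concrete, here is a minimal Python sketch under assumed semantics (a hypothetical `StoreQueue` class, sequence numbers assigned in program order). Executing a store only buffers it, retirement performs it, and a flush discards younger stores that no memory location or device ever observed.

```python
from dataclasses import dataclass, field


@dataclass
class StoreQueue:
    """Toy model: stores are buffered in program order and only performed
    (made visible outside the core) when they retire."""
    memory: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # (seq, addr, value), oldest first

    def execute_store(self, seq, addr, value):
        # Executing a store only records it; nothing outside the core sees it yet.
        self.pending.append((seq, addr, value))

    def retire_through(self, seq):
        # Perform every buffered store whose sequence number is <= seq.
        while self.pending and self.pending[0][0] <= seq:
            _, addr, value = self.pending.pop(0)
            self.memory[addr] = value

    def flush_after(self, seq):
        # A flush drops younger (speculative) stores; they were never performed.
        self.pending = [s for s in self.pending if s[0] <= seq]


sq = StoreQueue()
sq.execute_store(1, 0x10, 5)
sq.execute_store(2, 0x20, 7)
sq.flush_after(1)      # store #2 was on a mispredicted path
sq.retire_through(2)
```

Only the store with sequence number 1 reaches `memory`; the flushed store never produced a visible side effect.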

This becomes a question of "who knows what when".

At the point of interrupt recognition (it has been raised, and I am going
to take that interrupt), the pipeline has instructions retiring from the
execution window, instructions being performed, and instructions
waiting for "things to happen".

After interrupt recognition, you are inserting instructions into the
execution window--but these are not speculative--they are known to
not be under any speculation--they WILL execute to completion,
regardless of whether speculative instructions from before recognition
are performed or flushed. This property holds until the ISR executes
a predicted branch.

So, it is possible to stream right onto an ISR--but few pipelines do.
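
A minimal Python sketch of the property described above, assuming hypothetical op names and treating `"br_pred"` as the first predicted conditional branch: every ISR instruction up to and including that branch is certain to execute, and everything fetched down its predicted path depends on the prediction.

```python
def classify_isr_stream(isr_ops):
    """Return (op, certain) pairs for instructions inserted after interrupt
    recognition. 'certain' means the instruction WILL execute to completion.
    Op names are hypothetical; 'br_pred' marks a predicted branch."""
    result = []
    certain = True
    for op in isr_ops:
        result.append((op, certain))
        if op == "br_pred":
            # Instructions after this point follow a prediction, so they
            # are speculative again.
            certain = False
    return result


stream = classify_isr_stream(["ld_ctrl", "st_ack", "br_pred", "add"])
```

In this model the control-register load and the acknowledging store sit in the "certain" prefix, which is the window in which a pipeline could stream into the ISR without speculation concerns.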

> The normal memory coherence rules assume that loads are to memory-like
> locations that do not state transition on reads and that therefore
> memory loads can be harmlessly replayed if needed.
> While memory stores are not performed speculatively, an implementation
> might speculatively prefetch a cache line as soon as a store is queued
> and cause cache lines to ping-pong.
>
> But loads to many MMIO devices and PCRs effectively require a
> speculation barrier in front of them to prevent replays.
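
Why replay is harmless for memory but not for many devices can be shown with a small Python model. The device here is a hypothetical FIFO data register whose every read pops an entry (a state transition on read); the load is issued once speculatively, squashed, and then replayed.

```python
from collections import deque


class MemoryWord:
    """Memory-like location: reading has no side effect."""
    def __init__(self, value):
        self.value = value

    def load(self):
        return self.value


class FifoDataRegister:
    """Hypothetical device register: each read pops one entry."""
    def __init__(self, entries):
        self.fifo = deque(entries)

    def load(self):
        return self.fifo.popleft() if self.fifo else None


def load_with_replay(target):
    """Issue a load speculatively, squash it, then replay it.
    Returns the value seen by the replayed (architectural) load."""
    _discarded = target.load()  # speculative issue; result thrown away
    return target.load()        # replay after the squash


mem_result = load_with_replay(MemoryWord(42))
dev = FifoDataRegister([1, 2, 3])
dev_result = load_with_replay(dev)
```

The memory load returns 42 both times, while the device load returns 2: the squashed speculative read silently consumed entry 1.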

My 66000 architecture specifies that accesses to MMI/O space are
performed as if the core were performing memory references in a
sequentially consistent manner, obviating the need for an SPCB
instruction there.
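
One way to picture "MMI/O accesses are performed sequentially consistently" is a toy issue policy, sketched here in Python as an assumption rather than the actual My 66000 mechanism: cacheable accesses may be performed early, but an MMI/O access waits until every older access has been performed, and nothing younger is performed ahead of an outstanding MMI/O access.

```python
def may_perform(window, idx, performed):
    """window: list of (seq, space) in program order, space in
    {'cacheable', 'mmio'}. performed: set of seq numbers already performed.
    Returns True if window[idx] may be performed now (toy policy)."""
    seq, space = window[idx]
    older = window[:idx]
    if space == "mmio":
        # MMI/O: wait until every older access has been performed.
        return all(s in performed for s, _ in older)
    # Cacheable: may go early, unless an older MMI/O access is outstanding.
    return all(s in performed for s, sp in older if sp == "mmio")


w = [(1, "cacheable"), (2, "mmio"), (3, "cacheable")]
```

With nothing performed yet, only access 1 may go; access 2 becomes eligible once 1 has been performed, and access 3 only after 2.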

There is only one instruction used to read/write control registers. It
reads the operand registers and the control register at the beginning
of execution, but does not write the control register until retirement,
obviating the need for an SPCB instruction there.
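
A minimal Python model of that single control-register instruction, assuming a hypothetical `CtrlRegOp` class and a plain dict standing in for the control registers: the old value is read when the instruction executes, the new value is written only at retirement, so a flush has nothing to undo.

```python
class CtrlRegOp:
    """Hypothetical model of one read/write control-register instruction."""

    def __init__(self, ctrl_regs, name, new_value):
        self.ctrl_regs = ctrl_regs  # dict: control-register name -> value
        self.name = name
        self.new_value = new_value
        self.old_value = None

    def execute(self):
        # The control-register read happens at the start of execution.
        self.old_value = self.ctrl_regs[self.name]
        return self.old_value

    def retire(self):
        # The control-register write is deferred until retirement.
        self.ctrl_regs[self.name] = self.new_value

    def flush(self):
        # Nothing to undo: the control register was never written.
        self.old_value = None


regs = {"irq_enable": 1}
squashed = CtrlRegOp(regs, "irq_enable", 0)
squashed.execute()
squashed.flush()          # on a mispredicted path: register untouched
committed = CtrlRegOp(regs, "irq_enable", 0)
committed.execute()
committed.retire()        # architectural write happens here
```

Note the model only defers the write; it assumes reading the control register is itself free of side effects.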

Also note: core[i] can access core[j] control registers, but this access
takes place in MMI/O space (and is sequentially consistent).

> An SPCB (Speculation Barrier) instruction could block speculation.
> It stalls execution until all older conditional branches are resolved
> and all older instructions that might throw an exception have determined
> they won't do so.
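
The SPCB release condition quoted above can be sketched as a Python predicate over the in-flight window. The `InFlight` record and its field names are hypothetical; `fault_checked` means the instruction has determined it will not raise an exception.

```python
from dataclasses import dataclass


@dataclass
class InFlight:
    seq: int
    is_cond_branch: bool = False
    branch_resolved: bool = False
    may_fault: bool = False
    fault_checked: bool = False


def spcb_may_proceed(window, barrier_seq):
    """Toy model: the barrier releases once every older conditional branch
    is resolved and every older possibly-faulting instruction has
    determined that it won't fault."""
    for insn in window:
        if insn.seq >= barrier_seq:
            continue  # only older instructions matter
        if insn.is_cond_branch and not insn.branch_resolved:
            return False
        if insn.may_fault and not insn.fault_checked:
            return False
    return True


window = [InFlight(1, is_cond_branch=True),
          InFlight(2, may_fault=True, fault_checked=True)]
before = spcb_may_proceed(window, barrier_seq=3)
window[0].branch_resolved = True
after = spcb_may_proceed(window, barrier_seq=3)
```

The barrier at sequence number 3 stays blocked until branch 1 resolves; if an older instruction did fault, the barrier would simply be flushed along with everything younger.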
>
> The core could have an internal lookup table telling it which PCRs can
> be read speculatively because there are no side effects to doing so.
> Those PCRs would not require an SPCB to guard them.
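
Such a lookup table might look like the following Python sketch. The PCR names and their side-effect classification are invented for illustration, not taken from any real core; unknown registers default to needing a barrier so the table errs on the safe side.

```python
# Hypothetical PCR names; which registers are side-effect free on read
# is an assumption for illustration only.
SPECULATIVE_READ_OK = {
    "core_id": True,       # constant: reading changes nothing
    "feature_bits": True,  # constant
    "irq_ack": False,      # assumed: reading acknowledges (consumes) an interrupt
}


def pcr_read_needs_barrier(pcr_name):
    """True if a read of this PCR must be guarded by an SPCB-style barrier.
    Registers missing from the table are treated conservatively."""
    return not SPECULATIVE_READ_OK.get(pcr_name, False)
```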
>
> For MMIO device registers I think having an explicit SPCB instruction
> might be better than putting a "no-speculate" flag on the PTE for the
> device register address as that flag would be difficult to propagate
> backwards from address translate to all the parts of the core that
> we might have to sync with.

I am curious. Is "unCacheable and MMI/O space" insufficient to figure
out "Hey, it's non-speculative" too ??
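
A Python sketch of deriving "non-speculative" from translation attributes, as the question suggests. The `Translation` fields are hypothetical attribute bits. It also makes EricP's concern visible: the decision is only available after address translation, so in this model an uncacheable MMI/O load is held at that point until it is the oldest unretired instruction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    # Hypothetical attribute bits returned by address translation.
    cacheable: bool
    mmio: bool


def load_may_speculate(t: Translation) -> bool:
    """Uncacheable MMI/O accesses are treated as non-speculative;
    everything else is treated as memory-like."""
    return t.cacheable or not t.mmio


def may_issue_after_translate(t: Translation, is_oldest: bool) -> bool:
    # A non-speculative access waits until nothing older can still flush it.
    return load_may_speculate(t) or is_oldest


dram = Translation(cacheable=True, mmio=False)
device = Translation(cacheable=False, mmio=True)
```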

> This all means that there may be very little opportunity for speculative
> execution of their handlers, no matter how much hardware one tosses at
> them.

Good point, often unseen or unstated.