From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: MSI interrupts
Date: Tue, 18 Mar 2025 19:19:41 +0000

On Mon, 17 Mar 2025 21:49:18 +0000, MitchAlsup1 wrote:

> On Mon, 17 Mar 2025 18:33:09 +0000, EricP wrote:
>
>> Michael S wrote:
>>> On Mon, 17 Mar 2025 13:38:12 GMT
>>> scott@slp53.sl.home (Scott Lurndal) wrote:
> ------------------
>>>
>>> The problem Robert is talking about arises when there are many
>>> interrupt sources and many target CPUs.
>>> The required routing/prioritization/acknowledgment logic (at least
>>> the naive logic I have in mind) would be either non-scalable or
>>> relatively complicated. The selection process in the latter case
>>> would take multiple cycles (I am thinking of a ring).
>>
>> Another problem is what the core does with the in-flight
>> instructions.
>>
>> Method 1 is simplest: it injects the interrupt request at Retire,
>> as that is where the state of everything is synchronized.
>> The consequence is that, like exceptions, the in-flight instructions
>> all get purged, and we save the committed RIP, RSP, and interrupt
>> control word.
>> While that might be acceptable for a 5-stage in-order pipeline,
>> it could be pretty expensive for an OoO 200+ entry instruction
>> queue, potentially tossing hundreds of cycles of near-finished work.
>
> Lowest interrupt latency.
> Highest waste of power (i.e., work).
>
>> Method 2 pipelines the switch by injecting the interrupt request
>> at Fetch.
>> Decode converts the request to a special uOp that travels down the
>> IQ to Retire and allows all the older work to complete.
>> This is more complex, as it requires a two-phase hand-off from the
>> Interrupt Control Unit (ICU) to the core, because a branch
>> mispredict in the in-flight instructions might cause a tentative
>> interrupt acceptance to later be withdrawn.
>
> Interrupt latency is dependent on executing instructions.
> Lowest waste of power.
> Highest chance of mucking it all up.
>
> But note: in most cases, it already took the interrupt ~150
> nanoseconds to arrive at the interrupt service port: 1 trip from
> device to DRAM (possibly serviced by L3), 1 trip from DRAM back to
> device, 1 trip from device to the interrupt service port; and 4
> DRAM (or L3) accesses to log the interrupt into the table.
>
> Also, in most cases, the 200-odd instructions in the window will
> finish in 100 cycles, or as little as 20 ns--but if the FDIV unit
> is saturated, interrupt latency could be as high as 640 cycles and
> as long as 640 ns.

There is another issue adding complexity to method 2: the
instructions in the window may raise exceptions. The general rule is
that the instruction raising the first exception has its address in
IP, and then the retire logic has to flush subsequent instructions
until it arrives at the first instruction of the interrupt
dispatcher. The exception will be handled when control returns from
the interrupt.
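A minimal C sketch of that retire-time rule (all names, sizes, and
addresses are invented for illustration, not any real core's
encoding): older work commits in order, the first faulting
instruction pins its address, everything younger is flushed, and the
recorded exception is re-raised only after the interrupt handler
returns.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ROB_SIZE 8                /* toy window for the sketch      */

/* Hypothetical per-slot retire record. */
typedef struct {
    uint64_t ip;                  /* instruction address            */
    bool     faulted;             /* raised an exception at execute */
    bool     int_marker;          /* the special uOp injected at Fetch */
} slot;

static void commit(slot *s)  { printf("commit  %#llx\n",
                                      (unsigned long long)s->ip); }
static void discard(slot *s) { printf("discard %#llx\n",
                                      (unsigned long long)s->ip); }

/* Drain the window in order.  The first faulting instruction does
   not commit; its IP is pinned, and everything younger is flushed
   until the marker uOp -- the first instruction of the interrupt
   dispatcher -- is reached.  The saved exception is re-dispatched
   only after the interrupt handler returns. */
static void retire(slot *rob, int n)
{
    uint64_t fault_ip   = 0;
    bool     have_fault = false;

    for (int i = 0; i < n; i++) {
        slot *s = &rob[i];
        if (s->int_marker) {
            printf("enter interrupt dispatcher at %#llx\n",
                   (unsigned long long)s->ip);
            if (have_fault)
                printf("on return, re-raise exception at %#llx\n",
                       (unsigned long long)fault_ip);
            return;
        }
        if (have_fault) { discard(s); continue; }
        if (s->faulted) {              /* first exception wins IP   */
            fault_ip   = s->ip;
            have_fault = true;
            discard(s);                /* faulting op does not commit */
            continue;
        }
        commit(s);                     /* normal in-order retirement  */
    }
}

int main(void)
{
    slot rob[ROB_SIZE] = {
        {0x1000, false, false},
        {0x1004, true,  false},        /* first exception            */
        {0x1008, false, false},        /* younger: flushed           */
        {0x2000, false, true },        /* interrupt dispatcher entry */
    };
    retire(rob, 4);
    return 0;
}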
>> The ICU believes the core is in a state to accept a higher-priority
>> interrupt. It sends a request to the core, which checks its current
>> state and sends back an immediate INT_ACK if it _might_ accept (and
>> stalls Fetch), or a NAK.
>
> In My 66000, the ICU knows nothing about the priority level (or
> state) of any core in the system. Instead, when a new higher-
> priority interrupt is raised, the ISP broadcasts a 64-bit mask
> indicating which priority levels in the interrupt table have
> pending interrupts, with an MMI/O message to the address of the
> interrupt table.
>
> All cores monitoring that interrupt table capture the broadcast,
> and each core decides whether to negotiate for an interrupt (not
> that specific one) by requesting the highest-priority interrupt
> from the table.
>
> When the request returns, and it is still at a higher priority
> than the core is running at, the core performs an interrupt
> control transfer. If the interrupt is below the core's priority,
> it is returned to the ISP as if NAKed.
>
> Prior to the interrupt control transfer, the core remains running
> whatever it was running--and all the interrupt stuff is done by
> state machines at the edge of the core and the L3/DRAM controller.
>
>> When the special uOp reaches Retire, it sends a signal to Fetch,
>> which then sends an INT_ACCEPT signal to the ICU to complete the
>> handoff.
>> If a branch mispredict occurs that causes interrupts to be
>> disabled, then Fetch sends an INT_REJECT to the ICU and unstalls
>> its fetching.
>> (Yes, that is not optimal--make it work first, make it work well
>> second.)
>>
>> This also raises a question about what the ICU is doing during
>> this long-latency handoff. One wouldn't want the ICU to sit idle,
>> so it might have to manage the handoff of multiple interrupts to
>> multiple cores at the same time, each as its own little state
>> machine.
>
> One must assume that the ISP is capable of taking a new interrupt
> from a device every 5-ish cycles, that interrupt handoff is in the
> range of 50 cycles, and that each interrupt could be to a
> different interrupt table.
>
> My 66000's ISP treats successive requests to any one table as
> strongly ordered, and requests to different tables as completely
> unordered.
>
>> One should see that this decision on how the core handles the
>> handoff has a large impact on the design complexity of the ICU.
>
> I did not "see" that in My 66000's interrupt architecture. The ISP
> complexity is fixed, and the core's interrupt negotiator is a
> small state machine (~10 states).
>
> The ISP essentially performs 4-5 64-bit memory accesses, and
> possibly 1 MMI/O 64-bit broadcast, on arrival of an MSI-X
> interrupt. Then, if a core negotiates, it performs 3 more memory
> accesses per negotiator.
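As a rough illustration (not My 66000's actual logic; all names and
the toy table are invented), the core-side negotiator compresses to
three visible decisions: capture the broadcast mask, claim the
highest pending interrupt from the table, and re-check priority when
the claim returns, handing the interrupt back as if NAKed when the
core has been outranked in the interim.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { int priority; bool valid; } intr_t;

/* Toy stand-ins for the interrupt table and the core's state. */
static uint64_t pending_mask = (1ull << 9) | (1ull << 3);
static int      core_prio    = 5;

static intr_t request_highest(void)       /* claim from the table    */
{
    for (int p = 63; p >= 0; p--)
        if (pending_mask & (1ull << p)) {
            pending_mask &= ~(1ull << p);
            return (intr_t){ p, true };
        }
    return (intr_t){ 0, false };          /* another core beat us    */
}

static void return_to_isp(intr_t in)      /* hand back, as if NAKed  */
{
    pending_mask |= 1ull << in.priority;
}

/* Core-side negotiation on receipt of the 64-bit broadcast mask. */
static void on_broadcast(uint64_t mask)
{
    /* any pending level strictly above the core's current level?   */
    uint64_t higher = (core_prio >= 63) ? 0 : mask >> (core_prio + 1);
    if (higher == 0)
        return;                  /* nothing worth negotiating for   */

    intr_t in = request_highest();
    if (!in.valid)
        return;                  /* lost the race to another core   */

    if (in.priority > core_prio)
        printf("interrupt control transfer, level %d\n", in.priority);
    else
        return_to_isp(in);       /* outranked meanwhile: give it back */
}

int main(void)
{
    on_broadcast(pending_mask);  /* broadcast mirrors the table     */
    return 0;
}

The real ~10-state machine would additionally track the outstanding
MMI/O request and the races between cores; this sketch shows only
the decision logic.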