From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.arch
Subject: Re: MSI interrupts
Date: Sat, 29 Mar 2025 20:58:44 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
References: <1b9a9644c3f5cbd2985b89443041e01a@www.novabbs.org>

In article <1b9a9644c3f5cbd2985b89443041e01a@www.novabbs.org>,
MitchAlsup1 wrote:
>On Sat, 29 Mar 2025 14:28:02 +0000, Dan Cross wrote:
>
>> In article ,
>> MitchAlsup1 wrote:
>>>On Fri, 28 Mar 2025 2:53:57 +0000, Dan Cross wrote:
>>>> [snip]
>>>> What I really wanted was an example that exceeded that limit as
>>>> an expository vehicle for understanding what happens when the
>>>> limit is exceeded. What does the hardware do in that case?
>>>
>>>Raise OPERATION (instruction) Fault.
>>
>> Ok, very good. The software implications could be profound.
>>
>>>>>[snip]---------
>>>>
>>>> Doesn't this code now assume that `an->prev` is in the same
>>>> cache line as `an`, and similarly that `bn` is in the same line
>>>> as `bn->prev`? Absent an alignment constraint on the `Node`
>>>> type, that's not guaranteed given the definition posted earlier.
>>>
>>>Everything you state is true, I just tried to move the esmLOCKxxxx
>>>up to the setup phase to obey ESM rules.
>>
>> Yeah, that makes sense. Describing it as a rule, however,
>> raises the question of whether one must touch a line before a
>> store to a different line, or just before a store to that line?
>
>Right from the spec::
>
>"There is an order of instructions imposed upon software compiler> where:
>
>• all participating inbound memory reference instructions shall be
>performed prior to manifestation;
>• non-participating inbound memory reference instructions should be
>performed prior to manifestation;
>• the query instruction denotes the boundary between setup
>and manifestation;
>• the first Outbound to a participating cache line begins
>manifestation
>• the only Outbound with Lock bit set (L ≡ 1) completes the event
>
>The processor monitors the instruction sequence of the ATOMIC event and
>will raise the OPERATION exception when the imposed order is violated."
>
>Basically, all participating page-faults happen prior to any
>modification attempts.

Ok.

>>>> That may be an odd and ill-advised thing for a programmer to do
>>>> if they want their list type to work with atomic events, but it
>>>> is possible.
>>>>
>>>> The node pointers and their respective `next` pointers are okay,
>>>> so I wonder if perhaps this might have been written as:
>>>>
>>>> void
>>>> swap_places(Node **head, Node *a, Node *b)
>>>> {
>>>>     Node *hp, *an, *ap, *bn, *bp;
>>>>
>>>>     assert(head != NULL);
>>>>     assert(a != NULL);
>>>>     assert(b != NULL);
>>>>
>>>>     if (a == b)
>>>>         return;
>>>>
>>>>     hp = esmLOCKload(*head);
>>>>     esmLOCKprefetch(an = esmLOCKload(a->next));
>>>>     ap = esmLOCKload(a->prev);
>>>>     esmLOCKprefetch(bn = esmLOCKload(b->next));
>>>>     bp = esmLOCKload(b->prev);
>>>>
>>>>     if (an != NULL)              // I see what you did
>>>>         esmLOCKprefetch(an->prev);
>>>>     if (bn != NULL) {
>>>>         esmLOCKprefetch(bn->prev);
>>>>         bn->prev = a;
>>>>     }
>>>>
>>>>     if (hp == a)
>>>>         *head = b;
>>>>     else if (hp == b)
>>>>         *head = a;
>>>>
>>>>     b->next = an;
>>>>     if (an != NULL)
>>>>         an->prev = b;
>>>>     b->prev = ap;
>>>>     if (ap != NULL)
>>>>         ap->next = b;
>>>>                                  // illustrative code
>>>>     a->next = bn;                // ST Rbp,[Ra,#next]
>>>>     if (bp != NULL)              // PNE0 Rbp,T
>>>>         bp->next = a;            // ST Ra,[Rbp,#next]
>>>>
>>>>     esmLOCKstore(a->prev, bp);
>>>> }
>>>>
>>>> But now the conditional testing whether or not `an` is `NULL` is
>>>> repeated. Is the additional branch overhead worth it here?
>>>
>>>In My 66000 ISA, a compare against zero (or NULL) is just a branch
>>>instruction, so the CMP zero is performed twice, but each use is
>>>but a single Branch-on-condition instruction (or
>>>Predicate-on-Condition instruction).
>>>
>>>In the case of using predicates, FETCH/DECODE will simple issue
>>>both then and else clauses into the execution window (else-clause
>>>is empty here) and let the reservation stations handle execution
>>>order. And the condition latency is purely the register dependence
>>>chain. A 6-wide machine should have no trouble in inserting two
>>>copies of the code commented by "illustrative code" above--in
>>>this case 6-instructions or 2 sets of {ST, PNE0, ST}.
>>>
>>>In the case of using a real branch, latency per use is likely to
>>>be 2-cycles, moderated by typical branch prediction. The condition
>>>will have resolved early, so we are simply FETCH/DECODE/TAKE bound.
>>>
>>>{{That is: PRED should be faster in almost all conceivable cases.}}
>>
>> Okay, that all makes sense. Thanks for the detailed
>> explanation. I agree it's very slick; is this documented
>> somewhere publicly? If not, why not?
>
>Documented: Yes 21 pages.
>Public: I keep holding back as if I were to attempt to patent certain
>aspects. But I don't seem to have the energy to do such. I as if you
>wanted to see it.

(I assume you meant "ask if you want to see it"?) Sure, I think
it would be interesting. Should I send you an email?

	- Dan C.
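
As an aside, the ordering rules quoted from the spec can be read as a
small state machine: participating loads belong to setup, the first
ordinary store to a participating line begins manifestation, and the one
store with the Lock bit set completes the event. What follows is only a
minimal C sketch of that reading. The names `phase`, `op`, `step` and
`event_ok` are hypothetical and are not part of My 66000 or its
toolchain; the query instruction and the advisory "should" rule for
non-participating references are left out, and treating anything after
the locked store as a violation is an assumption of the model, not a
statement from the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the quoted ordering rules; not real ISA state. */
enum phase { PH_SETUP, PH_MANIFEST, PH_DONE, PH_FAULT };

enum op {
    OP_LOAD_PART,   /* participating inbound memory reference */
    OP_STORE_PART,  /* outbound to a participating line, L = 0 */
    OP_STORE_LOCK   /* the one outbound with the Lock bit set, L = 1 */
};

/* Advance the model by one instruction of the atomic event. */
static enum phase step(enum phase p, enum op o)
{
    switch (p) {
    case PH_SETUP:
        if (o == OP_LOAD_PART)
            return PH_SETUP;        /* still setting up */
        if (o == OP_STORE_PART)
            return PH_MANIFEST;     /* first outbound begins manifestation */
        return PH_DONE;             /* locked store completes the event */
    case PH_MANIFEST:
        if (o == OP_LOAD_PART)
            return PH_FAULT;        /* participating load came too late */
        if (o == OP_STORE_PART)
            return PH_MANIFEST;
        return PH_DONE;
    default:
        /* Model assumption: one event per sequence, so anything after
         * completion (or after a fault) is treated as a violation. */
        return PH_FAULT;
    }
}

/* True when the sequence obeys the modeled order and ends with the
 * locked store. */
static bool event_ok(const enum op *ops, size_t n)
{
    enum phase p = PH_SETUP;

    assert(ops != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        p = step(p, ops[i]);
    return p == PH_DONE;
}
```

Under this model a sequence of load, load, store, locked store is
accepted, while load, store, load, locked store is rejected, which is the
case the spec describes as raising OPERATION.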