Path: news.eternal-september.org!eternal-september.org!feeder3.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.arch
Subject: Re: MSI interrupts
Date: Sat, 29 Mar 2025 14:28:02 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vs901i$f7e$1@reader1.panix.com>
References: <vqto79$335c6$1@dont-email.me> <34434320650f5844b18b1c0b684acf43@www.novabbs.org> <vs5305$rd4$1@reader1.panix.com> <cb049d5490b541878e264cedf95168e1@www.novabbs.org>
Injection-Date: Sat, 29 Mar 2025 14:28:02 -0000 (UTC)
Injection-Info: reader1.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80"; logging-data="15598"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <cb049d5490b541878e264cedf95168e1@www.novabbs.org>,
MitchAlsup1 <mitchalsup@aol.com> wrote:
>On Fri, 28 Mar 2025 2:53:57 +0000, Dan Cross wrote:
>> [snip]
>> What I really wanted was an example that exceeded that limit as
>> an expository vehicle for understanding what happens when the
>> limit is exceeded.  What does the hardware do in that case?
>
>Raise OPERATION (instruction) Fault.

Ok, very good.  The software implications could be profound.

>>>[snip]
>>>// by placing all the touching before any manifestation, you put
>>>// all the touch latency* in the code before it has tried to damage any
>>>// participating memory location. (*) and TLB latency and 2nd party
>>>// observation of your event.
>>>
>>>// this would be the point where you would insert if( esmINTERFERENCE( ))
>>>// if you wanted control at a known failure point rather than at the
>>>// top of the event on failure.
>>>
>>>>         if (Ehead == a)
>>>>                 *head = b;
>>>>         else if (Ehead == b)
>>>>                 *head = a;
>>>>
>>>>         b->next = an;
>>>>         if (an != NULL) {
>>>>                 an->prev = b;
>>>>         }
>>>>         b->prev = ap;
>>>>         if (ap != NULL) {
>>>>                 ap->next = b;
>>>>         }
>>>>
>>>>         a->next = bn;
>>>>         if (bn != NULL) {
>>>>                 bn->prev = a;
>>>>         }
>>>>         if (bp != NULL) {
>>>>                 bp->next = a;
>>>>         }
>>>>         esmLOCKstore(a->prev, bp);
>>>> }
>>>
>>>// now manifestation has lowest possible latency (as seen by this core
>>>alone)
>>
>> Doesn't this code now assume that `an->prev` is in the same
>> cache line as `an`, and similarly that `bn` is in the same line
>> as `bn->prev`?  Absent an alignment constraint on the `Node`
>> type, that's not guaranteed given the definition posted earlier.
>
>Everything you state is true, I just tried to move the esmLOCKxxxx
>up to the setup phase to obey ESM rules.

Yeah, that makes sense.  Describing it as a rule, however, raises
the question of whether one must touch a line before a store to a
different line, or just before a store to that line?

>> That may be an odd and ill-advised thing for a programmer to do
>> if they want their list type to work with atomic events, but it
>> is possible.
>>
>> The node pointers and their respective `next` pointers are okay,
>> so I wonder if perhaps this might have been written as:
>>
>> void
>> swap_places(Node **head, Node *a, Node *b)
>> {
>>         Node *hp, *an, *ap, *bn, *bp;
>>
>>         assert(head != NULL);
>>         assert(a != NULL);
>>         assert(b != NULL);
>>
>>         if (a == b)
>>                 return;
>>
>>         hp = esmLOCKload(*head);
>>         esmLOCKprefetch(an = esmLOCKload(a->next));
>>         ap = esmLOCKload(a->prev);
>>         esmLOCKprefetch(bn = esmLOCKload(b->next));
>>         bp = esmLOCKload(b->prev);
>>
>>         if (an != NULL)                 // I see what you did
>>                 esmLOCKprefetch(an->prev);
>>         if (bn != NULL) {
>>                 esmLOCKprefetch(bn->prev);
>>                 bn->prev = a;
>>         }
>>
>>         if (hp == a)
>>                 *head = b;
>>         else if (hp == b)
>>                 *head = a;
>>
>>         b->next = an;
>>         if (an != NULL)
>>                 an->prev = b;
>>         b->prev = ap;
>>         if (ap != NULL)
>>                 ap->next = b;
>>                                         // illustrative code
>>         a->next = bn;                   // ST      Rbn,[Ra,#next]
>>         if (bp != NULL)                 // PNE0    Rbp,T
>>                 bp->next = a;           // ST      Ra,[Rbp,#next]
>>
>>         esmLOCKstore(a->prev, bp);
>> }
>>
>> But now the conditional testing whether or not `an` is `NULL` is
>> repeated.  Is the additional branch overhead worth it here?

>In My 66000 ISA, a compare against zero (or NULL) is just a branch
>instruction, so the CMP zero is performed twice, but each use is
>but a single Branch-on-condition instruction (or Predicate-on-
>Condition instruction).
>
>In the case of using predicates, FETCH/DECODE will simply issue
>both then and else clauses into the execution window (the else
>clause is empty here) and let the reservation stations handle
>execution order.  And the condition latency is purely the register
>dependence chain.  A 6-wide machine should have no trouble
>inserting two copies of the code commented by "illustrative code"
>above--in this case 6 instructions, or 2 sets of {ST, PNE0, ST}.
>
>In the case of using a real branch, latency per use is likely to
>be 2 cycles, moderated by typical branch prediction.
>The condition will have resolved early, so we are simply
>FETCH/DECODE/TAKE bound.
>
>{{That is: PRED should be faster in almost all conceivable cases.}}

Okay, that all makes sense.  Thanks for the detailed explanation.
I agree it's very slick; is this documented somewhere publicly?
If not, why not?

>Generally when I write queue-code, I use a dummy Node front/rear
>such that the checks for Null are unnecessary (at the cost of
>following 1 more ->next or ->prev).  That is, Q->head and Q->tail
>are never NULL, and when the queue is empty there is a Node which
>carries the fact that the queue is empty (not using NULL as a
>pointer).
>
>But that is just my style.

Ok.

	- Dan C.