Path: news.eternal-september.org!eternal-september.org!feeder3.eternal-september.org!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.arch
Subject: Re: MSI interrupts
Date: Sat, 29 Mar 2025 14:28:02 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vs901i$f7e$1@reader1.panix.com>
References: <vqto79$335c6$1@dont-email.me> <34434320650f5844b18b1c0b684acf43@www.novabbs.org> <vs5305$rd4$1@reader1.panix.com> <cb049d5490b541878e264cedf95168e1@www.novabbs.org>
Injection-Date: Sat, 29 Mar 2025 14:28:02 -0000 (UTC)
Injection-Info: reader1.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
	logging-data="15598"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)

In article <cb049d5490b541878e264cedf95168e1@www.novabbs.org>,
MitchAlsup1 <mitchalsup@aol.com> wrote:
>On Fri, 28 Mar 2025 2:53:57 +0000, Dan Cross wrote:
>> [snip]
>> What I really wanted was an example that exceeded that limit as
>> an expository vehicle for understanding what happens when the
>> limit is exceeded.  What does the hardware do in that case?
>
>Raise OPERATION (instruction) Fault.

Ok, very good.  The software implications could be profound.

>>>[snip]
>>>// by placing all the touching before any manifestation, you put
>>>// all the touch latency* in the code before it has tried to damage any
>>>// participating memory location. (*) and TLB latency and 2nd party
>>>// observation of your event.
>>>
>>>// this would be the point where you would insert if( esmINTERFERENCE( ))
>>>// if you wanted control at a known failure point rather than at the
>>>// top of the event on failure.
>>>
>>>>         if (Ehead == a)
>>>>                 *head = b;
>>>>         else if (Ehead == b)
>>>>                 *head = a;
>>>>
>>>>         b->next = an;
>>>>         if (an != NULL) {
>>>>                 an->prev = b;
>>>>         }
>>>>         b->prev = ap;
>>>>         if (ap != NULL) {
>>>>                 ap->next = b;
>>>>         }
>>>>
>>>>         a->next = bn;
>>>>         if (bn != NULL) {
>>>>                 bn->prev = a;
>>>>         }
>>>>         if (bp != NULL) {
>>>>                 bp->next = a;
>>>>         }
>>>>         esmLOCKstore(a->prev, bp);
>>>> }
>>>
>>>// now manifestation has lowest possible latency (as seen by this core alone)
>>
>> Doesn't this code now assume that `an->prev` is in the same
>> cache line as `an`, and similarly that `bn` is in the same line
>> as `bn->prev`? Absent an alignment constraint on the `Node`
>> type, that's not guaranteed given the definition posted earlier.
>
>Everything you state is true, I just tried to move the esmLOCKxxxx
>up to the setup phase to obey ESM rules.

Yeah, that makes sense.  Describing it as a rule, however,
raises a question: must one touch every participating line
before the first store in the event, or only touch a given line
before storing to that particular line?
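
To make the two readings concrete, here is a toy sketch (my own
example, not anything taken from ESM documentation) of the
stricter reading, in which every participating line is touched
before the first store of the event; I'm assuming here that an
esmLOCKprefetch by itself is enough to make a line a participant:

        void
        set_pair(int *p, int *q, int pv, int qv)
        {
                esmLOCKprefetch(*p);    // touch phase: both lines
                esmLOCKprefetch(*q);    // touched, no stores yet

                *p = pv;                // manifestation phase: stores
                esmLOCKstore(*q, qv);   // only; LOCKed store ends event
        }

The looser reading would only require that each line be touched
at some point before the store to that same line.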

>> That may be an odd and ill-advised thing for a programmer to do
>> if they want their list type to work with atomic events, but it
>> is possible.
>>
>> The node pointers and their respective `next` pointers are okay,
>> so I wonder if perhaps this might have been written as:
>>
>> void
>> swap_places(Node **head, Node *a, Node *b)
>> {
>>         Node *hp, *an, *ap, *bn, *bp;
>>
>>         assert(head != NULL);
>>         assert(a != NULL);
>>         assert(b != NULL);
>>
>>         if (a == b)
>>                 return;
>>
>>         hp = esmLOCKload(*head);
>>         esmLOCKprefetch(an = esmLOCKload(a->next));
>>         ap = esmLOCKload(a->prev);
>>         esmLOCKprefetch(bn = esmLOCKload(b->next));
>>         bp = esmLOCKload(b->prev);
>>
>>         if (an != NULL)                    // I see what you did
>>                 esmLOCKprefetch(an->prev);
>>         if (bn != NULL) {
>>                 esmLOCKprefetch(bn->prev);
>>                 bn->prev = a;
>>         }
>>
>>         if (hp == a)
>>                 *head = b;
>>         else if (hp == b)
>>                 *head = a;
>>
>>         b->next = an;
>>         if (an != NULL)
>>                 an->prev = b;
>>         b->prev = ap;
>>         if (ap != NULL)
>>                 ap->next = b;
>>                                     // illustrative code
>>         a->next = bn;               //   ST     Rbn,[Ra,#next]
>>         if (bp != NULL)             //   PNE0   Rbp,T
>>                 bp->next = a;       //   ST     Ra,[Rbp,#next]
>>
>>         esmLOCKstore(a->prev, bp);
>> }
>>
>> But now the conditional testing whether or not `an` is `NULL` is
>> repeated.  Is the additional branch overhead worth it here?
>
>In My 66000 ISA, a compare against zero (or NULL) is just a branch
>instruction, so while the compare against zero is performed twice,
>each use is still only a single Branch-on-Condition (or Predicate-
>on-Condition) instruction.
>
>In the case of using predicates, FETCH/DECODE will simply issue
>both then and else clauses into the execution window (the else
>clause is empty here) and let the reservation stations handle
>execution order. The condition latency is then purely the register
>dependence chain. A 6-wide machine should have no trouble
>inserting two copies of the code commented as "illustrative code"
>above--in this case 6 instructions, or 2 sets of {ST, PNE0, ST}.
>
>In the case of using a real branch, latency per use is likely to
>be 2 cycles, moderated by typical branch prediction. The condition
>will have resolved early, so we are simply FETCH/DECODE/TAKE bound.
>
>{{That is: PRED should be faster in almost all conceivable cases.}}

Okay, that all makes sense.  Thanks for the detailed
explanation.  I agree it's very slick; is this documented
somewhere publicly?  If not, why not?
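
For concreteness, here is how I read the second of the two
{ST, PNE0, ST} sets: the same pattern applied to the `an` side
of the swap.  The register names are purely illustrative,
mirroring the comment style in the code above:

        b->next = an;               //   ST     Ran,[Rb,#next]
        if (an != NULL)             //   PNE0   Ran,T
                an->prev = b;       //   ST     Rb,[Ran,#prev]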

>Generally when I write queue code, I use dummy front/rear Nodes
>so that the checks for NULL are unnecessary (at the cost of
>following one more ->next or ->prev). That is, Q->head and Q->tail
>are never NULL, and when the queue is empty there is still a Node
>present that carries that fact (rather than using a NULL pointer
>to signal it).
>
>But that is just my style.

Ok.
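
For anyone following along, here is a minimal sketch of that
dummy-node arrangement as I understand it (my own names and
layout, not Mitch's code; data fields elided):

        typedef struct Node Node;
        struct Node {
                Node *next;
                Node *prev;
        };

        typedef struct Queue {
                Node head;      // dummy front node, never NULL
                Node tail;      // dummy rear node, never NULL
        } Queue;

        void
        queue_init(Queue *q)
        {
                q->head.prev = NULL;
                q->head.next = &q->tail;
                q->tail.prev = &q->head;
                q->tail.next = NULL;
        }

        // Insert n at the rear; no NULL checks are needed because
        // the dummy nodes are always present, even when the queue
        // is empty.
        void
        enqueue(Queue *q, Node *n)
        {
                Node *last = q->tail.prev;

                n->next = &q->tail;
                n->prev = last;
                last->next = n;
                q->tail.prev = n;
        }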

	- Dan C.