Article <2024Aug2.101421@mips.complang.tuwien.ac.at>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Memory ordering
Date: Fri, 02 Aug 2024 08:14:21 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 76
Message-ID: <2024Aug2.101421@mips.complang.tuwien.ac.at>
References: <b5d4a172469485e9799de44f5f120c73@www.novabbs.org> <v7uc71$2ec3f$1@dont-email.me> <2024Jul26.190007@mips.complang.tuwien.ac.at> <2032da2f7a4c7c8c50d28cacfa26c9c7@www.novabbs.org> <2024Jul29.152110@mips.complang.tuwien.ac.at> <f8869fa1aadb85896d237179d46b20f8@www.novabbs.org> <2024Jul30.115146@mips.complang.tuwien.ac.at> <249b2217b1dc1c8911eb45c5735d4aa9@www.novabbs.org> <2024Aug1.175455@mips.complang.tuwien.ac.at> <18ab7d4f4324a28ba0ab8bdb767a4261@www.novabbs.org>
Injection-Date: Fri, 02 Aug 2024 10:44:46 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="592efadf6d1535704eb2c99719bee763";
	logging-data="2884961"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1//ykpZ4S9BqxTU5I5F0Oa+"
Cancel-Lock: sha1:m6J+7hq1zPbDNyUjNMz2xYDIeYk=
X-newsreader: xrn 10.11
Bytes: 5005

mitchalsup@aol.com (MitchAlsup1) writes:
>On Thu, 1 Aug 2024 15:54:55 +0000, Anton Ertl wrote:
>
>> mitchalsup@aol.com (MitchAlsup1) writes:
>>>On Tue, 30 Jul 2024 9:51:46 +0000, Anton Ertl wrote:
>>>
>>>> mitchalsup@aol.com (MitchAlsup1) writes:
>>
>>>An MEMBAR requires the memory order to catch up to the current point
>>>before adding new AGENs to the problem space. If the memory order
>>>is already SC then MEMBAR has nothing to do and is pushed through
>>>the pipeline without delay.
>>
>> Yes, that's the slow implementation.  The fast implementation is to
>> implement sequential consistency all the time (by predicting and
>> speculating that memory accesses do not interfere with those of other
>> cores, and recovering from that speculation when the speculation turns
>> out to be wrong).  In such an implementation memory barriers are noops
>> (and thus fast), because the hardware already provides sequential
>> consistency.
>
>Why does SC need any MEMBARs ??

A program written for sequential consistency does not need them.  But
if you have a program written for a weaker memory model, the memory
barriers in that program will be noops and therefore really cheap.
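
To illustrate, here is a minimal sketch of such a program, written for
a weak memory model with explicit barriers.  It is not from the thread:
the function name message_passing_once and the value 42 are made up for
the example, and C++ fences stand in for the architecture's MEMBARs.
On hardware that provides sequential consistency anyway, the two
fences would have nothing left to enforce at the hardware level.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example (not from the post): message passing written for a
// weak memory model, with explicit fences standing in for MEMBARs.
int message_passing_once() {
    int payload = 0;                  // plain data, published via the flag
    std::atomic<bool> ready{false};   // flag that announces the payload
    int seen = -1;                    // what the consumer observed

    std::thread producer([&] {
        payload = 42;
        // Barrier: the payload store must not be reordered after the
        // flag store below.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Barrier: the payload load must not be reordered before the
        // flag load above.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

With both fences in place the consumer is guaranteed to read 42.  The
fences still restrict what the compiler may reorder, so they cannot
simply be deleted from the source; the point is only about their cost
in hardware.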

>>>Then consider 2 Vector processors performing 2 STs (1 each) to
>>>non-overlapping addresses but with bank aliasing. Consider that
>>>the STs are scatter based and the bank conflicts random. There
>>>is no way to determine which store happened first or which
>>>element of each vector store happened first.
>>
>> It's up to the architecture to define the order of stores and loads of
>> a given core.  For sequential consistency you then interleave the
>> sequences coming from the cores in some convenient order.
>
>Insufficient:: If OoO processor orders LDs and STs as they leave AGEN
>you cannot just interleave multiple core access streams and achieve
>sequential consistency.

Architecture is defined in the architecture manual.  Implementation
concepts like OoO and AGEN don't (or shouldn't) play a role there.
WRT memory ordering, most architectures define clearly what happens
for single-threaded programs: loads and stores happen exactly in the
architectural execution order of the instructions.  And the
implementations actually deliver that for single-threaded programs.

Then they take back some of these guarantees for multi-processing, and
add some instructions (memory barriers, lock prefixes, etc.) to
reestablish these guarantees when needed, in an expensive way.

Sequential consistency is what you get if you do not take back these
guarantees.
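
As a rough sketch of what that means, the following enumerates every
interleaving of two cores' program-order access sequences over a single
shared memory and collects the possible outcomes.  The model is an
assumption made for illustration (two cores, two locations, the classic
store-buffering litmus test); the names Op, run, enumerate and
sc_outcomes are hypothetical and do not come from any architecture
manual.

```cpp
#include <array>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical model (not from the post): each core stores 1 to its own
// location and then loads the other core's location.
enum class Kind { Store, Load };
struct Op { Kind kind; int addr; };

using Program = std::array<std::vector<Op>, 2>;
using Outcome = std::pair<int, int>;  // (value loaded by core 0, by core 1)

// Execute one global order, given as a sequence of core ids (0 or 1).
// Each core's accesses are taken in program order from one shared memory.
Outcome run(const std::vector<int>& order, const Program& prog) {
    std::array<int, 2> mem{0, 0};
    std::array<std::size_t, 2> pc{0, 0};
    std::array<int, 2> loaded{-1, -1};
    for (int core : order) {
        const Op& op = prog[core][pc[core]++];
        if (op.kind == Kind::Store) {
            mem[op.addr] = 1;
        } else {
            loaded[core] = mem[op.addr];
        }
    }
    return {loaded[0], loaded[1]};
}

// Recursively build every interleaving that preserves per-core order.
void enumerate(std::vector<int>& order, std::array<std::size_t, 2> left,
               const Program& prog, std::set<Outcome>& out) {
    if (left[0] == 0 && left[1] == 0) {
        out.insert(run(order, prog));
        return;
    }
    for (int core = 0; core < 2; ++core) {
        if (left[core] == 0) continue;
        order.push_back(core);
        --left[core];
        enumerate(order, left, prog, out);
        ++left[core];
        order.pop_back();
    }
}

// All outcomes the litmus test can produce under sequential consistency.
std::set<Outcome> sc_outcomes() {
    const std::vector<Op> core0{Op{Kind::Store, 0}, Op{Kind::Load, 1}};
    const std::vector<Op> core1{Op{Kind::Store, 1}, Op{Kind::Load, 0}};
    const Program prog{core0, core1};
    std::array<std::size_t, 2> left{core0.size(), core1.size()};
    std::set<Outcome> out;
    std::vector<int> order;
    enumerate(order, left, prog, out);
    return out;
}
```

In this model the outcome where both loads return 0 never appears,
because every interleaving places at least one store before the other
core's load.  An architecture that takes back the ordering guarantee
between a store and a later load of the same core allows that outcome
unless a barrier is inserted.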

Concerning vector instructions, what do architectures say about the
memory order here?  The ideal would be to treat them as atomic, i.e.,
a vector read access is performed in its entirety after any earlier and
before any later memory access in the stream of executed instructions.  But even
without multi-processing this tends to be inefficient, and has
problems with page faults and the number of necessary pages in memory
at the same time, especially with gather/scatter accesses and very
long vector memory-memory instructions as on the NEC SX (IIRC).  But
of course, the NEC SX is a supercomputer architecture; a certain
amount of architectural nonsense is not unusual there.

Given such difficulties, vector instructions, at least with gather
loads and scatter stores (whether strided or indirect), are not a good
idea (and a recent Intel hardware vulnerability shows another reason
why gather is not a good idea).  Your VVM OTOH allows a clean
architectural definition.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>