Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Memory ordering
Date: Fri, 02 Aug 2024 08:14:21 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 76
Message-ID: <2024Aug2.101421@mips.complang.tuwien.ac.at>
References: <b5d4a172469485e9799de44f5f120c73@www.novabbs.org>
 <v7uc71$2ec3f$1@dont-email.me>
 <2024Jul26.190007@mips.complang.tuwien.ac.at>
 <2032da2f7a4c7c8c50d28cacfa26c9c7@www.novabbs.org>
 <2024Jul29.152110@mips.complang.tuwien.ac.at>
 <f8869fa1aadb85896d237179d46b20f8@www.novabbs.org>
 <2024Jul30.115146@mips.complang.tuwien.ac.at>
 <249b2217b1dc1c8911eb45c5735d4aa9@www.novabbs.org>
 <2024Aug1.175455@mips.complang.tuwien.ac.at>
 <18ab7d4f4324a28ba0ab8bdb767a4261@www.novabbs.org>
Injection-Date: Fri, 02 Aug 2024 10:44:46 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="592efadf6d1535704eb2c99719bee763"; logging-data="2884961"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1//ykpZ4S9BqxTU5I5F0Oa+"
Cancel-Lock: sha1:m6J+7hq1zPbDNyUjNMz2xYDIeYk=
X-newsreader: xrn 10.11
Bytes: 5005

mitchalsup@aol.com (MitchAlsup1) writes:
>On Thu, 1 Aug 2024 15:54:55 +0000, Anton Ertl wrote:
>
>> mitchalsup@aol.com (MitchAlsup1) writes:
>>>On Tue, 30 Jul 2024 9:51:46 +0000, Anton Ertl wrote:
>>>
>>>> mitchalsup@aol.com (MitchAlsup1) writes:
>>
>>>An MEMBAR requires the memory order to catch up to the current point
>>>before adding new AGENs to the problem space. If the memory order
>>>is already SC then MEMBAR has nothing to do and is pushed through
>>>the pipeline without delay.
>>
>> Yes, that's the slow implementation.  The fast implementation is to
>> implement sequential consistency all the time (by predicting and
>> speculating that memory accesses do not interfere with those of other
>> cores, and recovering from that speculation when the speculation turns
>> out to be wrong). In such an implementation memory barriers are noops
>> (and thus fast), because the hardware already provides sequential
>> consistency.
>
>Why does SC need any MEMBARs ??

A program written for sequential consistency does not need them.  But
if you have a program written for a weaker memory model, the memory
barriers in that program will be noops and therefore really cheap.

>>>Then consider 2 Vector processors performing 2 STs (1 each) to
>>>non-overlapping addresses but with bank aliasing. Consider that
>>>the STs are scatter based and the bank conflicts random. There
>>>is no way to determine which store happened first or which
>>>element of each vector store happened first.
>>
>> It's up to the architecture to define the order of stores and loads of
>> a given core. For sequential consistency you then interleave the
>> sequences coming from the cores in some convenient order.
>
>Insufficient:: If OoO processor orders LDs and STs as they leave AGEN
>you cannot just interleave multiple core access streams and achieve
>sequential consistency.

Architecture is defined in the architecture manual.  Implementation
concepts like OoO and AGEN don't (or shouldn't) play a role there.

WRT memory ordering, most architectures define clearly what happens
for single-threaded programs, i.e., loads and stores happen exactly
in the architectural execution order of the instructions, and the
hardware actually implements that for single-threaded programs.  Then
they take back some of these guarantees for multi-processing, and add
some instructions (memory barriers, lock prefixes, etc.) to
reestablish these guarantees when needed, in an expensive way.
Sequential consistency is what you get if you do not take back these
guarantees.
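
As a rough illustration of the two points above (sequential consistency
as an interleaving of the per-core sequences, and barriers being noops
when the hardware is already sequentially consistent), here is a toy
model in Python.  Everything in it is an assumption made for
illustration: the function `outcomes`, the operation tuples, and the
per-core FIFO store buffer that stands in for "a weaker memory model"
(roughly TSO-like).  It does not model the speculation-based
implementation described above, only which results a program may
observe.

```python
def outcomes(programs, buffered):
    """Return the set of final register values reachable for `programs`.

    programs: one list of operations per core; an operation is
      ("st", addr, value), ("ld", addr, reg) or ("fence",).
    buffered=False: every store reaches memory immediately, so each run
      is an interleaving of the per-core sequences (sequential consistency).
    buffered=True: stores wait in a per-core FIFO buffer (hypothetical
      weaker model) and drain to memory at some arbitrary later time.
    """
    results = set()

    def step(pcs, mem, bufs, regs):
        progressed = False
        for c, prog in enumerate(programs):
            # Choice 1: drain core c's oldest buffered store to memory.
            if bufs[c]:
                progressed = True
                addr, val = bufs[c][0]
                drained = bufs[:c] + (bufs[c][1:],) + bufs[c + 1:]
                step(pcs, {**mem, addr: val}, drained, regs)
            # Choice 2: execute core c's next operation in program order.
            if pcs[c] == len(prog):
                continue
            op = prog[pcs[c]]
            nxt = pcs[:c] + (pcs[c] + 1,) + pcs[c + 1:]
            if op[0] == "st":
                progressed = True
                _, addr, val = op
                if buffered:
                    queued = bufs[:c] + (bufs[c] + ((addr, val),),) + bufs[c + 1:]
                    step(nxt, mem, queued, regs)
                else:
                    step(nxt, {**mem, addr: val}, bufs, regs)
            elif op[0] == "ld":
                progressed = True
                _, addr, reg = op
                # Forward from the core's own youngest buffered store, else read memory.
                own = [v for a, v in bufs[c] if a == addr]
                val = own[-1] if own else mem.get(addr, 0)
                step(nxt, mem, bufs, {**regs, reg: val})
            elif op[0] == "fence" and not bufs[c]:
                # A fence only completes once the core's own buffer is empty.
                progressed = True
                step(nxt, mem, bufs, regs)
        if not progressed:  # all programs finished and all buffers drained
            results.add(tuple(regs[r] for r in sorted(regs)))

    n = len(programs)
    step((0,) * n, {}, ((),) * n, {})
    return results


# Store-buffering litmus test: each core stores to one location, then loads the other.
SB = [[("st", "x", 1), ("ld", "y", "r0")],
      [("st", "y", 1), ("ld", "x", "r1")]]
# The same program as written for the weaker model, with a barrier between store and load.
SB_FENCED = [[("st", "x", 1), ("fence",), ("ld", "y", "r0")],
             [("st", "y", 1), ("fence",), ("ld", "x", "r1")]]

print(sorted(outcomes(SB, buffered=False)))        # SC: (0, 0) cannot occur
print(sorted(outcomes(SB, buffered=True)))         # weaker model: (0, 0) appears
print(sorted(outcomes(SB_FENCED, buffered=True)))  # fences rule (0, 0) out again
print(sorted(outcomes(SB_FENCED, buffered=False))) # on SC the fences change nothing
```

In this sketch the fence has work to do only in the buffered model; in
the unbuffered (sequentially consistent) model its buffer is always
empty, so it completes immediately, which is the sense in which the
barriers of a program written for a weaker model become noops.
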

Concerning vector instructions, what do architectures say about the
memory order here?  The ideal would be to treat them as atomic, i.e.,
a vector memory access is performed in its entirety after any earlier
and before any later memory access in the stream of executed
instructions.  But even without multi-processing this tends to be
inefficient, and has problems with page faults and the number of
pages that must be in memory at the same time, especially with
gather/scatter accesses and very long vector memory-memory
instructions as on the NEC SX (IIRC).  But of course, the NEC SX is a
supercomputer architecture; a certain amount of architectural nonsense
is not unusual there.

Given such difficulties, vector instructions, at least with gather
loads and scatter stores (whether strided or indirect), are not a good
idea (and a recent Intel hardware vulnerability shows another reason
why gather is not a good idea).  Your VVM OTOH allows a clean
architectural definition.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>