From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Arguments for a sane ISA 6-years later
Date: Mon, 29 Jul 2024 17:38:23 +0000
Message-ID: <579ed190735c42fbd995f1b0b403e123@www.novabbs.org>
References: <2024Jul26.190007@mips.complang.tuwien.ac.at>

On Mon, 29 Jul 2024 3:32:52 +0000, Chris M. Thomasson wrote:

> On 7/26/2024 10:00 AM, Anton Ertl wrote:
>> "Chris M. Thomasson" writes:
>>> On 7/25/2024 1:09 PM, BGB wrote:
>>>> At least with a weak model, software knows that if it doesn't go through
>>>> the rituals, the memory will be stale.
>>
>> There is no guarantee of staleness, only a lack of stronger ordering
>> guarantees.
>>
>>> The weak model is ideal for me. I know how to program for it
>>
>> And the fact that this model is so hard to use that few others know
>> how to program for it makes it ideal for you.
>>
>>> and it's more efficient
>>
>> That depends on the hardware.
>>
>> Yes, the Alpha 21164 with its imprecise exceptions was "more
>> efficient" than other hardware for a while, then the Pentium Pro came
>> along and gave us precise exceptions and more efficiency. And
>> eventually the Alpha people learned the trick, too, and the 21264
>> provided precise exceptions (although they did not admit this) and
>> more efficiency.
>>
>> Similarly, I expect that hardware that is designed for good TSO or
>> sequential consistency performance will run faster on code written for
>> this model than code written for weakly consistent hardware will run
>> on that hardware. That's because software written for weakly
>> consistent hardware often has to insert barriers or atomic operations
>> just in case, and these operations are slow on hardware optimized for
>> weak consistency.
>>
>> By contrast, one can design hardware for strong ordering such that the
>> slowness occurs only in those cases when actual (not potential)
>> communication between the cores happens, i.e., much less frequently.
>>
>>> and sometimes use cases do not care if they encounter "stale" data.
>>
>> Great. Unless these "sometimes" cases are more frequent than the cases
>> where you perform some atomic operation or barrier because of
>> potential, but not actual, communication between cores, the weak model
>> is still slower than a well-implemented strong model.
>
> A strong model? You mean I don't have to use any memory barriers at all?
> Tell that to SPARC in RMO mode... How strong? Even the x86 requires a
> membar when a store followed by a load to another location shall be
> respected wrt order. Store-Load. #StoreLoad over on SPARC. ;^)

DRAM does not need this property; MMI/O does.

> If you can force everything to be #StoreLoad (*) and make it faster than
> a handcrafted algo on a very weak memory system, well, hats off! I
> thought it was easier for a HW guy to implement weak consistency? At the
> cost of the increased complexity wrt programming the sucker! ;^)

Or HW can have different order strengths based on where the PTE sends
the request. DRAM gets causal order, ATOMICs to DRAM get sequential
consistency, MMI/O gets sequential consistency, Configuration gets
strong ordering. The programmer has to do nothing.
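The Store-Load case Chris describes is the classic store-buffering (Dekker) litmus test. Below is a minimal sketch in C11, assuming `<stdatomic.h>` and POSIX threads are available; the names `thread_a`, `thread_b` and `count_both_zero` are illustrative only. Each thread stores to its own location and then loads the other's. With relaxed ordering alone, the outcome `r1 == 0 && r2 == 0` is permitted (x86's store buffer can produce it); a `memory_order_seq_cst` fence between the store and the load, usually compiled on x86 to `MFENCE` or a LOCK-prefixed instruction, forbids it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_int x, y;   /* each thread stores one and loads the other   */
static int r1, r2;        /* values observed; read by main after join     */
static bool use_fence;    /* true: full fence between the store and load  */

static void *thread_a(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst); /* provides #StoreLoad */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst); /* provides #StoreLoad */
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Run the store-buffering test `iterations` times and count how often
 * both threads saw the other's location still at 0. */
static int count_both_zero(int iterations, bool fence)
{
    int both_zero = 0;
    use_fence = fence; /* published to the threads by pthread_create */
    for (int i = 0; i < iterations; i++) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        int rc = pthread_create(&a, NULL, thread_a, NULL);
        assert(rc == 0);
        rc = pthread_create(&b, NULL, thread_b, NULL);
        assert(rc == 0);
        (void)rc;
        pthread_join(a, NULL); /* join makes r1/r2 visible to main */
        pthread_join(b, NULL);
        if (r1 == 0 && r2 == 0)
            both_zero++;
    }
    return both_zero;
}
```

Because thread start-up is slow compared with the race window, the unfenced run may rarely or never show the 0/0 outcome in practice; a dedicated harness such as `litmus7` from herdtools keeps the threads running and aligned so weak outcomes actually surface.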
>
> (*) Not just #StoreLoad for full consistency, you would need:
>
> MEMBAR #StoreLoad | #LoadStore | #StoreStore | #LoadLoad
>
> right?
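Chris's footnote mask can be related to C11 fences. The following is a sketch that assumes the SPARC V9 MEMBAR ordering-mask encoding (#LoadLoad = 0x1, #StoreLoad = 0x2, #LoadStore = 0x4, #StoreStore = 0x8). The `MM_*` names and `fence_for_mmask` are hypothetical. The mapping uses the common approximation that an acquire fence covers #LoadLoad|#LoadStore, a release fence covers #LoadStore|#StoreStore, and only a seq_cst fence covers #StoreLoad.

```c
#include <assert.h>
#include <stdatomic.h>

/* SPARC V9 MEMBAR mmask bits (assumed encoding). */
enum {
    MM_LOADLOAD   = 0x1,
    MM_STORELOAD  = 0x2,
    MM_LOADSTORE  = 0x4,
    MM_STORESTORE = 0x8,
    MM_ALL        = 0xF
};

/* Weakest C11 fence whose guarantees cover the requested mask
 * (hypothetical helper; an approximation, not an exact equivalence). */
static memory_order fence_for_mmask(unsigned mask)
{
    assert((mask & ~(unsigned)MM_ALL) == 0); /* cmask bits not modelled */
    if (mask & MM_STORELOAD)
        return memory_order_seq_cst;         /* only seq_cst orders store->load */
    int need_acq = (mask & MM_LOADLOAD) != 0;
    int need_rel = (mask & MM_STORESTORE) != 0;
    if (need_acq && need_rel)
        return memory_order_acq_rel;
    if (need_acq)
        return memory_order_acquire;
    if (need_rel)
        return memory_order_release;
    if (mask & MM_LOADSTORE)
        return memory_order_acquire;         /* acquire or release covers it */
    return memory_order_relaxed;             /* empty mask: no ordering needed */
}
```

A caller would write `atomic_thread_fence(fence_for_mmask(m))`. A relaxed fence has no effect, so an empty mask costs nothing. The full mask in the footnote maps to `memory_order_seq_cst`. On x86, TSO already provides the other three orderings, so the Store-Load fence is the only one that costs anything there.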