From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Arguments for a sane ISA 6-years later
Date: Fri, 26 Jul 2024 20:59:06 +0000
Organization: Rocksolid Light
Message-ID: <2032da2f7a4c7c8c50d28cacfa26c9c7@www.novabbs.org>
References: <2024Jul26.190007@mips.complang.tuwien.ac.at>

On Fri, 26 Jul 2024 17:00:07 +0000, Anton Ertl wrote:

> "Chris M. Thomasson" writes:
>>On 7/25/2024 1:09 PM, BGB wrote:
>>> At least with a weak model, software knows that if it doesn't go through
>>> the rituals, the memory will be stale.
>
> There is no guarantee of staleness, only a lack of stronger ordering
> guarantees.
>
>>The weak model is ideal for me. I know how to program for it
>
> And the fact that this model is so hard to use that few others know
> how to program for it makes it ideal for you.
>
>>and it's more efficient
>
> That depends on the hardware.
>
> Yes, the Alpha 21164 with its imprecise exceptions was "more
> efficient" than other hardware for a while, then the Pentium Pro came
> along and gave us precise exceptions and more efficiency. And
> eventually the Alpha people learned the trick, too, and 21264 provided
> precise exceptions (although they did not admit this) and more
> efficiency.
>
> Similarly, I expect that hardware that is designed for good TSO or
> sequential consistency performance will run faster on code written for
> this model than code written for weakly consistent hardware will run
> on that hardware.

According to Lamport, only the ATOMIC stuff needs sequential
consistency.
So it is entirely possible to have a causally consistent processor
that switches to sequential consistency when doing ATOMIC stuff: it
gains performance when not doing ATOMIC stuff, and gains
programmability when doing ATOMIC stuff.

> That's because software written for weakly
> consistent hardware often has to insert barriers or atomic operations
> just in case, and these operations are slow on hardware optimized for
> weak consistency.

The operations themselves are not slow. What is slow is delaying the
pipeline until it catches up to the stronger memory model before
proceeding.

>
> By contrast, one can design hardware for strong ordering such that the
> slowness occurs only in those cases when actual (not potential)
> communication between the cores happens, i.e., much less frequently.

How would you do this for a 256-way banked memory system such as that
of the NEC SX?
I.e., the processor is not in charge of memory order -- the memory
system is.

>
>>and sometimes use cases do not care if they encounter "stale" data.
>
> Great. Unless these "sometimes" cases are more often than the cases
> where you perform some atomic operation or barrier because of
> potential, but not actual communication between cores, the weak model
> is still slower than a well-implemented strong model.
>
> - anton
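
As a software-level analogue of the ATOMIC point above, here is a
minimal sketch in C++. It is an illustration under assumptions, not a
description of any particular hardware: the names Mailbox and handoff
are hypothetical, and the C++ memory model stands in for whatever the
processor does. Only the flag is atomic and uses the default
sequentially consistent ordering; the payload is an ordinary store and
load with no ordering ritual of its own.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names throughout; an illustration only.
struct Mailbox {
    int payload = 0;                 // ordinary (non-atomic) data
    std::atomic<bool> ready{false};  // the single ATOMIC location
};

// Runs one producer/consumer handoff and returns what the consumer read.
int handoff(int value) {
    Mailbox box;
    int seen = -1;

    std::thread consumer([&] {
        // Default ordering on load() is memory_order_seq_cst.
        while (!box.ready.load()) {
            std::this_thread::yield();
        }
        // Ordered after the flag load, so the producer's write is visible.
        seen = box.payload;
    });

    std::thread producer([&] {
        box.payload = value;    // plain store
        box.ready.store(true);  // seq_cst store publishes the payload
    });

    producer.join();
    consumer.join();

    // join() synchronizes with thread completion, so reading seen is safe.
    assert(seen == value);
    return seen;
}
```

Calling handoff(42) should return 42: the ordering cost is paid only
at the two accesses to ready, while the payload accesses stay plain.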