Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.arch
Subject: Re: Memory ordering
Date: Tue, 3 Dec 2024 16:34:19 -0800
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <vio82b$ei97$2@dont-email.me>
References: <vfono1$14l9r$1@dont-email.me>
 <5yqdnU9eL_Y_GKv6nZ2dnZfqn_GdnZ2d@supernews.com>
 <2024Nov15.082512@mips.complang.tuwien.ac.at>
 <vh7rlr$3fu9i$1@dont-email.me>
 <2024Nov15.182737@mips.complang.tuwien.ac.at>
 <vh8c3f$3j6ql$2@dont-email.me>
 <2024Nov16.083744@mips.complang.tuwien.ac.at>
 <vhb587$6hbv$7@dont-email.me>
 <2024Nov17.161508@mips.complang.tuwien.ac.at>
 <vhduhg$sga5$1@dont-email.me>
 <2024Nov18.081104@mips.complang.tuwien.ac.at>
 <vhgi4p$1fms3$1@dont-email.me>
 <vhgitb$1fro3$1@dont-email.me>
 <2024Dec3.093252@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 04 Dec 2024 01:34:20 +0100 (CET)
Injection-Info: dont-email.me; posting-host="18f532fff7cb0dbc37a1aea16bf9bfbd";
 logging-data="477479"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX1+CQEkYCLb27QoZy+gqjHYsK8fR9lcaxp4="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:mdRW8WKOCbssKAfLOroYOxolnjA=
Content-Language: en-US
In-Reply-To: <2024Dec3.093252@mips.complang.tuwien.ac.at>
Bytes: 4208

On 12/3/2024 12:32 AM, Anton Ertl wrote:
> "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>> On 11/18/2024 3:20 PM, Chris M. Thomasson wrote:
>>> On 11/17/2024 11:11 PM, Anton Ertl wrote:
>>>> The flaw in the reasoning of the paper was:
>>>>
>>>> |To solve it more easily without floating–point von Neumann had
>>>> |transformed equation Bx = c to B^TBx = B^Tc , thus unnecessarily
>>>> |doubling the number of sig. bits lost to ill-condition
>>>>
>>>> This is an example of how the supposed gains that the harder-to-use
>>>> interface provides (in this case the bits "wasted" on the exponent)
>>>> are overcompensated by then having to use a software workaround for
>>>> the harder-to-use interface.
> ...
>> Don't tell me you want all of std::memory_order_* to default to
>> std::memory_order_seq_cst? If you're on a system that only has seq_cst and
>> nothing else, okay, but not on other weaker (memory order) systems, right?
>
> I tell anyone who wants to read it to stop buying hardware without FP
> for non-integer work, and with weak memory ordering for work that
> needs concurrent programming. There are enough affordable offerings
> with FP and TSO that we do not need to waste programming time and
> increase the frequency of hard-to-find bugs by figuring out how to get
> good performance out of hardware without FP hardware and with weak
> memory ordering.
>
> Those who enjoy the challenge of dealing with the unnecessary problems
> of sub-par hardware can continue to enjoy that.
>
> But when developing production software, as a manager don't let
> programmers with this hobby horse influence your hardware and
> development decisions. Give full support for FP and TSO hardware, and
> limited support to weakly-ordered hardware. That limited support may
> consist of using software implementations of FP (instead of designing
> software for fixed point arithmetic). In case of hardware with weak
> ordering the limited support could be to use memory barriers liberally
> (without trying to minimize them at all; every memory barrier
> elimination costs development time and increases the potential for
> hard-to-find bugs), or using OS mechanisms for concurrency (rather
> than, e.g., lock-free algorithms), or maybe even only supporting
> single-threaded operation.
>
> Efficiently-implemented sequentially-consistent hardware would be even
> more preferable, and if it was widely available, I would recommend
> buying that over TSO hardware, but unfortunately we are not there yet.

For some reason, I boil that down to: implementing
lock/wait/obstruction-free algorithms is hard. Is that all you've got?
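
To make the seq_cst question concrete, here is a minimal sketch of the
same message-passing pattern written two ways, assuming C++11 atomics.
The function names are hypothetical and come from nobody's post in this
thread. The first variant leaves every atomic access at the default
std::memory_order_seq_cst ("barriers liberally"); the second asks only
for release/acquire, which is the kind of weakening being argued about.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper names, for illustration only.

// Variant 1: all atomic accesses use the default std::memory_order_seq_cst.
int pass_message_seq_cst() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        ready.store(true);           // seq_cst by default
    });
    std::thread consumer([&] {
        while (!ready.load()) {      // seq_cst by default
            std::this_thread::yield();
        }
        seen = payload;              // ordered after the producer's write
    });

    producer.join();
    consumer.join();
    return seen;
}

// Variant 2: same pattern, requesting only release/acquire ordering.
int pass_message_acq_rel() {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload cannot move below this store.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: the read of payload cannot move above this load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Both functions return 42 under the C++ memory model. What differs is
what the compiler may emit: on x86 (TSO) the release store is typically
a plain MOV, while the seq_cst store typically needs XCHG or MOV+MFENCE.
So even on TSO hardware the choice of memory_order is not free.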