Article <2024Dec3.093252@mips.complang.tuwien.ac.at>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Memory ordering
Date: Tue, 03 Dec 2024 08:32:52 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 50
Message-ID: <2024Dec3.093252@mips.complang.tuwien.ac.at>
References: <vfono1$14l9r$1@dont-email.me> <5yqdnU9eL_Y_GKv6nZ2dnZfqn_GdnZ2d@supernews.com> <2024Nov15.082512@mips.complang.tuwien.ac.at> <vh7rlr$3fu9i$1@dont-email.me> <2024Nov15.182737@mips.complang.tuwien.ac.at> <vh8c3f$3j6ql$2@dont-email.me> <2024Nov16.083744@mips.complang.tuwien.ac.at> <vhb587$6hbv$7@dont-email.me> <2024Nov17.161508@mips.complang.tuwien.ac.at> <vhduhg$sga5$1@dont-email.me> <2024Nov18.081104@mips.complang.tuwien.ac.at> <vhgi4p$1fms3$1@dont-email.me> <vhgitb$1fro3$1@dont-email.me>
Injection-Date: Tue, 03 Dec 2024 10:01:00 +0100 (CET)
Injection-Info: dont-email.me; posting-host="4f224cff8eacfd56eb8a0f66fcc6d139";
	logging-data="4135936"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+E8sGK87SJHskl/58RWFjG"
Cancel-Lock: sha1:/XcYgV1i5aKBWA+hRg9doNyypw8=
X-newsreader: xrn 10.11
Bytes: 3906

"Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
>On 11/18/2024 3:20 PM, Chris M. Thomasson wrote:
>> On 11/17/2024 11:11 PM, Anton Ertl wrote:
>>> The flaw in the reasoning of the paper was:
>>>
>>> |To solve it more easily without floating–point von Neumann had
>>> |transformed equation Bx = c to B^TBx = B^Tc , thus unnecessarily
>>> |doubling the number of sig. bits lost to ill-condition
>>>
>>> This is an example of how the supposed gains that the harder-to-use
>>> interface provides (in this case the bits "wasted" on the exponent)
>>> are overcompensated by then having to use a software workaround for
>>> the harder-to-use interface.
....
>Don't tell me you want all of std::memory_order_* to default to 
>std::memory_order_seq_cst? If you're on a system that only has seq_cst and 
>nothing else, okay, but not on other weaker (memory order) systems, right?

I tell anyone who wants to read it to stop buying hardware without FP
for non-integer work, and hardware with weak memory ordering for work
that needs concurrent programming.  There are enough affordable
offerings with FP and TSO that we do not need to waste programming
time and increase the frequency of hard-to-find bugs by figuring out
how to get good performance out of hardware that lacks FP or has weak
memory ordering.

Those who enjoy the challenge of dealing with the unnecessary problems
of sub-par hardware can continue to enjoy that.

But when developing production software, as a manager don't let
programmers with this hobby horse influence your hardware and
development decisions.  Give full support to hardware with FP and TSO,
and limited support to hardware without FP or with weak ordering.  For
hardware without FP, that limited support may consist of using
software implementations of FP (instead of designing software for
fixed-point arithmetic).  For hardware with weak ordering, the limited
support could be to use memory barriers liberally (without trying to
minimize them at all; every memory barrier elimination costs
development time and increases the potential for hard-to-find bugs),
to use OS mechanisms for concurrency (rather than, e.g., lock-free
algorithms), or maybe even to support only single-threaded operation.
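
A minimal C++ sketch of what such limited support might look like in
code.  It is only an illustration under stated assumptions: the
function names pass_with_seq_cst and count_with_mutex are hypothetical,
and std::mutex stands in for "OS mechanisms" on the assumption that the
implementation maps it to the platform's native lock.  The first
function leaves every atomic operation at its default
std::memory_order_seq_cst instead of weakening orderings; the second
avoids lock-free code entirely.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example 1: publish a value through a flag, leaving all
// atomic operations at their default memory_order_seq_cst.  No attempt
// is made to relax the orderings or to eliminate barriers.
int pass_with_seq_cst() {
    int payload = 0;                    // plain data, published via the flag
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        ready.store(true);              // default: memory_order_seq_cst
    });
    std::thread consumer([&] {
        while (!ready.load()) {         // default: memory_order_seq_cst
            std::this_thread::yield();
        }
        seen = payload;                 // ordered after the producer's write
    });

    producer.join();
    consumer.join();
    return seen;
}

// Hypothetical example 2: protect shared state with a mutex instead of
// a lock-free algorithm.
int count_with_mutex(int threads, int iters) {
    int counter = 0;
    std::mutex m;
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<std::mutex> guard(m);
                ++counter;              // only touched while holding the lock
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```

Calling pass_with_seq_cst() and count_with_mutex(4, 1000) from a main
function exercises both variants; on most toolchains this needs to be
built with thread support enabled (e.g. -pthread).  Neither variant
tries to be fast on weakly-ordered hardware, which is the point of
the trade-off described above.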

Efficiently implemented sequentially consistent hardware would be
even better, and if it were widely available, I would recommend
buying it over TSO hardware, but unfortunately we are not there yet.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>