From: Thomas Koenig <tkoenig@netcologne.de>
Newsgroups: comp.arch
Subject: Re: Making Lemonade (Floating-point format changes)
Date: Mon, 13 May 2024 19:01:37 -0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <v1to2h$3km86$1@dont-email.me>
References: <abe04jhkngt2uun1e7ict8vmf1fq8p7rnm@4ax.com>
 <memo.20240512203459.16164W@jgd.cix.co.uk>
 <v1rab7$2vt3u$1@dont-email.me>
 <20240513151647.0000403f@yahoo.com>

Michael S <already5chosen@yahoo.com> wrote:
> On Sun, 12 May 2024 20:55:03 -0000 (UTC)
> Thomas Koenig <tkoenig@netcologne.de> wrote:
>
>> John Dallman <jgd@cix.co.uk> wrote:
>> > In article <abe04jhkngt2uun1e7ict8vmf1fq8p7rnm@4ax.com>,
>> > quadibloc@servername.invalid (John Savard) wrote:
>> >
>> >> I'm not really sure such floating-point precision is useful, but I
>> >> do remember some people telling me that higher float precision is
>> >> indeed something to be desired.
>>
>> > I would be in favour of 128-bit being available.
>>
>> Me, too.  Solving tricky linear systems, or obtaining derivatives
>> numerically (for example for Jacobians) eats up a _lot_ of precision
>> bits, and double precision can sometimes run into trouble.
>>
>> At least gcc and gfortran now support POWER's native 128-bit format
>> in hardware.  On other systems, software emulation is used, which
>> is of course much slower.
>>
>
> Much slower?
> I think, at least for matrix multiplication, my emulation on modern x86
> was within factor of 1.5x from your measurements on POWER9.

I don't remember the exact timing, and it would be interesting to
revisit that.  The gfortran code for matmul is not optimized for
128-bit float and might have blown cache sizes, and a fair comparison
would be compiler vs. compiler and assembler vs. assembler.

I just looked it up: on POWER9, xsaddqp has 12 cycles of latency,
with one result per cycle; POWER10 has 12 to 13 cycles, with two
results per cycle.

What can your code get on x86_64?
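
For anyone who wants comparable numbers, here is a minimal sketch of how
latency and throughput of 128-bit addition could be measured separately.
It assumes GCC's __float128 extension on x86_64, where the additions go
through a software routine rather than a single instruction.  The names
chain_sum, parallel_sum and ns_per_add are made up for this
illustration; this is not the code either of us measured with.

```c
#include <assert.h>
#include <time.h>

/* Latency-bound: every addition depends on the previous result. */
__float128 chain_sum(long n, __float128 step)
{
    __float128 acc = 0;
    for (long i = 0; i < n; i++)
        acc += step;
    return acc;
}

/* Throughput-bound: four independent accumulators, so consecutive
   additions do not have to wait for each other.
   Assumption of this sketch: n is a multiple of 4. */
__float128 parallel_sum(long n, __float128 step)
{
    assert(n % 4 == 0);
    __float128 a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    for (long i = 0; i < n; i += 4) {
        a0 += step;
        a1 += step;
        a2 += step;
        a3 += step;
    }
    return (a0 + a1) + (a2 + a3);
}

/* Run fn(n, 1), store its result in *result, and return the CPU time
   per addition in nanoseconds, as measured by clock(). */
double ns_per_add(__float128 (*fn)(long, __float128), long n,
                  __float128 *result)
{
    clock_t t0 = clock();
    *result = fn(n, 1);
    clock_t t1 = clock();
    return 1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / (double)n;
}
```

Calling ns_per_add with chain_sum gives a rough per-addition latency,
and calling it with parallel_sum a rough reciprocal throughput.
Multiplying the nanoseconds by the clock frequency in GHz gives cycles,
which can then be set against the xsaddqp figures.  n should be large
enough (many millions) that the resolution of clock() does not matter.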