Path: ...!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Thomas Koenig <tkoenig@netcologne.de>
Newsgroups: comp.arch
Subject: Faster div or 1/sqrt approximations (was: Continuations)
Date: Fri, 19 Jul 2024 20:25:51 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 31
Message-ID: <v7ei4f$34uc2$1@dont-email.me>
References: <v6tbki$3g9rg$1@dont-email.me>
 <47689j5gbdg2runh3t7oq2thodmfkalno6@4ax.com> <v71vqu$gomv$9@dont-email.me>
 <116d9j5651mtjmq4bkjaheuf0pgpu6p0m8@4ax.com>
 <f8c6c5b5863ecfc1ad45bb415f0d2b49@www.novabbs.org>
 <7u7e9j5dthm94vb2vdsugngjf1cafhu2i4@4ax.com>
 <0f7b4deb1761f4c485d1dc3b21eb7cb3@www.novabbs.org>
 <v78soj$1tn73$1@dont-email.me> <v7dsf2$3139m$1@dont-email.me>
 <277c774f1eb48be79cd148dfc25c4367@www.novabbs.org>
Injection-Date: Fri, 19 Jul 2024 22:25:51 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="6588ad2f0afe4174f58c0e61d8aff649";
	logging-data="3307906"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/5C3ItzYBdRSQeXrmA0gxfkY6uIJtK8xk="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:q7/9O4m8dB6U866HEHWHlHFX2n4=
Bytes: 2663

MitchAlsup1 <mitchalsup@aol.com> wrote:

> I, personally, have found many Newton-Raphson iterators that converge
> faster using 1/SQRT(x) than using the SQRT(x) equivalent.

I can well believe that.

It is interesting to see what different architectures offer for
faster reciprocals.

POWER has fre and fres (double and single version) for approximate
division, which are accurate to 1/256.  These operations are quite
fast, 4 to 7 cycles on POWER9, with up to 4 instructions per cycle
so obviously fully pipelined.  With 1/256 accuracy, this could
actually be the original Quake algorithm (or its modification)
with a single Newton step, but this is of course much better in
hardware where exponent handling can be much simplified (and
done only once).
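
For reference, a minimal C sketch of the software version of that
idea: the well-known Quake III bit trick for 1/sqrt(x) followed by a
single Newton step.  The function name rsqrt_approx is made up for
illustration, and this only shows the software technique; it is not
a claim about what the POWER hardware does internally.  It assumes
positive, normal, IEEE 754 single-precision input.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, normal x (illustrative name). */
static float rsqrt_approx(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);        /* reinterpret the float's bits  */
    i = 0x5f3759dfu - (i >> 1);      /* magic constant: crude guess   */
    memcpy(&y, &i, sizeof y);        /* back to float                 */

    /* One Newton-Raphson step for f(y) = 1/y^2 - x. */
    return y * (1.5f - 0.5f * x * y * y);
}
```

After the single Newton step the relative error is a bit under 0.2%,
so within the 1/256 mentioned above.  The integer subtraction is the
"exponent handling" part that hardware can do far more cheaply.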

x86_64 has rcpss, accurate to 1.5*2^-12 (about 1/2730), with
(looking at the instruction tables) a latency of 6 cycles on newer
architectures and a throughput of 1/4.  So, if your business depends
on calculating many inaccurate square roots, fast, buy a POWER :-)

Other architectures I have tried don't seem to have it.

Does it make sense? Well, if you want to calculate lots of Arrhenius
equations, you don't need full accuracy and (like in Mitch's case)
exp has become as fast as division, then it could actually make a
lot of sense.  It is still possible to add Newton steps afterwards,
which is what gcc does if you add -mrecip -ffast-math.