Article <v7grg1$3kf77$1@dont-email.me>

Deutsch English Français Italiano
<v7grg1$3kf77$1@dont-email.me>

View for Bookmarking (what is this?)
Look up another Usenet article
Path: ...!news.nobody.at!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Thomas Koenig <tkoenig@netcologne.de>
Newsgroups: comp.arch
Subject: Re: Faster div or 1/sqrt approximations
Date: Sat, 20 Jul 2024 17:17:53 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 38
Message-ID: <v7grg1$3kf77$1@dont-email.me>
References: <v6tbki$3g9rg$1@dont-email.me>
 <47689j5gbdg2runh3t7oq2thodmfkalno6@4ax.com> <v71vqu$gomv$9@dont-email.me>
 <116d9j5651mtjmq4bkjaheuf0pgpu6p0m8@4ax.com>
 <f8c6c5b5863ecfc1ad45bb415f0d2b49@www.novabbs.org>
 <7u7e9j5dthm94vb2vdsugngjf1cafhu2i4@4ax.com>
 <0f7b4deb1761f4c485d1dc3b21eb7cb3@www.novabbs.org>
 <v78soj$1tn73$1@dont-email.me> <v7dsf2$3139m$1@dont-email.me>
 <277c774f1eb48be79cd148dfc25c4367@www.novabbs.org>
 <v7ei4f$34uc2$1@dont-email.me> <v7gqgr$3kclj$1@dont-email.me>
Injection-Date: Sat, 20 Jul 2024 19:17:53 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="ccdff6c1e7e8e7cd4872288a041e1c0d";
	logging-data="3816679"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+7O3k2gRqwiNdafjrsWe00k8nzcTSt1nY="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:21gDA/YQSdPKSEw+7LGbmVcxGbw=
Bytes: 3166

Terje Mathisen <terje.mathisen@tmsw.no> schrieb:
> Thomas Koenig wrote:
>> MitchAlsup1 <mitchalsup@aol.com> schrieb:
>> 
>>> I, personally, have found many Newton-Raphson iterators that converge
>>> faster using 1/SQRT(x) than using the SQRT(x) equivalent.
>> 
>> I can well believe that.
>> 
>> It is interesting to see what different architectures offer for
>> faster reciprocals.
>> 
>> POWER has fre and fres (double and single version) for approximate
>> divisin, which are accurate to 1/256.  These operations are quite
>> fast, 4 to 7 cycles on POWER9, with up to 4 instructions per cycle
>> so obviously fully pipelined.  With 1/256 accuracy, this could
>> actually be the original Quake algorithm (or its modification)
>> with a single Newton step, but this is of course much better in
>> hardware where exponent handling can be much simplified (and
>> done only once).
>
> I've taken both a second and third look at InvSqrt() over the last few 
> years, it turns out that the Quake version is far from optimal: With the 
> exact same instructions, just a different set of constants, you can get 
> about 1.5 bits more than quake does, i.e. about 10 bits after that 
> single NR step (which isn't really NR since it modifies both the 1.5 and 
> the 0.5 factors).

Wikipedia has something on that, also literature; Moroz et al.
https://arxiv.org/abs/1603.04483 seem to give the optimum quasi-NR
method to do this, with one or two steps.

I've also looked a little bit at simplifying exp(-1/x) directly, and
it seems an unpleasent function to implement; at least there is no
argument reduction which comes to mind.  With exp2(x), you can split
off the integer part and do an appoximation on the rest over a finite
interval, but if there is a fast approximation of the integer part
of 1/x without dividing, I am not aware of one :-)