From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Faster div or 1/sqrt approximations
Date: Sat, 20 Jul 2024 19:01:15 +0200
Organization: A noiseless patient Spider
Lines: 32
Message-ID: <v7gqgr$3kclj$1@dont-email.me>
References: <v6tbki$3g9rg$1@dont-email.me>
 <47689j5gbdg2runh3t7oq2thodmfkalno6@4ax.com> <v71vqu$gomv$9@dont-email.me>
 <116d9j5651mtjmq4bkjaheuf0pgpu6p0m8@4ax.com>
 <f8c6c5b5863ecfc1ad45bb415f0d2b49@www.novabbs.org>
 <7u7e9j5dthm94vb2vdsugngjf1cafhu2i4@4ax.com>
 <0f7b4deb1761f4c485d1dc3b21eb7cb3@www.novabbs.org>
 <v78soj$1tn73$1@dont-email.me> <v7dsf2$3139m$1@dont-email.me>
 <277c774f1eb48be79cd148dfc25c4367@www.novabbs.org>
 <v7ei4f$34uc2$1@dont-email.me>

Thomas Koenig wrote:
> MitchAlsup1 <mitchalsup@aol.com> schrieb:
> 
>> I, personally, have found many Newton-Raphson iterators that converge
>> faster using 1/SQRT(x) than using the SQRT(x) equivalent.
> 
> I can well believe that.
> 
> It is interesting to see what different architectures offer for
> faster reciprocals.
> 
> POWER has fre and fres (double and single version) for approximate
> division, which are accurate to 1/256.  These operations are quite
> fast, 4 to 7 cycles on POWER9, with up to 4 instructions per cycle
> so obviously fully pipelined.  With 1/256 accuracy, this could
> actually be the original Quake algorithm (or its modification)
> with a single Newton step, but this is of course much better in
> hardware where exponent handling can be much simplified (and
> done only once).

I've taken both a second and a third look at InvSqrt() over the last few
years, and it turns out that the Quake version is far from optimal: with
the exact same instructions, just a different set of constants, you can
get about 1.5 bits more than Quake does, i.e. about 10 bits after that
single NR step (which isn't really NR any more, since the tuned version
modifies both the 1.5 and the 0.5 factors).
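
A minimal sketch of what "same instructions, different constants" can
look like, written in Python so it is easy to experiment with. The Quake
constants (0x5f3759df, 1.5, 0.5) are the well-known ones. The second set
is an assumption for illustration only: it is the set commonly attributed
to Jan Kadlec, rewritten into the same y*(a - b*x*y*y) form, and is not
necessarily the set referred to above. All helper names are hypothetical,
and the refinement step runs in double precision here, so the numbers
will differ slightly from a real single-precision implementation.

```python
import math
import struct


def f32_bits(x):
    """Bit pattern of x after rounding to IEEE single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(i):
    """Single-precision value whose bit pattern is the 32-bit integer i."""
    return struct.unpack("<f", struct.pack("<I", i & 0xFFFFFFFF))[0]


def inv_sqrt(x, magic, a, b):
    """Integer-trick first guess, then one step of the form y*(a - b*x*y*y)."""
    y = bits_f32(magic - (f32_bits(x) >> 1))
    return y * (a - b * x * y * y)


# Original Quake constants: a true Newton-Raphson step (a = 1.5, b = 0.5).
QUAKE = (0x5F3759DF, 1.5, 0.5)

# Assumption: constants attributed to Jan Kadlec, published in the form
# 0.703952253 * y * (2.38924456 - x*y*y) and expanded here into a and b.
# Neither a nor b equals the Newton-Raphson value any more.
TUNED = (0x5F1FFFF9, 0.703952253 * 2.38924456, 0.703952253)


def max_rel_error(params, steps=20000):
    """Worst relative error over [1, 4), one full period of the error curve."""
    worst = 0.0
    for k in range(steps):
        x = bits_f32(f32_bits(1.0 + 3.0 * k / steps))  # round x to float32
        err = abs(inv_sqrt(x, *params) * math.sqrt(x) - 1.0)
        worst = max(worst, err)
    return worst


if __name__ == "__main__":
    for name, params in (("quake", QUAKE), ("tuned", TUNED)):
        err = max_rel_error(params)
        print(f"{name}: max rel error {err:.6f}, about {-math.log2(err):.2f} bits")
```

Sweeping [1, 4) is enough because the error of the bit trick repeats
every time x is multiplied by 4, i.e. every two steps of the exponent.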

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"