From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Faster div or 1/sqrt approximations
Date: Sat, 20 Jul 2024 19:01:15 +0200
Organization: A noiseless patient Spider
Message-ID: <v7gqgr$3kclj$1@dont-email.me>
In-Reply-To: <v7ei4f$34uc2$1@dont-email.me>

Thomas Koenig wrote:
> MitchAlsup1 <mitchalsup@aol.com> wrote:
>
>> I, personally, have found many Newton-Raphson iterators that converge
>> faster using 1/SQRT(x) than using the SQRT(x) equivalent.
>
> I can well believe that.
>
> It is interesting to see what different architectures offer for
> faster reciprocals.
>
> POWER has fre and fres (double and single version) for approximate
> division, which are accurate to 1/256. These operations are quite
> fast, 4 to 7 cycles on POWER9, with up to 4 instructions per cycle
> so obviously fully pipelined. With 1/256 accuracy, this could
> actually be the original Quake algorithm (or its modification)
> with a single Newton step, but this is of course much better in
> hardware where exponent handling can be much simplified (and
> done only once).

I've taken both a second and a third look at InvSqrt() over the last few
years, and it turns out that the Quake version is far from optimal:

With the exact same instructions, just a different set of constants, you
can get about 1.5 bits more accuracy than Quake does, i.e. about 10 bits
after that single NR step (which isn't really NR since it modifies both
the 1.5 and the 0.5 factors).

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"