Deutsch English Français Italiano |
<v7grg1$3kf77$1@dont-email.me> View for Bookmarking (what is this?) Look up another Usenet article |
Path: ...!news.nobody.at!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Thomas Koenig <tkoenig@netcologne.de> Newsgroups: comp.arch Subject: Re: Faster div or 1/sqrt approximations Date: Sat, 20 Jul 2024 17:17:53 -0000 (UTC) Organization: A noiseless patient Spider Lines: 38 Message-ID: <v7grg1$3kf77$1@dont-email.me> References: <v6tbki$3g9rg$1@dont-email.me> <47689j5gbdg2runh3t7oq2thodmfkalno6@4ax.com> <v71vqu$gomv$9@dont-email.me> <116d9j5651mtjmq4bkjaheuf0pgpu6p0m8@4ax.com> <f8c6c5b5863ecfc1ad45bb415f0d2b49@www.novabbs.org> <7u7e9j5dthm94vb2vdsugngjf1cafhu2i4@4ax.com> <0f7b4deb1761f4c485d1dc3b21eb7cb3@www.novabbs.org> <v78soj$1tn73$1@dont-email.me> <v7dsf2$3139m$1@dont-email.me> <277c774f1eb48be79cd148dfc25c4367@www.novabbs.org> <v7ei4f$34uc2$1@dont-email.me> <v7gqgr$3kclj$1@dont-email.me> Injection-Date: Sat, 20 Jul 2024 19:17:53 +0200 (CEST) Injection-Info: dont-email.me; posting-host="ccdff6c1e7e8e7cd4872288a041e1c0d"; logging-data="3816679"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+7O3k2gRqwiNdafjrsWe00k8nzcTSt1nY=" User-Agent: slrn/1.0.3 (Linux) Cancel-Lock: sha1:21gDA/YQSdPKSEw+7LGbmVcxGbw= Bytes: 3166 Terje Mathisen <terje.mathisen@tmsw.no> schrieb: > Thomas Koenig wrote: >> MitchAlsup1 <mitchalsup@aol.com> schrieb: >> >>> I, personally, have found many Newton-Raphson iterators that converge >>> faster using 1/SQRT(x) than using the SQRT(x) equivalent. >> >> I can well believe that. >> >> It is interesting to see what different architectures offer for >> faster reciprocals. >> >> POWER has fre and fres (double and single version) for approximate >> divisin, which are accurate to 1/256. These operations are quite >> fast, 4 to 7 cycles on POWER9, with up to 4 instructions per cycle >> so obviously fully pipelined. With 1/256 accuracy, this could >> actually be the original Quake algorithm (or its modification) >> with a single Newton step, but this is of course much better in >> hardware where exponent handling can be much simplified (and >> done only once). > > I've taken both a second and third look at InvSqrt() over the last few > years, it turns out that the Quake version is far from optimal: With the > exact same instructions, just a different set of constants, you can get > about 1.5 bits more than quake does, i.e. about 10 bits after that > single NR step (which isn't really NR since it modifies both the 1.5 and > the 0.5 factors). Wikipedia has something on that, also literature; Moroz et al. https://arxiv.org/abs/1603.04483 seem to give the optimum quasi-NR method to do this, with one or two steps. I've also looked a little bit at simplifying exp(-1/x) directly, and it seems an unpleasent function to implement; at least there is no argument reduction which comes to mind. With exp2(x), you can split off the integer part and do an appoximation on the rest over a finite interval, but if there is a fast approximation of the integer part of 1/x without dividing, I am not aware of one :-)