From: Thomas Koenig <tkoenig@netcologne.de>
Newsgroups: comp.arch
Subject: Faster div or 1/sqrt approximations (was: Continuations)
Date: Fri, 19 Jul 2024 20:25:51 -0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <v7ei4f$34uc2$1@dont-email.me>
References: <v6tbki$3g9rg$1@dont-email.me> <47689j5gbdg2runh3t7oq2thodmfkalno6@4ax.com> <v71vqu$gomv$9@dont-email.me> <116d9j5651mtjmq4bkjaheuf0pgpu6p0m8@4ax.com> <f8c6c5b5863ecfc1ad45bb415f0d2b49@www.novabbs.org> <7u7e9j5dthm94vb2vdsugngjf1cafhu2i4@4ax.com> <0f7b4deb1761f4c485d1dc3b21eb7cb3@www.novabbs.org> <v78soj$1tn73$1@dont-email.me> <v7dsf2$3139m$1@dont-email.me> <277c774f1eb48be79cd148dfc25c4367@www.novabbs.org>

MitchAlsup1 <mitchalsup@aol.com> wrote:

> I, personally, have found many Newton-Raphson iterators that converge
> faster using 1/SQRT(x) than using the SQRT(x) equivalent.

I can well believe that.

It is interesting to see what different architectures offer for
faster reciprocals.

POWER has fre and fres (double and single version) for approximate
division, which are accurate to 1/256. These operations are quite
fast, 4 to 7 cycles on POWER9, with up to 4 instructions per cycle,
so obviously fully pipelined.

With 1/256 accuracy, this could actually be the original Quake
algorithm (or its modification) with a single Newton step, but this
is of course much better in hardware, where exponent handling can be
much simplified (and done only once).
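
For reference, here is a minimal C sketch of the software variant mentioned above: the well-known Quake III bit trick followed by a single Newton step. The function name rsqrt_quake is made up for illustration, the magic constant 0x5f3759df is the one from the published Quake III source, and the code assumes IEEE 754 binary32 floats and positive, normal inputs. It says nothing about how fre/fres are actually implemented in hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x), Quake III style, with one Newton step.
 * Assumption: float is IEEE 754 binary32 and x is positive and normal. */
static float rsqrt_quake(float x)
{
    uint32_t bits;
    float y;

    assert(x > 0.0f);

    /* Reinterpret the float as an integer without violating aliasing rules. */
    memcpy(&bits, &x, sizeof bits);

    /* Halving and negating the biased exponent gives a rough first guess;
     * the constant corrects for the bias and reduces the mantissa error. */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson step on f(y) = 1/y^2 - x:
     * y_new = y * (1.5 - 0.5 * x * y^2) */
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

As far as I know, the relative error after one Newton step stays somewhat below 0.2%, which is in the same range as the 1/256 quoted above. The shift and subtract on the raw bits is the exponent handling that hardware can do directly.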

x86_64 has rcpss, accurate to 1/6144, with (looking at the
instruction tables) 6 cycles on newer architectures and a
throughput of 1/4.

So, if your business depends on calculating many inaccurate
square roots, fast, buy a POWER :-)

Other architectures I have tried don't seem to have such an
instruction.

Does it make sense? Well, if you want to calculate lots of Arrhenius
equations, don't need full accuracy, and (as in Mitch's case) exp
has become as fast as division, then it could actually make a lot
of sense. It is still possible to add Newton steps afterwards, which
is what gcc does if you add -mrecip -ffast-math.
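
To illustrate the "Newton steps afterwards" part, here is a small C sketch of refining a reciprocal estimate. The names recip_newton_step and recip_refined are hypothetical, and the estimate is passed in as a parameter, standing in for whatever fres or rcpss would deliver. This shows the general technique only and is not the instruction sequence gcc emits.

```c
#include <assert.h>

/* One Newton-Raphson step on f(y) = 1/y - a:
 * y_new = y * (2 - a * y).
 * If y = (1 + e) / a, then y_new = (1 - e*e) / a, so the relative
 * error is squared by every step. */
static float recip_newton_step(float a, float y)
{
    return y * (2.0f - a * y);
}

/* Refine a rough estimate of 1/a with the given number of Newton steps.
 * Assumption: 'estimate' is already close to 1/a (relative error well
 * below 1); otherwise the iteration does not converge. */
static float recip_refined(float a, float estimate, int steps)
{
    assert(steps >= 0);

    float y = estimate;
    for (int i = 0; i < steps; i++)
        y = recip_newton_step(a, y);
    return y;
}
```

Starting from an estimate that is good to 1/256, one step gets the error down to roughly 1/65536, and a second step is limited only by single-precision rounding. Each step costs two multiplications and a subtraction (or two fused multiply-adds), so it only pays off if that is cheaper than a full division.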