From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Faster div or 1/sqrt approximations
Date: Sun, 21 Jul 2024 13:11:56 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Jul21.151156@mips.complang.tuwien.ac.at>

Thomas Koenig writes:
>Terje Mathisen wrote:
>> However, if you are willing to take that first NR iteration
>>
>>    halfnumber = 0.5f*x;
>>    ...
>>    i = R-(i>>1);
>>    ...
>>    x = x*(1.5f-halfnumber*x*x);
>>
>> and then make both the 0.5f and 1.5f constants free variables, you can
>> in fact get 1.5 more bits than what they show in this paper.
....
>Looks like https://web.archive.org/web/20180709021629/http://rrrola.wz.cz/inv_sqrt.html
>who reports 6.50196699E−4 as the maximum error (also from the
>Wikipedia article).
>
>That's 10.5 bits of accuracy, not bad at all.
>
>However... assume you want to do another NR step.  In that case,
>you might be better off not loading different constants from memory,
>so having the same constants might actually be an advantage
>(which does not mean that they have to be the original Newton steps).

The number of accurate bits roughly doubles with each NR step, so
starting with the better first "NR" iteration (about 1.5 bits more)
would result in an additional accuracy of about 3 bits after the
second step.
And if you optimize the new constants for the second iteration, you
may even get more.  Or you could optimize for two iterations using the
same constants ...

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup,