Path: ...!feeds.phibee-telecom.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB-Alt <bohannonindustriesllc@gmail.com>
Newsgroups: comp.arch
Subject: Re: binary128 implementation
Date: Thu, 23 May 2024 14:59:12 -0500
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <v2o76h$1thju$1@dont-email.me>
References: <abe04jhkngt2uun1e7ict8vmf1fq8p7rnm@4ax.com> <memo.20240512203459.16164W@jgd.cix.co.uk> <v1rab7$2vt3u$1@dont-email.me> <20240513151647.0000403f@yahoo.com> <v1to2h$3km86$1@dont-email.me> <20240514221659.00001094@yahoo.com> <v234nr$12p27$1@dont-email.me> <20240516001628.00001031@yahoo.com> <v2cn4l$3bpov$1@dont-email.me> <v2d9sv$3fda0$1@dont-email.me> <20240519203403.00003e9b@yahoo.com> <2024May20.125648@mips.complang.tuwien.ac.at> <v2ffm2$3vs0t$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 23 May 2024 21:59:14 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3530f2545554b77e32d0b2b8d79dfd40"; logging-data="2016894"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18SqyPUSLIu4xnv334BEnbQCVwqlu16AIU="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:kX95H1IFYjlVNyvGEs0Y803t3W4=
Content-Language: en-US
In-Reply-To: <v2ffm2$3vs0t$1@dont-email.me>
Bytes: 4183

On 5/20/2024 7:28 AM, Terje Mathisen wrote:
> Anton Ertl wrote:
>> Michael S <already5chosen@yahoo.com> writes:
>>> On Sun, 19 May 2024 18:37:51 +0200
>>> Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>>>> The FMA normalizer has to handle a maximally bad cancellation, so it
>>>> needs to be around 350 bits wide. Mitch knows of course but I'm
>>>> guessing that this could at least be close to needing an extra cycle
>>>> on its own and/or heroic hardware?
>>>>
>>>> Terje
>>>>
>>>
>>> Why so wide?
>>> Assuming that subnormal multiplier inputs are normalized before
>>> multiplication, the product of multiplication is 226 bits
>>
>> The product of the mantissa multiplication is at most 226 bits even if
>> you don't normalize subnormal numbers. For cancellation to play a
>> role the addend has to be close in absolute value and have the
>> opposite sign as the product, so at most one additional bit comes into
>> play for that case (for something like the product being
>> 0111111... and the addend being -10000000...).
>
> This is the part of Mitch's explanation that I have never been able to
> totally grok, I do think you could get away with less bits, but only if
> you can collapse the extra mantissa bits into sticky while aligning the
> product with the addend. If that takes too long or it turns out to be
> easier/faster in hardware to simply work with a much wider mantissa,
> then I'll accept that.
>
> I don't think I've ever seen Mitch make a mistake on anything like this!
>

It is a mystery, though it seems like maybe Binary128 FMA could be done in software via an internal 384-bit intermediate?...

My thinking is, say, a 112*112 multiply with each operand padded by 2 bits (so 114 bits each) leads to a 228-bit product. If one adds another 116 bits (for the maximal FADD case), this comes to 344 bits.

The 384 bits would then come from my "_BitInt" support code, which pads things to a multiple of 128 bits (for integer types larger than 256 bits).

It isn't fast, but I am not against Binary128 being slower: if one is using Binary128 ("long double" or "__float128" in this case), precision is likely more of a priority than speed.

Though, as of yet, there is no Binary128 FMA operation (in the software runtime). I could potentially add this in theory.
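
As a rough check of the width arithmetic above, here is a minimal Python sketch. It is not taken from the runtime; the helper names (`round_up`, `fma_width_budget`, `max_product_bits`) are made up for illustration, and only the numbers (112, 2, 116, 128) come from the post. Python integers are arbitrary precision, so the bit lengths can be checked directly.

```python
# Illustration only: helper names are hypothetical, the constants are
# the figures given in the post.
FRACTION_BITS = 112     # Binary128 fraction width used in the post
PAD_BITS = 2            # padding added to each multiplier operand
FADD_EXTRA_BITS = 116   # extra room assumed for the maximal FADD case
CHUNK_BITS = 128        # padding granularity stated for _BitInt > 256 bits


def round_up(bits, chunk=CHUNK_BITS):
    """Round a bit count up to the next multiple of chunk."""
    return -(-bits // chunk) * chunk


def fma_width_budget():
    """Reproduce the post's width estimate step by step."""
    operand = FRACTION_BITS + PAD_BITS   # padded operand width
    product = 2 * operand                # width of the full product
    total = product + FADD_EXTRA_BITS    # product plus addend headroom
    storage = round_up(total)            # padded to a multiple of 128
    return {"operand": operand, "product": product,
            "total": total, "storage": storage}


def max_product_bits(operand_bits):
    """Bit length of the largest product of two operand_bits-wide integers."""
    largest = (1 << operand_bits) - 1
    return (largest * largest).bit_length()


if __name__ == "__main__":
    print(fma_width_budget())
    print(max_product_bits(114), max_product_bits(113))
```

Calling `max_product_bits(113)` uses the full Binary128 significand (112 stored bits plus the hidden bit) and gives the 226-bit product width quoted earlier in the thread, while the padded 114-bit operands give the 228 bits used in the estimate here.
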
I guess another possibility would be to add the FADDX/FMULX/FMACX instructions in a form where they are allowed, but get turned into runtime traps (I would likely route them through the TLB Miss ISR, which thus far has ended up as a catch-all for this sort of thing...).

Though, likely more efficient would still be "just use the runtime calls".

> Terje
>
>