Path: ...!feeds.phibee-telecom.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB-Alt <bohannonindustriesllc@gmail.com>
Newsgroups: comp.arch
Subject: Re: binary128 implementation
Date: Thu, 23 May 2024 14:59:12 -0500
Organization: A noiseless patient Spider
Lines: 65
Message-ID: <v2o76h$1thju$1@dont-email.me>
References: <abe04jhkngt2uun1e7ict8vmf1fq8p7rnm@4ax.com>
 <memo.20240512203459.16164W@jgd.cix.co.uk> <v1rab7$2vt3u$1@dont-email.me>
 <20240513151647.0000403f@yahoo.com> <v1to2h$3km86$1@dont-email.me>
 <20240514221659.00001094@yahoo.com> <v234nr$12p27$1@dont-email.me>
 <20240516001628.00001031@yahoo.com> <v2cn4l$3bpov$1@dont-email.me>
 <v2d9sv$3fda0$1@dont-email.me> <20240519203403.00003e9b@yahoo.com>
 <2024May20.125648@mips.complang.tuwien.ac.at> <v2ffm2$3vs0t$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 23 May 2024 21:59:14 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3530f2545554b77e32d0b2b8d79dfd40";
	logging-data="2016894"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18SqyPUSLIu4xnv334BEnbQCVwqlu16AIU="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:kX95H1IFYjlVNyvGEs0Y803t3W4=
Content-Language: en-US
In-Reply-To: <v2ffm2$3vs0t$1@dont-email.me>
Bytes: 4183

On 5/20/2024 7:28 AM, Terje Mathisen wrote:
> Anton Ertl wrote:
>> Michael S <already5chosen@yahoo.com> writes:
>>> On Sun, 19 May 2024 18:37:51 +0200
>>> Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>>>> The FMA normalizer has to handle a maximally bad cancellation, so it
>>>> needs to be around 350 bits wide. Mitch knows of course but I'm
>>>> guessing that this could at least be close to needing an extra cycle
>>>> on its own and/or heroic hardware?
>>>>
>>>> Terje
>>>>
>>>
>>> Why so wide?
>>> Assuming that subnormal multiplier inputs are normalized before
>>> multiplication, the product of multiplication is 226 bits
>>
>> The product of the mantissa multiplication is at most 226 bits even if
>> you don't normalize subnormal numbers.  For cancellation to play a
>> role the addend has to be close in absolute value and have the
>> opposite sign as the product, so at most one additional bit comes into
>> play for that case (for something like the product being
>> 0111111... and the addend being -10000000...).
> 
> This is the part of Mitch's explanation that I have never been able to 
> totally grok, I do think you could get away with less bits, but only if 
> you can collapse the extra mantissa bits into sticky while aligning the 
> product with the addend. If that takes too long or it turns out to be 
> easier/faster in hardware to simply work with a much wider mantissa, 
> then I'll accept that.
> 
> I don't think I've ever seen Mitch make a mistake on anything like this!
> 

It is a mystery, though it seems like a Binary128 FMA could maybe be
done in software via an internal 384-bit intermediate?...

My thinking is, say, a 112*112 bit multiply, with each input padded by
2 bits (so 114 bits each), gives a 228-bit product. If one adds another
116 bits (for the maximal FADD case), this comes to 344 bits.
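
As a quick check of that arithmetic, here is a minimal sketch in Python (chosen only because its integers are arbitrary precision). The constant names are made up for illustration, and the 2-bit padding and 116-bit FADD allowance are just the figures from the paragraph above, not derived from any hardware design.

```python
# Figures taken from the paragraph above; the names are illustrative only.
SIG_BITS = 112          # significand width used in the estimate
PAD_BITS = 2            # per-operand padding assumed in the estimate
ADD_EXTRA_BITS = 116    # extra width allowed for the maximal FADD case

operand_bits = SIG_BITS + PAD_BITS            # width of each multiplier input
product_bits = 2 * operand_bits               # width of the full product
total_bits = product_bits + ADD_EXTRA_BITS    # product plus FADD allowance

# Sanity check with real big integers: the largest operand squared
# needs exactly product_bits bits.
max_operand = (1 << operand_bits) - 1
assert (max_operand * max_operand).bit_length() == product_bits

print(operand_bits, product_bits, total_bits)
```

The assertion only confirms that a 114-bit by 114-bit multiply fills 228 bits; it says nothing about whether 116 extra bits is the right allowance for the add.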

The 384 bits here comes from my "_BitInt" support code, which pads
things to a multiple of 128 bits (for integer types larger than 256
bits).
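
The padding rule can be sketched as a round-up to the next multiple of 128. The helper name `padded_width` is hypothetical and this is not the actual "_BitInt" support code; it only models the rule as stated for widths above 256 bits, since how smaller widths are handled is not described here.

```python
def padded_width(bits: int, granule: int = 128) -> int:
    """Round a bit width up to the next multiple of `granule`.

    Hypothetical helper: it models only the stated rule for integer
    types larger than 256 bits, so it should not be trusted below that.
    """
    # Ceiling division using floor division on the negated value.
    return -(-bits // granule) * granule

print(padded_width(344))
```

With the 344-bit estimate as input, the next multiple of 128 is 384, which is where the 384-bit intermediate comes from.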


It isn't fast, but I am not against Binary128 being slower, since if
one is using Binary128 ("long double" or "__float128" in this case),
precision is likely more of a priority than speed.

Though, as of yet, there is no Binary128 FMA operation in the software
runtime. I could potentially add one.


Another possibility, I guess, would be to add the FADDX/FMULX/FMACX
instructions in a form where they are allowed, but get turned into
runtime traps (these would likely be routed through the TLB Miss ISR,
which thus far has ended up as a catch-all for this sort of thing...).

Though "just use the runtime calls" would likely still be more efficient.

> Terje
> 
>