From: George Neuner <gneuner2@comcast.net>
Newsgroups: comp.arch
Subject: Re: Misc: Applications of small floating point formats.
Date: Tue, 06 Aug 2024 11:51:57 -0400
Organization: i2pn2 (i2pn.org)
Message-ID: <r1h4bj14p401bs7mfpjudnm1t43gpk7r0g@4ax.com>
References: <v8ehgr$1q8sr$1@dont-email.me> <61e1f6f5f04ad043966b326d99e38928@www.novabbs.org> <v8ktu7$3d24l$1@dont-email.me> <v8m6an$3kkjg$1@dont-email.me> <h0v1bjlqmqab7gvaudpv93h0cr59pk8t49@4ax.com> <v8rk3b$154kc$1@dont-email.me>

On Mon, 5 Aug 2024 17:35:22 -0500, BGB-Alt
<bohannonindustriesllc@gmail.com> wrote:

>On 8/5/2024 11:24 AM, George Neuner wrote:
>> On Sat, 3 Aug 2024 21:09:43 -0000 (UTC), Lawrence D'Oliveiro
>> <ldo@nz.invalid> wrote:
>> 
>>> On Sat, 3 Aug 2024 11:40:23 +0200, Terje Mathisen wrote:
>>>
>>>> MitchAlsup1 wrote:
>>>>
>>>>> So, you have identified the problem:: 8-bits contains insufficient
>>>>> exponent and fraction widths to be considered standard format. Thus, in
>>>>> order to utilize 8-bit FP one needs several incarnations.
>>>>> This just points back at the problem:: FP needs at least 10 bits.
>>>>
>>>> I agree that fp10 is probably the shortest sane/useful version, but
>>>> 1:3:4 does in fact contain enough exponent and mantissa bits to be
>>>> considered an ieee754 format.
>>>
>>> The AI folks are quite happy with 8-bit floats for many applications. In
>>> fact, they prefer more exponent bits and fewer in the mantissa.
>> 
>> Insufficient precision is one of the many reasons that ANNs are prone
>> to hallucinate.
>
>Also likely depends on the type of NN as well.
>
>As noted, for some of the stuff I had tried doing, there was a 
>noticeable detrimental effect with fewer than around 8 to 10 bits in the 
>mantissa for the accumulator. Weights and biases could use fewer bits 
>(as could the inputs/outputs between layers), but not so much the 
>accumulator.
>
>Whereas, large exponent ranges tended to be much less of a factor 
>(though with training via genetic algos, it was needed to detect and 
>handle some cases where values went outside of a "reasonable" exponent 
>range, such as E+14 or so).
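
The accumulator effect described above can be reproduced with a toy
experiment. This is only a sketch, not anyone's actual code: the helper
names (round_mantissa, accumulate) and the 0.01 terms are made up for
illustration, and the exponent range is left unbounded so that only the
mantissa width varies.

```python
import math

def round_mantissa(x, frac_bits):
    """Round x to frac_bits stored fraction bits plus an implicit leading 1."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)             # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (frac_bits + 1)     # number of steps across the mantissa
    return math.ldexp(round(m * scale) / scale, e)

def accumulate(terms, frac_bits):
    """Sum terms, rounding the running total after every addition."""
    acc = 0.0
    for t in terms:
        acc = round_mantissa(acc + t, frac_bits)
    return acc

terms = [0.01] * 1000                # the exact sum would be 10
for bits in (3, 7, 10, 23):
    print(bits, accumulate(terms, bits))
```

Once the running total is large enough that a term is less than half a
step of the accumulator, further additions round away to nothing and
the sum stalls, however many terms are left.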

You can use more precision in the mantissa, or more range in the
exponent ... generally you don't need both ;-) ... but in either case
you do need *enough* bits.

The problem with 8-bit reals is they have neither enough precision nor
enough range - they too easily can be saturated during training, and
even if the values are (re)normalized afterward, the damage already
has been done.

16-bit values seem to be enough for many uses. It does not matter much
how the bits are split between mantissa and exponent ... what matters
is having enough bits relevant to the algorithm to avoid values being
saturated during training.
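
To illustrate the saturation point, here is a minimal sketch. It
assumes an IEEE-like layout in which the top exponent code is reserved
for Inf/NaN, and it ignores subnormals. Real 8-bit formats make
different choices in the details, and the function names are
hypothetical.

```python
import math

def max_finite(exp_bits, man_bits):
    """Largest finite value, assuming an IEEE-style bias and reserved top exponent."""
    bias = 2 ** (exp_bits - 1) - 1
    return 2.0 ** bias * (2.0 - 2.0 ** -man_bits)

def quantize(x, exp_bits, man_bits):
    """Round x to man_bits fraction bits, then saturate at the format maximum."""
    if x == 0.0:
        return 0.0
    limit = max_finite(exp_bits, man_bits)
    m, e = math.frexp(x)             # x == m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (man_bits + 1)
    y = math.ldexp(round(m * scale) / scale, e)
    return max(-limit, min(limit, y))

values = [300.0, 1000.0]
for exp_bits, man_bits in ((4, 3), (5, 10)):
    q = [quantize(v, exp_bits, man_bits) for v in values]
    top = max(q)
    # Renormalizing afterward cannot separate values that already saturated.
    print((exp_bits, man_bits), q, [v / top for v in q])
```

With the 1:4:3 split both inputs clamp to the same maximum, so dividing
by the peak afterward yields identical values. With 1:5:10 they stay
distinct.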


>One other thing I had found was that it was possible to DC-bias the 
>inputs (before multiplying against the weight), but the gains were small.
>
>
>So, say, for each input:
>   (In+InBias)*Weight
>Then, output:
>   OutFunc(Accum*OutGain+OutBias)
>
>Though, OutGain is also debatable (as is InBias), but both seem to help 
>slightly. Theoretically, they are unnecessary as far as the math goes 
>(and what gains they offer are more likely a product of numerical 
>precision and the training process).
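
A sketch of the quoted formulas, along with the algebra behind
"theoretically unnecessary". It assumes one InBias per input (the
quoted text does not say whether it is per input or per neuron); the
function names and sample numbers are made up for illustration.

```python
def neuron(inputs, weights, in_biases, out_gain, out_bias, out_func):
    """OutFunc(Accum*OutGain + OutBias), with Accum = sum((In+InBias)*Weight)."""
    accum = sum((x + b) * w for x, b, w in zip(inputs, in_biases, weights))
    return out_func(accum * out_gain + out_bias)

def folded_neuron(inputs, weights, in_biases, out_gain, out_bias, out_func):
    """Same function with InBias and OutGain folded into weights and one bias."""
    folded_w = [w * out_gain for w in weights]
    folded_b = out_gain * sum(b * w for b, w in zip(in_biases, weights)) + out_bias
    return out_func(sum(x * w for x, w in zip(inputs, folded_w)) + folded_b)

def relu(x):
    return x if x > 0 else 0.0

args = ([0.5, -1.0, 2.0], [0.25, -0.5, 0.125], [0.1, 0.2, 0.3], 1.5, 0.05, relu)
print(neuron(*args), folded_neuron(*args))
```

In exact arithmetic the two forms agree, so any benefit from keeping
InBias and OutGain as separate parameters would have to come from
rounding behavior or from how training explores the parameters.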
>
>Will note that for transfer functions, I have tended to use one of:
>   SQRT: (x>0)?sqrt(x):0
>   ReLU: (x>0)?x:0
>   SSQRT: (x>0)?sqrt(x):-sqrt(-x)
>   Heaviside: (x>0)?1:0
>
>While tanh is traditionally popular, it had little obvious advantage 
>over SSQRT and lacks a cheap approximation (and numerical accuracy 
>doesn't really matter here).
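
For reference, the four transfer functions listed above written out as
plain Python. This is a direct transcription of the quoted
expressions; the function names are chosen here for illustration.

```python
import math

def sqrt_act(x):
    """SQRT: square root for positive inputs, zero otherwise."""
    return math.sqrt(x) if x > 0 else 0.0

def relu(x):
    """ReLU: pass positive inputs through, zero otherwise."""
    return x if x > 0 else 0.0

def ssqrt(x):
    """SSQRT: signed square root, odd-symmetric like tanh but unbounded."""
    return math.sqrt(x) if x > 0 else -math.sqrt(-x)

def heaviside(x):
    """Heaviside step: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

for x in (-4.0, -0.25, 0.0, 0.25, 4.0):
    print(x, sqrt_act(x), relu(x), ssqrt(x), heaviside(x), math.tanh(x))
```

The tanh column is printed only for comparison with SSQRT.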
>
>...
>