From: Terje Mathisen
Newsgroups: comp.arch
Subject: Re: Misc: Applications of small floating point formats.
Date: Sat, 3 Aug 2024 11:40:23 +0200

MitchAlsup1 wrote:
> On Wed, 31 Jul 2024 23:31:35 +0000, BGB wrote:
>
>> So, say, we have common formats:
>>    Binary64, S.E11.F52, Common Use
>>    Binary32, S.E8.F23, Common Use
>>    Binary16, S.E5.F10, Less Common Use
>>
>> But, things get funky below this:
>>    A-Law: S.E3.F4 (Bias=8)
>>    FP8: S.E4.F3 (Bias=7) (E4M3 in NVIDIA terms)
>>    FP8U: E4.F4 (Bias=7)
>>    FP8S: E4.F3.S (Bias=7)
>>
>>
>> Semi-absent in my case:
>>    BFloat16: S.E8.F7
>>      Can be faked in software in my case using Shuffle ops.
>>    NVIDIA E5M2 (S.E5.F2)
>>      Could be faked using RGBA32 pack/unpack ops.
>
> So, you have identified the problem:: 8-bits contains insufficient
> exponent and fraction widths to be considered standard format.
> Thus, in order to utilize 8-bit FP one needs several incarnations.
> This just points back at the problem:: FP needs at least 10 bits.

I agree that fp10 is probably the shortest sane/useful version, but
1:3:4 does in fact contain enough exponent and mantissa bits to be
considered an IEEE 754 format.

3 exp bits means that you have 6 steps for regular/normal numbers
(8 exponent codes, minus all-zeros for zero/subnormal and all-ones
for Inf/NaN), which is enough to give some range.

4 mantissa bits (with hidden bit of course) are enough to handle
zero/subnormal/normal/infinity/qnan/snan.

AFAIR the absolute limit is two mantissa bits in order to differentiate
between Inf/QNaN and SNaN, as well as two exp bits, so fp5 (1:2:2).

Terje

-- 
- "almost all programming can be viewed as an exercise in caching"
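
The counting argument above (6 normal exponent steps for 1:3:4, and 1:2:2 as the floor) can be checked with a small sketch. It assumes the usual IEEE 754 conventions carried down to tiny widths: bias = 2^(exp_bits-1) - 1, an all-zeros exponent for zero/subnormal, an all-ones exponent for Inf/NaN, and the top mantissa bit set marking a quiet NaN. The function names `classify`, `decode` and `census` are made up for illustration and do not come from any library.

```python
from collections import Counter

def classify(bits, exp_bits, man_bits):
    """Return the class of a sign:exp:man bit pattern (the sign bit is ignored)."""
    exp_mask = (1 << exp_bits) - 1
    exp = (bits >> man_bits) & exp_mask
    man = bits & ((1 << man_bits) - 1)
    if exp == 0:
        return "zero" if man == 0 else "subnormal"
    if exp == exp_mask:
        if man == 0:
            return "inf"
        # Assumption: top mantissa bit set means quiet NaN, as in binary32/binary64.
        return "qnan" if (man >> (man_bits - 1)) & 1 else "snan"
    return "normal"

def decode(bits, exp_bits, man_bits):
    """Decode a bit pattern to a Python float, assuming an IEEE 754-style bias."""
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    kind = classify(bits, exp_bits, man_bits)
    if kind in ("qnan", "snan"):
        return float("nan")
    if kind == "inf":
        return sign * float("inf")
    if exp == 0:
        # Zero and subnormals: no hidden bit, fixed minimum exponent.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    # Normal numbers: hidden leading 1.
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

def census(exp_bits, man_bits):
    """Count how many bit patterns fall into each class for a given layout."""
    total = 1 << (1 + exp_bits + man_bits)
    return Counter(classify(b, exp_bits, man_bits) for b in range(total))

if __name__ == "__main__":
    for e, m in ((3, 4), (2, 2), (2, 1)):
        print(f"1:{e}:{m}", dict(census(e, m)))
```

Under these assumptions, 1:3:4 has 192 normal patterns (2 signs x 6 exponents x 16 mantissas) with a largest finite value of 15.5, and 1:2:2 still has a pattern for every class. With only one mantissa bit (1:2:1) the "snan" count drops to zero, which is the limit mentioned above.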