Path: ...!feeds.phibee-telecom.net!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Sun, 1 Sep 2024 16:21:38 -0500
Organization: A noiseless patient Spider
Lines: 239
Message-ID: <vb2lt5$1krug$1@dont-email.me>
References: <vajo7i$2s028$1@dont-email.me>
 <memo.20240827205925.19028i@jgd.cix.co.uk> <valki8$35fk2$1@dont-email.me>
 <2644ef96e12b369c5fce9231bfc8030d@www.novabbs.org>
 <vam5qo$3bb7o$1@dont-email.me>
 <2f1a154a34f72709b0a23ac8e750b02b@www.novabbs.org>
 <vaoqcf$3r1u3$1@dont-email.me>
 <2366e332022b8bc8bf2cae9dae663eeb@www.novabbs.org>
 <vaqgtl$3526$1@dont-email.me>
 <d0539090a239743b48e51ec4ae7ecffd@www.novabbs.org>
 <vathse$m1vn$1@dont-email.me>
 <67b047dcb5e46380694b8cddaf1658bb@www.novabbs.org>
 <vb11t3$1dehl$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 01 Sep 2024 23:21:42 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="ef4435790c7892fcb974e9d35f349707";
	logging-data="1732560"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18urbShXFK6Ejvjfe3CfJoD1LcG4rxfr7I="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:oMqsLFIvf/xy9ESge9X2fDhbOHU=
In-Reply-To: <vb11t3$1dehl$1@dont-email.me>
Content-Language: en-US
Bytes: 11056

On 9/1/2024 1:34 AM, Terje Mathisen wrote:
> MitchAlsup1 wrote:
>> On Fri, 30 Aug 2024 22:42:19 +0000, BGB wrote:
>>
>>> On 8/30/2024 1:11 PM, MitchAlsup1 wrote:
>>>> On Thu, 29 Aug 2024 19:07:29 +0000, BGB wrote:
>>>> Integer Overflow
>>>
>>> Not usually a thing. Pretty much everything seems to treat integer
>>> overflow as silently wrapping.
>>
>> ADA wants these.
>>
>>>
>>>
>>>> Bad Instruction encoding--OpCode exists but not as this
>>>>    instruction uses it. Random code generation can use
>>>>    every instruction without privilege.
>>>
>>> Hit or miss.
>>>
>>> Will usually fault on invalid instructions.
>>
>> Must be 100% to guarantee upwards compatibility.
>>
>>> There is logic in place to reject privileged instructions in user-mode,
>>> if the CPU is actually run in user-mode. Some of this is still TODO
>>> (currently, TestKern is still running everything in Supervisor Mode).
>>
>> Yes, it is a pain--but a pain that is absolutely worth it.
>>
>>>
>>> The alternative is to treat them as UB, so they may be one of:
>>>    Trap;
>>>    Do something else (like, if an instruction was added);
>>>    Do something wonky / unintended.
>>>
>>> In practice, this seems to be more how it works.
>>
>> Bad practice == not industrial quality.
>>
>>>
>>>> Bad address--address exists but you are not allowed to touch it
>>>>    with LD or ST instruction or to attempt to execute it.
>>>
>>> If the MMU is enabled, it should fault on bad memory accesses.
>>>
>>> In physical addressing mode, it does not trap.
>>
>> YOU FAIL TO UNDERSTAND--there is an area in memory where the
>> preserved registers are stored--stored in a way that only 3
>> instructions can access--and the PTE is marked RWE=000
>> This prevents damaging the contract between callee and caller.
>> 3 instructions can access these pages ENTER, EXIT and RET
>> nothing else.
>>
>>>
>>> IIRC, there was a mechanism on the bus to deal with accesses to bad
>>> physical addresses (returning all zeroes). Otherwise, trying to access
>>> an invalid address would cause the CPU to deadlock.
>>
>> It is NOT a BAD address--it is a good but inaccessible address
>> outside those 3 instructions.
>>>
>>>
>>>>
>>>> As I understand it, you don't even get FMUL correctly rounded.
>>>> To get it properly rounded you have to compute the full 53*53
>>>> product.
>>>
>>> AFAICT, this wasn't required for the 1985 spec...
>>
>> You Cannot get rounding correct unless you "compute as if to
>> infinite precision" and then follow the rules of rounding
>> (all modes).
> 
> This rule is in fact really simple:
> 
> In all versions of the standard, from the very first up to the upcoming 
> 2029, the core instructions (FADD/FSUB/FMUL/FDIV/FSQRT) MUST result in 
> the correctly rounded result, according to whatever the current rounding 
> mode is/was.
> 
> This does mean that you have to act as if you did the calculation to 
> arbitrary/infinite precision, which really means "enough bits so that 
> any following bits do not matter for the rounding result".
> 
> It was a revelation to me when I wrote my first fp emulation code and 
> grok'ed how having a single guard bit followed by a sticky bit was 
> sufficient to do this for all rounding modes.
> 
> At that point I only needed to maintain enough intermediate bits to 
> guarantee I would still have those rounding bits after normalization.
> 
> This doesn't mean that I could skip calculating all the bits of the full 
> NxN->2N mantissa product, only that I didn't need to keep them all 
> around after normalization.
> 

OK.

It seemed like, when I looked over the 1985 spec initially, it only 
required that the intermediate result be wider than the destination 
(I seemingly missed the point of it also requiring infinite precision).

Say, 54*54 => 68 bits: since 68 > 52, it would have worked under this 
interpretation. Granted, this does turn it into a probability game 
whether the result is correct or off by 1.

But, I have since noticed that it did specify computing to infinite 
precision (in this version of the standard), which my FPU does not do.
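
For reference, the guard + sticky scheme described in the quoted text
can be sketched in C roughly as follows. This is only a sketch for
binary32, where the full 24*24 => 48 bit product fits in a uint64_t.
The name fmul_rne_sketch is made up for illustration, it handles only
round-to-nearest-even, and it assumes normal inputs and a normal
result (no zero, denormal, Inf, NaN, overflow, or underflow handling).

```c
#include <stdint.h>
#include <string.h>

/* Sketch only: binary32 multiply, round-to-nearest-even, assuming
   normal inputs and a normal result (no special cases handled). */
float fmul_rne_sketch(float x, float y)
{
    uint32_t a, b;
    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);

    uint32_t sign = (a ^ b) & 0x80000000u;
    int32_t  exp  = (int32_t)((a >> 23) & 0xFF)
                  + (int32_t)((b >> 23) & 0xFF) - 127;
    uint64_t ma   = (a & 0x007FFFFFu) | 0x00800000u; /* 24-bit significand */
    uint64_t mb   = (b & 0x007FFFFFu) | 0x00800000u;

    /* Full product: every bit is computed, value is in [2^46, 2^48). */
    uint64_t prod = ma * mb;

    /* Normalize so the leading one sits at bit 47. */
    if (prod & (1ull << 47))
        exp += 1;
    else
        prod <<= 1;

    uint32_t mant   = (uint32_t)(prod >> 24);        /* kept 24 bits       */
    uint32_t guard  = (uint32_t)(prod >> 23) & 1u;   /* first dropped bit  */
    uint32_t sticky = (prod & 0x7FFFFFull) != 0;     /* OR of all the rest */

    /* Round up when above the halfway point, or exactly halfway and odd. */
    if (guard && (sticky || (mant & 1u))) {
        mant += 1;
        if (mant == 0x01000000u) {   /* rounding carried out of 24 bits */
            mant >>= 1;
            exp += 1;
        }
    }

    uint32_t r = sign | ((uint32_t)exp << 23) | (mant & 0x007FFFFFu);
    float out;
    memcpy(&out, &r, sizeof out);
    return out;
}
```

Within that restricted range, fmul_rne_sketch(x, y) should agree
bit-for-bit with x * y on hardware whose FMUL is correctly rounded.
Note that the sticky bit is only trustworthy because the whole product
was computed first; the low bits are discarded after they have been
ORed together, not before.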



There was mention of some operations that I have generally not seen in 
the ISA in real-world FPUs:
   An FP remainder operator;
   Converters to/from ASCII strings;
   An FP->Int truncate operator with the result still in FP format;
     Usually, one goes round-trip FP->Int->FP;
   ...

Seems like pretty much everyone offloaded these tasks to the C library.
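
As an illustration of the FP->Int->FP round trip mentioned above, a
minimal C sketch might look like this. The function name
trunc_via_roundtrip is hypothetical, and the range check is an
assumption added here so that the integer conversion cannot overflow.

```c
#include <stdint.h>

/* Sketch: truncate toward zero via a double -> int64 -> double round trip.
   Any double with magnitude >= 2^52 is already an integer, so such values
   (and NaN, which fails both comparisons) are returned unchanged. */
double trunc_via_roundtrip(double x)
{
    if (!(x > -4503599627370496.0 && x < 4503599627370496.0))
        return x;
    /* Caveat: inputs in (-1, 0) come back as +0.0 rather than -0.0,
       which is one way this differs from a dedicated truncate op. */
    return (double)(int64_t)x;
}
```

For example, trunc_via_roundtrip(2.75) gives 2.0 and
trunc_via_roundtrip(-2.75) gives -2.0.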


I had ended up with coverage of most of the rest, albeit still lacking a 
"trap on denormal" handler (seemingly worked for MIPS and friends, *).

So, it seemed like it was getting pretty close to "could maybe pass the 
1985 spec if one lawyers it...". Maybe not so much it seems, unless I 
fix the FMUL issue (TBD if it can be done without significantly 
increasing adder-chain latency).


It is possible I could also add a check to detect and trap multiplies 
for cases where both values have non-zero low-order bits (allowing these 
to also be emulated in software).
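
A software model of such a check could look something like the
following. This is a sketch under assumptions: the function name
fmul_needs_trap and the LOW_BITS cutoff are invented for illustration,
and the real cutoff would depend on how many low-order product bits
the multiplier actually drops.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff: how many low significand bits count as "low-order". */
#define LOW_BITS 27

/* Sketch: returns 1 when both binary64 operands have non-zero low-order
   significand bits, i.e. the case that would be handed to a software
   emulation path. Exponents and special values are not considered. */
int fmul_needs_trap(double x, double y)
{
    uint64_t a, b;
    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);

    const uint64_t mask = (1ull << LOW_BITS) - 1;
    return (a & mask) != 0 && (b & mask) != 0;
}
```

With this predicate, 1.5 * 3.0 or 0.1 * 2.0 would stay on the fast
path, whereas 0.1 * 0.1 would trap.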

So, went and added a flag for "Trap as needed to emulate full IEEE 
semantics" to FPSCR, where the idea is that enabling this will cause it 
to trap in cases where the FPU detects that the results would likely not 
match the IEEE standard (if using FADDG/FSUBG/FMULG/..., generally if 
fenv_access is enabled).

Might make sense to have a compiler option to assume fenv_access is 
always enabled.



*: Though, from what I can gather, most of the N64 games and similar had 
operated with this disabled (giving DAZ/FTZ semantics) which apparently 
posed an annoyance for later emulators (things like moving platforms in 
games like SMB64 would apparently slowly drift upwards or away from the 
origin if the map was left running for long enough, etc; due to SSE and 
similar tending to operate with denormals enabled).
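
To make the DAZ/FTZ behavior concrete, here is a small C sketch of
what flushing a denormal amounts to for binary32. The helper name
flush_denormal is made up; an emulator wanting these semantics on a
host with denormals enabled would, as an assumption, apply something
like it to operands (DAZ) and results (FTZ).

```c
#include <stdint.h>
#include <string.h>

/* Sketch: replace a binary32 denormal with a zero of the same sign.
   Normal values, zeros, Inf, and NaN pass through unchanged. */
float flush_denormal(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);

    if ((u & 0x7F800000u) == 0)   /* exponent field 0: zero or denormal */
        u &= 0x80000000u;         /* keep only the sign bit */

    memcpy(&x, &u, sizeof x);
    return x;
}
```

Tiny per-frame deltas that get flushed to zero on one side but
survive as denormals on the other are the sort of difference that
can accumulate into visible drift over time.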


But, I guess there was also fun that emulated textures don't look quite 
the same either, as N64 used an approximation of bilinear filtering that 
only sampled 3 points rather than the standard 4.
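
One commonly described formulation of a 3-point filter is to split
the texel quad along its diagonal and interpolate within whichever
triangle contains the sample. The C sketch below uses that
formulation next to standard 4-point bilinear for comparison. The
function names and the texel naming (c00, c10, c01, c11) are
assumptions for illustration, and this is not claimed to match the
N64 hardware or my rasterizer bit-for-bit.

```c
/* Sketch: 3-point ("triangle") approximation of bilinear filtering.
   c00..c11 are the four neighboring texels; fx, fy are the fractional
   sample position in [0, 1]. Only three texels contribute per sample. */
float bilinear3(float c00, float c10, float c01, float c11,
                float fx, float fy)
{
    if (fx + fy < 1.0f)   /* triangle containing c00 */
        return c00 + fx * (c10 - c00) + fy * (c01 - c00);
    else                  /* triangle containing c11 */
        return c11 + (1.0f - fx) * (c01 - c11)
                   + (1.0f - fy) * (c10 - c11);
}

/* Standard 4-point bilinear interpolation, for comparison. */
float bilinear4(float c00, float c10, float c01, float c11,
                float fx, float fy)
{
    float top = c00 + fx * (c10 - c00);
    float bot = c01 + fx * (c11 - c01);
    return top + fy * (bot - top);
}
```

The two agree at the corners but differ in the interior. For example,
with only c11 = 1 and the other texels 0, the center sample (0.5, 0.5)
gives 0.25 with bilinear4 but 0 with bilinear3, since the sample lies
on the diagonal and only c10 and c01 end up weighted.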

Though, in my rasterizer module, I also copied this trick (since it 
allows saving 1 block-texture decoder and can use cheaper interpolation 
logic). Well, and the recent extra wonk of shoving HDR (as E4.F4 FP8U) 
through this pathway (and falling back to software rendering based on the 
blending mode).

This, leading to extra wonk, like still using linear alpha blending, and 
========== REMAINDER OF ARTICLE TRUNCATED ==========