Path: ...!weretis.net!feeder9.news.weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Making Lemonade (Floating-point format changes)
Date: Mon, 13 May 2024 19:32:16 -0500
Organization: A noiseless patient Spider
Lines: 160
Message-ID: <v1ubel$3p1kt$1@dont-email.me>
References: <abe04jhkngt2uun1e7ict8vmf1fq8p7rnm@4ax.com>
 <memo.20240512203459.16164W@jgd.cix.co.uk> <v1rab7$2vt3u$1@dont-email.me>
 <20240513151647.0000403f@yahoo.com> <v1tre1$3leqn$1@dont-email.me>
 <9c79fb24a0cf92c5fac633a409712691@www.novabbs.org>
 <v1u6oi$3o53t$1@dont-email.me>
 <bcbda29c4c23543d1ed6de8290d1dc3b@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 14 May 2024 02:32:22 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f7bb63f21ae876478e2f78c3252a6146";
	logging-data="3966621"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19S4GR/TcrA9XGsEut4cYD6Ptn0O75vLWE="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:xNYjqxRzgFuSDPvNoKXje8TI0eg=
Content-Language: en-US
In-Reply-To: <bcbda29c4c23543d1ed6de8290d1dc3b@www.novabbs.org>
Bytes: 7342

On 5/13/2024 6:25 PM, MitchAlsup1 wrote:
> BGB wrote:
> 
>> On 5/13/2024 4:16 PM, MitchAlsup1 wrote:
>>> BGB wrote:
>>>
>>>>
>>>> Emulation via traps is very slow, but typical for many ISA's is to 
>>>> just quietly turn the soft-float operations into runtime calls.
>>>
>>> I recall that MIPS could emulate a TLB table walk in something like
>>> 19 cycles. That is:: a few cycles to get there, a hash table access,
>>> a check, a TLB install, and a few cycles to get back.
>>>
>>> On an x86 this would be at least 200 cycles just getting there and back.
>>>
> 
>> I guess there are different possibilities here...
> 
>> Trap cost can be reduced, say, by having banked registers.
>> But, not so good with explicit save/restore and a large register file.
> 
> 
>> For example, I can note that a MSP430 at 16MHz can service a 32kHz 
>> timer... (with a budget of 488 cycles per interrupt).
> 
> 
>> But, my BJX2 core (at 50MHz) would have a harder time here, with 
>> around a 1.5k cycle budget...
> 
>> Then again, it is possible the per-interrupt overhead would go down 
>> slightly, since most likely the ISR stack will still be in the L1 
>> cache between interrupts (and save/restore overhead should drop to ~ 
>> 100 cycles in the absence of L1 misses).
> 
> 
>> MSP430 had a slight advantage here (besides fewer registers) in that 
>> L1 misses are not a thing (so, memory access has constant latency).
> 
> 
>>> So, to revisit your statement::
>>>
>>> Emulation is slow when trap overhead is large and not-slow when trap 
>>> overhead
>>> is small.
> 
>> Possible, but I would not expect trap overhead to be lower than 
>> runtime call overhead...
> 
> Yes, of course, trapping can never be quite as inexpensive as a CALL/RET
> sequence.
> 
> But it does not have to be much larger--just a little bit larger.
> 

OK.

> 
>> Also (in my case):
>> Debugging is rather annoying in cases where dealing with bugs 
>> appear/disappear/move around at random or with the slightest 
>> perturbation...
> 
> You need better verification--Oh Wait ...
> 

Not sure I understand what you mean by this.


Some of these bugs behave very similarly to bugs I was battling a while 
ago (which I never properly debugged; the bug just sort of seemingly 
disappeared).


If so (and it is all the same bug), it means the bug shows up:
   In both the emulator and Verilog implementation;
   In both BJX2 (BGBCC) and RV64G (GCC) builds;
   Applies across multiple programs (as well as the shell/kernel).

Vs, say, it being a new bug that has appeared just because I had started 
preempting the threads.


Main sort of observed behavior:
Random memory corruptions seem to occur, seemingly most often scattered 
on the stack, usually associated with either file IO (via stdio) or 
"printf()".


Granted, this code is kind of a tangled and hairy mess, and one of the 
remaining parts of PDPCLIB that I had not entirely replaced (though had 
previously removed most of the "IBM MVS" stuff due to being irrelevant 
to my uses).

Had also been gradually weeding out a lot of the code related to things 
other than ASCII with LF-only text handling (TestKern uses neither 
EBCDIC nor CR+LF file IO). It wouldn't surprise me if some stuff is 
still buggy in there (and I had already ended up rewriting much of the 
rest of the C library).


Though, "I am just going to yank and replace all of the 'stdio.h' stuff 
because it seems to be potentially buggy and I can't seem to find the 
bug" seems a bit drastic... Could be something like a dangling pointer 
somewhere or something.

Does seem to be more closely associated with "stdio.h" stuff, than, say, 
with things related to the FAT driver or similar.


>> But, given for the most part behavior is consistently buggy (and 
>> manifesting in seemingly the same ways) between both the emulator and 
>> Verilog implementation, this implies the causal factors are in software.
> 
>> I guess in this case, either I figure it out, or will need to again go 
>> back to cooperative scheduling. Seemingly, using preemptive scheduling 
>> and virtual memory at the same time is particularly unstable (programs 
>> tend to crash on startup or soon after).
> 
> 
>> Also I may need to rework how page-in/page-out is handled (and or how 
>> IO is handled in general) since if a page swap needs to happen while 
>> IO is already in progress (such as a page-miss in the system-call 
>> process), at present, the OS is dead in the water (one can't access 
>> the SDcard in the middle of a different access to the SDcard).
> 
> Having a HyperVisor helps a lot here, with HV taking the page faults
> of the OS page fault handler.

Seems like adding another layer wouldn't help with this, unless it also 
abstracts away the SDcard interface.


Say, to read/write a sector:
   Shove the request out the SPI interface;
   Wait for response from SDcard (while shoveling FF bytes);
   Shovel FF bytes until done.

If we suddenly need to do another IO request in the middle (such as to 
swap something), there is a problem...
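
As a rough sketch of the sequence above (not the actual TestKern code): 
a single-block read in SPI mode is CMD17 plus a 32-bit address, then 
polling with FF bytes for the R1 response and the 0xFE data token, then 
512 data bytes and 2 CRC bytes. Here spi_xfer() is a simulated card 
standing in for the real MMIO, and bus_busy, sd_read_sector() and the 
retry limits are made-up names and values for illustration. Block (not 
byte) addressing is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated card state (hypothetical; stands in for the SPI MMIO). */
enum { SIM_IDLE, SIM_CMD, SIM_R1, SIM_TOKEN, SIM_DATA, SIM_CRC };
static int sim_state = SIM_IDLE, sim_count, sim_delay;
static int bus_busy;            /* set while a transfer is in flight */

/* Exchange one byte with the (simulated) card. */
static uint8_t spi_xfer(uint8_t out)
{
    switch (sim_state) {
    case SIM_IDLE:              /* command bytes start with bits 01 */
        if ((out & 0xC0) == 0x40) { sim_state = SIM_CMD; sim_count = 1; }
        return 0xFF;
    case SIM_CMD:               /* swallow the rest of the 6-byte command */
        if (++sim_count == 6) { sim_state = SIM_R1; sim_delay = 2; }
        return 0xFF;
    case SIM_R1:                /* a few FF bytes, then R1 = 0x00 */
        if (sim_delay-- > 0) return 0xFF;
        sim_state = SIM_TOKEN; sim_delay = 3;
        return 0x00;
    case SIM_TOKEN:             /* a few FF bytes, then the data token */
        if (sim_delay-- > 0) return 0xFF;
        sim_state = SIM_DATA; sim_count = 0;
        return 0xFE;
    case SIM_DATA: {            /* 512 data bytes: byte i reads as i & 0xFF */
        uint8_t b = (uint8_t)sim_count;
        if (++sim_count == 512) { sim_state = SIM_CRC; sim_count = 0; }
        return b;
    }
    default:                    /* SIM_CRC: two CRC bytes, then idle */
        if (++sim_count == 2) { sim_state = SIM_IDLE; sim_count = 0; }
        return 0x00;
    }
}

/* Read one 512-byte sector. Returns 0 on success, -1 on failure,
 * including the case where another transfer is already in flight. */
static int sd_read_sector(uint32_t lba, uint8_t *buf)
{
    uint8_t r = 0xFF;
    int i;

    assert(buf != NULL);
    if (bus_busy) return -1;    /* the nested-request problem */
    bus_busy = 1;

    /* 1: shove the request out the SPI interface */
    spi_xfer(0x40 | 17);        /* CMD17, READ_SINGLE_BLOCK */
    spi_xfer((uint8_t)(lba >> 24));
    spi_xfer((uint8_t)(lba >> 16));
    spi_xfer((uint8_t)(lba >> 8));
    spi_xfer((uint8_t)lba);
    spi_xfer(0x01);             /* dummy CRC (not checked by default) */

    /* 2: wait for the response while shoveling FF bytes */
    for (i = 0; i < 16; i++) {
        r = spi_xfer(0xFF);
        if (!(r & 0x80)) break;
    }
    if (r != 0x00) { bus_busy = 0; return -1; }
    for (i = 0; i < 1000; i++) {
        r = spi_xfer(0xFF);
        if (r == 0xFE) break;
    }
    if (r != 0xFE) { bus_busy = 0; return -1; }

    /* 3: shovel FF bytes until done (data, then 2 CRC bytes) */
    for (i = 0; i < 512; i++) buf[i] = spi_xfer(0xFF);
    spi_xfer(0xFF);
    spi_xfer(0xFF);

    bus_busy = 0;
    return 0;
}
```

The point of bus_busy is just to make the failure visible: anything 
that lands in sd_read_sector() between steps 1 and 3 (say, a page-in) 
gets -1 back rather than scrambling the transfer already on the wire.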


Would need some way to be sure that any low-level IO request either 
completes or can be completed independent of any interrupt handling needed.
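
One possible shape for this, as a sketch only: requests that arrive 
while the device is mid-transfer get parked in a small queue and are 
run once the in-flight transfer finishes. All of the names here 
(io_req, io_submit, fault_hook, fake_page_fault) are hypothetical, and 
do_transfer() is a placeholder for the real low-level transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

typedef struct { uint32_t lba; int is_write; } io_req;

static io_req pending[MAX_PENDING];  /* ring of deferred requests */
static int pend_head, pend_tail;
static int dev_busy;                 /* set while a transfer is in flight */
static int completed;                /* transfers carried out so far */
static void (*fault_hook)(void);     /* test hook: fires mid-transfer */

/* Placeholder for the real transfer; the hook simulates something
 * (such as a page miss) wanting more IO while this one is running. */
static void do_transfer(const io_req *r)
{
    assert(r != NULL);
    if (fault_hook) fault_hook();
    completed++;
}

/* Returns 0 if the request ran now, 1 if deferred, -1 if queue full. */
static int io_submit(io_req r)
{
    if (dev_busy) {
        int next = (pend_tail + 1) % MAX_PENDING;
        if (next == pend_head) return -1;
        pending[pend_tail] = r;
        pend_tail = next;
        return 1;
    }
    dev_busy = 1;
    do_transfer(&r);
    /* drain whatever arrived while the device was busy */
    while (pend_head != pend_tail) {
        io_req q = pending[pend_head];
        pend_head = (pend_head + 1) % MAX_PENDING;
        do_transfer(&q);
    }
    dev_busy = 0;
    return 0;
}

/* Simulated page fault that asks for a swap-in during a transfer. */
static int nested_result = -2;
static void fake_page_fault(void)
{
    io_req swap = { 99, 0 };
    fault_hook = NULL;               /* fire only once */
    nested_result = io_submit(swap);
}
```

This only works if the transfer path itself never touches pageable 
memory: if the code doing the transfer is the thing that faulted, it 
can't continue until the deferred request runs, and the deferred 
request can't run until the transfer finishes. So the buffers and code 
involved would need to be pinned.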

Note that the syscall handler task can't be preempted, since trying to 
preempt this task would effectively nuke the OS (by leaving it in a 
state where execution could not continue).

Could almost justify moving all the SDcard IO into interrupt handlers, 
except that SDcard IO is slow, and long-running interrupt handlers are 
best avoided.

Possibly better would be if there were some mechanism where IO 
operations could be handled asynchronously (say, SDcard blocks being 
transferred to/from memory buffers without needing the CPU to spin in a 
loop doing MMIO requests to manage the SPI interface).
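
From the CPU side, such a mechanism might look something like a small 
descriptor ring: the driver posts a request and returns immediately, 
and later checks a status field. This is purely a sketch of the idea; 
no such hardware exists in my core at present, hw_step() is a software 
stand-in for whatever engine would do the transfer, and all names are 
made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING   4
#define SECTOR 512

enum { D_FREE, D_PENDING, D_DONE };

typedef struct {
    uint32_t lba;          /* sector to read */
    uint8_t *buf;          /* destination buffer (would need to be pinned) */
    volatile int status;   /* D_FREE / D_PENDING / D_DONE */
} io_desc;

static io_desc ring[RING];
static unsigned post_idx;  /* next slot the CPU fills */
static unsigned hw_idx;    /* next slot the engine services */

/* CPU side: post a read and return at once.
 * Returns the slot index, or -1 if the ring is full. */
static int post_read(uint32_t lba, uint8_t *buf)
{
    io_desc *d = &ring[post_idx % RING];
    assert(buf != NULL);
    if (d->status != D_FREE) return -1;
    d->lba = lba;
    d->buf = buf;
    d->status = D_PENDING;
    return (int)(post_idx++ % RING);
}

/* Stand-in for the transfer engine: services one pending descriptor.
 * Here it just fills the buffer with the low byte of the LBA. */
static void hw_step(void)
{
    io_desc *d = &ring[hw_idx % RING];
    if (d->status != D_PENDING) return;
    memset(d->buf, (int)(d->lba & 0xFF), SECTOR);
    d->status = D_DONE;
    hw_idx++;
}

/* CPU side: returns 1 and frees the slot if the request has finished,
 * 0 if it is still outstanding. */
static int reap(int slot)
{
    if (ring[slot].status != D_DONE) return 0;
    ring[slot].status = D_FREE;
    return 1;
}
```

With something like this, a page-in that shows up mid-transfer is just 
another descriptor in the ring, rather than a second party fighting 
over the SPI interface.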

....