Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Implementing DOES>: How not to do it (and why not) and how to do it
Date: Sun, 14 Jul 2024 13:00:15 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 69
Message-ID: <2024Jul14.150015@mips.complang.tuwien.ac.at>
References: <2024Jul11.160602@mips.complang.tuwien.ac.at> <2024Jul13.173138@mips.complang.tuwien.ac.at> <nnd$68dd354d$5a60d664@a6110a1e6f38ddc9>
Injection-Date: Sun, 14 Jul 2024 16:08:42 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f85a3159a72ad57e0261f4b9fb190d6f";
logging-data="200917"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX186iqZwzKXmC4uMgiHAK1j8"
Cancel-Lock: sha1:sNaTHEvr7isfvya0Yr5TG9b2+NA=
X-newsreader: xrn 10.11
Bytes: 4586

albert@spenarnc.xs4all.nl writes:
>In article <2024Jul13.173138@mips.complang.tuwien.ac.at>,
>Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
><SNIP>
>>
>>In any case, if you are a system implementor, you may want to check
>>your DOES> implementation with a microbenchmark that stores into the
>>does-defined word in a case where that word is not inlined.
>
>Is that equally valid for indirect threaded code?
>In indirect threaded code the instruction and data cache
>are more separated, e.g. in a simple Forth all the low level
>code could fit in the I-cache, if I'm not mistaken.

It depends on how the system is implemented.  In fig-Forth, the code
for the primitives is mixed with the data. In such a system, if you
have memory close to the primitive that is frequently being written
to, you will see cache ping-pong. If the native code of the
primitives is in a separate area with enough distance to data, there
should be no such issues for primitives.

Also, there are at least two ways to implement DOES>-defined words:

1) In addition to the code address of DODOES, have an extra cell
   pointing to the code after DOES> so that DODOES can find it.  This
   is used in fig-Forth with <BUILDS (which reserves the extra cell in
   fig-Forth, whereas CREATE does not and cannot be used with DOES> in
   fig-Forth).  In Gforth all words, including CREATEd words, have had
   a two-cell code field from the beginning, and the indirect-threaded
   variants of Gforth (including the hybrid direct/indirect threaded
   approach we have used on all architectures since 2001) have used
   this.  With the new header the details are a bit different, but the
   principle is the same.

2) The F83 way of implementing CREATE ... DOES> (others probably
   used this way earlier, and it probably led to the introduction of
   CREATE ... DOES>, but F83 is a system for which I can find
   documentation): There is only a single cell at the code field, and
   it points to a native-code CALL DODOES that sits between the
   (DOES>) and the threaded code for the source code behind the DOES>.
   DODOES then pops the return address of the call, and this is the
   address of the threaded code to be called by DODOES, while the data
   address is in W, as usual.  Gforth used a similar approach for
   direct-threaded implementations (until 2001), but IIRC without the
   calling and popping.  It seems that the DOES> implementations of
   systems A and B were inspired by this approach.

   Inside F83 <https://www.forth.org/OffeteStore/1003_InsideF83.pdf>
   discusses this starting in Section "High Level Inner Interpreter"
   on page 45, but IMO Ting uses confusing terminology here: What I
   call run-time routines for defining words, Ting calls "inner
   interpreter" (for me the "inner interpreter" is NEXT).

For the first way, neither cache ping-pong nor a particularly high
level of branch mispredictions is expected.

For the second way, I expect cache ping-pong if there is written data
close to the DOES>.  I don't expect a particularly high level of branch
mispredictions in indirect threaded code: While the hardware return
stack will get out of sync because of the call-pop usage, indirect
threaded code does not have returns that would mispredict because of
this lack of synchronization.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2024: https://euro.theforth.net