
Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Stack vs stackless operation
Date: Wed, 26 Feb 2025 08:30:28 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 116
Message-ID: <2025Feb26.093028@mips.complang.tuwien.ac.at>
References: <591e7bf58ebb1f90bd34fba20c730b83@www.novabbs.com> <34df278ef0a52d0eab9d035f45795389@www.novabbs.com> <vploha$3h0e8$2@paganini.bofh.team>
Injection-Date: Wed, 26 Feb 2025 10:49:08 +0100 (CET)
Injection-Info: dont-email.me; posting-host="c59fc92ccbf7bc5f26f5e1bff4baf817";
	logging-data="2645528"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+BKWug5UNi1xGbhcUcfH1e"
Cancel-Lock: sha1:ZEeBgBRoENAIK3EDjq3XMZGWSwc=
X-newsreader: xrn 10.11

antispam@fricas.org (Waldek Hebisch) writes:
>Potential alternative is
>a pair of operations, say PUSH and POP, and a Forth compiler
>that replaces a pair like V1 @ by PUSH(V1).  Note that here
>the address of V1 is intended to be part of PUSH (so it will
>take as much space as separate V1 and @, but is only a
>single primitive).

In Gforth variables are compiled as "lit <addr>", and Gforth has a
primitive LIT@, and a generalized constant-folding optimization that
replaces "lit <addr> @" with "LIT@ <addr>".  In the Gforth image there
are 490 uses of lit@ (out of 33611 uses of primitives) and 76
occurrences of "lit @" (in parts that are compiled before the
generalized constant-folding is active).  There are also 293
occurrences of "lit !".
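The folding itself is easy to sketch.  Here is a Python toy (not
Gforth's actual implementation, which works on its threaded code in
generated C; the names LIT, FETCH, LIT_FETCH are invented) that
rewrites a "lit <addr> @" triple into a "lit@ <addr>" pair:

```python
# Illustrative peephole pass: fold the pair "lit <addr>" "@" into the
# single primitive "lit@ <addr>".  Code is a flat list of primitives
# and inline arguments, as in a threaded-code image.
LIT, FETCH, LIT_FETCH = "lit", "@", "lit@"

def fold_lit_fetch(code):
    out = []
    i = 0
    while i < len(code):
        # "lit x @" occupies three cells: primitive, argument, primitive
        if i + 2 < len(code) and code[i] == LIT and code[i + 2] == FETCH:
            out += [LIT_FETCH, code[i + 1]]  # one primitive, same argument
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out
```

The address still takes a cell either way (as Waldek notes above); the
saving is one primitive dispatch per access.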

However, given the minimal difference between the code produced for
"LIT@" and "LIT @", LIT@ is no longer beneficial.  E.g.:

variable v  ok
: foo1 v @ + ;  ok
: foo2 v [ basic-block-end ] @ + ;  ok
see-code foo1 
$7F27184A08B0 lit@    1->2 
$7F27184A08B8 v
7F271806B580:   mov     rax,$08[rbx]
7F271806B584:   mov     r15,[rax]
$7F27184A08C0 +    2->1 
7F271806B587:   add     r13,r15
$7F27184A08C8 ;s    1->1
....
see-code foo2 
$7F27184A08F8 lit    1->2 
$7F27184A0900 v
7F271806B596:   mov     r15,$08[rbx]
$7F27184A0908 @    2->2 
7F271806B59A:   mov     r15,[r15]
$7F27184A0910 +    2->1 
7F271806B59D:   add     r13,r15
$7F27184A0918 ;s    1->1 
....

>More generally, a simple "optimizer" that replaces short
>sequences of Forth primitives by different, shorter sequence
>of primitives is likely to give similar gain.  However,
>chance of match decreases with length of the sequence.

Gforth has that as static superinstructions.  You can see the
sequences in
<http://git.savannah.gnu.org/cgit/gforth.git/tree/peeprules.vmg>, in
the lines before the first occurrence of prim-states or something
similar.  As you can see, many of the formerly-used sequences are now
commented out, because static superinstructions do not play well with
a) static stack caching (currently static superinstructions only work
for the default stack cache state) and b) IP-update optimization (if
one of the primitives in the sequence has an immediate argument (e.g.,
LIT), you would need additional variants for various IP offsets, or
update the IP before the sequence).

The remaining static superinstructions

* have to do with stacks where we do not have stack caching (FP stack,
  locals stack, return stack),

* are combinations of comparison primitives and ?BRANCH (this avoids
  the need to reify the result of the comparison in a general-purpose
  register), or

* are sequences of typical memory-access words (not because they occur
  so often, but because it's better to have a small number of words
  that can be combined, and a number of combinations in the optimizer
  than to have a combinatorial explosion of words).
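The comparison+?BRANCH case can be illustrated with a toy inner
interpreter (Python, purely illustrative; Gforth's engine is generated
C, and the opcode names here are invented): the fused superinstruction
branches directly on the comparison, so no flag is ever pushed only to
be popped again by ?BRANCH.

```python
# Toy threaded-code interpreter contrasting a separate comparison plus
# ?branch with a fused compare-and-branch superinstruction.
def run(code, stack):
    ip = 0
    while ip < len(code):
        op = code[ip]
        if op == "<":                    # separate comparison: reifies a flag
            b, a = stack.pop(), stack.pop()
            stack.append(-1 if a < b else 0)
            ip += 1
        elif op == "?branch":            # ...which ?branch pops right back off
            target = code[ip + 1]
            ip = target if stack.pop() == 0 else ip + 2
        elif op == "<-?branch":          # fused: no flag ever materialized
            target = code[ip + 1]
            b, a = stack.pop(), stack.pop()
            ip = ip + 2 if a < b else target
        elif op == "halt":
            break
        else:                            # anything else is a literal
            stack.append(op)
            ip += 1
    return stack
```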

>Above you bet on relatively long sequences (and on the programmer
>writing an alternative sequence).  Shorter sequences have more
>chance of matching, so you need a smaller number of them
>for similar gain.

That's certainly our experience.  Long sequences with high dynamic
counts often come out of the inner loop of a single benchmark, and do
not help other programs at all.  We later preferred to go with static
usage counts (i.e., the sequence occurs several times in the code),
and this naturally leads to short sequences.
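Selecting candidates by static usage counts can be sketched in a few
lines (illustrative only; the representation and threshold are
invented, and real selection also has to deal with overlapping
sequences):

```python
# Count how often each adjacent pair of primitives occurs across the
# compiled image; pairs that occur often enough become superinstruction
# candidates.  Static counts, not dynamic execution counts.
from collections import Counter

def candidate_pairs(code_sequences, min_count=2):
    counts = Counter()
    for seq in code_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}
```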

>One can
>do better than using the machine stack, namely keeping things in
>registers, but that means generating machine code and doing
>optimization.

Gforth does stack caching at the level of primitives, by having
several variants of the primitives for different start and end states
of the primitives, and using a shortest-path search for finding out
which combination of these variants to use.  However, for multiple
stacks this leads to a large number of states, and the shortest-path
algorithm becomes too expensive.  For now we only stack-cache the data
stack.
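The shortest-path selection can be sketched as a small dynamic program
over one basic block (Python, illustrative; the states, variants, and
costs here are invented, whereas Gforth derives them from how many
stack items are kept in registers):

```python
# Each primitive exists in several variants (in_state, out_state, cost);
# pick, per basic block, the cheapest chain of variants whose cache
# states line up, starting and ending in the default state.
def cheapest_chain(block, variants, states, start=0, end=0):
    INF = float("inf")
    # dist[s] = cheapest cost to reach this point in cache state s
    dist = {s: (0 if s == start else INF) for s in states}
    for prim in block:
        new = {s: INF for s in states}
        for (p, s_in, s_out), cost in variants.items():
            if p == prim and dist[s_in] + cost < new[s_out]:
                new[s_out] = dist[s_in] + cost
        dist = new
    return dist[end]
```

With one state per combination of cached items on several stacks, the
state set (and hence this table) is what blows up.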

For extending this to multiple stacks, I see several alternatives:

* Use a greedy algorithm instead of an optimal shortest-path
  algorithm.  The difference is probably non-existent in most cases.

* Manage the stack cache using register allocation techniques instead
  of representing it as an abstract state.  This would often produce
  similar results as the greedy technique, but it can also handle
  stack manipulation words cheaply without having an explosion of
  stack states and the related complexity in the generator that
  generates the states and the tables for the state-handling.
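The greedy alternative is simpler still to sketch (Python,
illustrative; the variant table mapping (primitive, in-state,
out-state) to a cost is invented): at each primitive, just take the
cheapest variant applicable in the current cache state, without
looking ahead.

```python
# Greedy stack-cache variant selection: no search, one pass over the
# block, locally cheapest choice at each primitive.
def greedy_chain(block, variants, start=0):
    state, total = start, 0
    for prim in block:
        # variants of prim usable from the current state, cheapest first
        options = [(cost, s_out) for (p, s_in, s_out), cost in variants.items()
                   if p == prim and s_in == state]
        cost, state = min(options)
        total += cost
    return total, state
```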

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/