Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Re: Stack vs stackless operation
Date: Wed, 26 Feb 2025 08:30:28 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 116
Message-ID: <2025Feb26.093028@mips.complang.tuwien.ac.at>
References: <591e7bf58ebb1f90bd34fba20c730b83@www.novabbs.com> <34df278ef0a52d0eab9d035f45795389@www.novabbs.com> <vploha$3h0e8$2@paganini.bofh.team>
Injection-Date: Wed, 26 Feb 2025 10:49:08 +0100 (CET)
X-newsreader: xrn 10.11

antispam@fricas.org (Waldek Hebisch) writes:
>A potential alternative is
>a pair of operations, say PUSH and POP, and a Forth compiler
>that replaces a pair like V1 @ by PUSH(V1).  Note that here the
>address of V1 is intended to be part of PUSH (so it will
>take as much space as separate V1 and @, but is only a
>single primitive).

In Gforth variables are compiled as "lit <addr>", and Gforth has a
primitive LIT@ and a generalized constant-folding optimization that
replaces "lit <addr> @" with "LIT@ <addr>".  In the Gforth image there
are 490 uses of LIT@ (out of 33611 uses of primitives) and 76
occurrences of "lit @" (in parts that are compiled before the
generalized constant folding is active).  There are also 293
occurrences of "lit !".

However, given the minimal difference between the code produced for
"LIT@" and "LIT @", LIT@ is no longer beneficial.  E.g.:

variable v  ok
: foo1 v @ + ;  ok
: foo2 v [ basic-block-end ] @ + ;  ok
see-code foo1
$7F27184A08B0 lit@    1->2
$7F27184A08B8 v
7F271806B580:   mov     rax,$08[rbx]
7F271806B584:   mov     r15,[rax]
$7F27184A08C0 +    2->1
7F271806B587:   add     r13,r15
$7F27184A08C8 ;s    1->1
....
see-code foo2
$7F27184A08F8 lit    1->2
$7F27184A0900 v
7F271806B596:   mov     r15,$08[rbx]
$7F27184A0908 @    2->2
7F271806B59A:   mov     r15,[r15]
$7F27184A0910 +    2->1
7F271806B59D:   add     r13,r15
$7F27184A0918 ;s    1->1
....

>More generally, a simple "optimizer" that replaces short
>sequences of Forth primitives by a different, shorter sequence
>of primitives is likely to give a similar gain.  However, the
>chance of a match decreases with the length of the sequence.

Gforth has that as static superinstructions.  You can see the
sequences in
<http://git.savannah.gnu.org/cgit/gforth.git/tree/peeprules.vmg>, in
the lines before the first occurrence of prim-states or something
similar.

As you can see, many of the formerly-used sequences are now commented
out, because static superinstructions do not play well with a) static
stack caching (currently static superinstructions only work for the
default stack-cache state) and b) the IP-update optimization (if one
of the primitives in the sequence has an immediate argument (e.g.,
LIT), you would need additional variants for various IP offsets, or
you would have to update the IP before the sequence).

The remaining static superinstructions

* have to do with stacks for which we do not have stack caching (FP
  stack, locals stack, return stack),

* are combinations of comparison primitives and ?BRANCH (this avoids
  the need to reify the result of the comparison in a general-purpose
  register), or

* are sequences of typical memory-access words (not because they
  occur so often, but because it's better to have a small number of
  words that can be combined, plus a number of combinations in the
  optimizer, than to have a combinatorial explosion of words).

>Above you bet on relatively long sequences (and on the programmer
>writing an alternative sequence).  Shorter sequences have a better
>chance of matching, so you need a smaller number of them
>for a similar gain.

That's certainly our experience.
Long sequences with high dynamic counts often come out of the inner
loop of a single benchmark, and do not help other programs at all.  We
later preferred to go with static usage counts (i.e., the sequence
occurs several times in the code), and this naturally leads to short
sequences.

>One can
>do better than using the machine stack, namely by keeping things in
>registers, but that means generating machine code and doing
>optimization.

Gforth does stack caching at the level of primitives, by having
several variants of the primitives for different start and end states
of the primitives, and by using a shortest-path search to find out
which combination of these variants to use.  However, for multiple
stacks this leads to a large number of states, and the shortest-path
algorithm becomes too expensive.  For now we only stack-cache the
data stack.

For extending this to multiple stacks, I see several alternatives:

* Use a greedy algorithm instead of an optimal shortest-path
  algorithm.  The difference is probably non-existent in most cases.

* Manage the stack cache using register-allocation techniques instead
  of representing it as an abstract state.  This would often produce
  results similar to those of the greedy technique, but it can also
  handle stack-manipulation words cheaply, without an explosion of
  stack states and the related complexity in the generator that
  produces the states and the tables for the state handling.

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/