From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Code generation for DOES> in Gforth
Date: Sat, 21 Sep 2024 17:25:51 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Sep21.192551@mips.complang.tuwien.ac.at>

I recently noticed that Gforth still used the following COMPILE,
implementation for words defined with CREATE...SET-DOES> (and
consequently also for words defined with CREATE...DOES>):

: does, ( xt -- ) does-check ['] does-xt peephole-compile, , ;

Ignore DOES-CHECK (it has to do with stack-depth checking, still
incomplete).  The rest means that it compiles the primitive DOES-XT
with the xt of the COMPILE,d word as immediate argument.  DOES-XT
pushes the body of the word and then EXECUTEs the xt that SET-DOES>
has registered for this word.  In most cases this is a colon
definition (always if DOES> is used), so the next thing that happens
is DOCOL, and then the code for the colon definition is run.

I have now replaced this with

: does, ( xt -- ) does-check dup >body lit, >extra @ compile, ;

This compiles the body address as a literal, and then it COMPILE,s
the xt that DOES-XT would EXECUTE.  In the common case of a colon
definition this compiles a call to the colon definition.  This saves
the overhead of accessing the doesfield and of dispatching on its
contents at run-time; all that is now done during compilation.

Let us first look at the generated code.  Consider the example:

: myconst create , does> @ ;
5 myconst five
: foo five ;

SIMPLE-SEE FOO shows:

old                               new
$7F6F5CAE6BC8 does-xt    1->1     $7F46A7EA92B8 lit    1->1
$7F6F5CAE6BD0 five                $7F46A7EA92C0 five
$7F6F5CAE6BD8 ;s    1->1          $7F46A7EA92C8 call    1->1
                                  $7F46A7EA92D0 $7F46A7C0A168
                                  $7F46A7EA92D8 ;s    1->1
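
At the source level, the new threaded code (LIT with the body address,
then CALL to the code after DOES>) behaves roughly like the following
standard-Forth sketch.  FIVE-DOES and FOO-MANUAL are names made up for
this illustration; in Gforth the code after DOES> is an anonymous
colon definition, and the sketch only mimics the effect, not Gforth's
actual compilation path.

```forth
: myconst ( n "name" -- )  create ,  does> ( -- n ) @ ;
5 myconst five

\ Hypothetical named stand-in for the anonymous code after DOES>.
: five-does ( addr -- n )  @ ;

\ What the programmer writes:
: foo ( -- n )  five ;

\ Hand-written equivalent of the new code: the body address of FIVE
\ is computed at compile time and compiled as a literal, followed by
\ a plain call to the does-code.
: foo-manual ( -- n )  [ ' five >body ] literal five-does ;

foo . foo-manual .   \ prints 5 5
```

Both definitions leave 5 on the stack; the difference is only in when
the body address and the does-code are looked up.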

For the following microbenchmark:

: d1 ( "name" -- )
  create 0 ,
does> ( -- addr )
; \ yes, an empty DOES> exists in an application program
d1 z1

: bench-z1-comp ( -- )
    iterations 0 ?do
        1 z1 +!
    loop ;

I see the following results per iteration (startup overhead included)
on a Rocket Lake:

 old   new
 8.2   7.5 cycles:u
34.0  29.0 instructions:u
 5.2   4.2 branches:u

So five instructions fewer (including one branch), resulting in a
small speedup for this microbenchmark.
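
The benchmark above does not show the definition of ITERATIONS.  A
self-contained variant could look like the following sketch; the value
of ITERATIONS is an assumption chosen for illustration, not the value
used for the measurements above.

```forth
100000000 constant iterations  \ assumed value, not from the measurements

: d1 ( "name" -- )
  create 0 ,
does> ( -- addr )
;
d1 z1

: bench-z1-comp ( -- )
    iterations 0 ?do
        1 z1 +!
    loop ;

bench-z1-comp
z1 @ .   \ sanity check: prints the iteration count, 100000000
```

Assuming this is saved as bench.fs (a hypothetical file name), one way
to collect comparable counters on Linux would be
"perf stat -e cycles:u -e instructions:u -e branches:u gforth-fast
bench.fs -e bye", dividing each counter by ITERATIONS to get
per-iteration numbers.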
        
The Gforth image contained 129 occurrences of does-xt; after the
change it contains 12 (a part of the image is created with the
cross-compiler, which still compiles to DOES-XT).  The resulting
image size and gforth-fast (AMD64) native-code size in bytes are as
follows:

  old     new
2189364 2193264 image
 448291  448659 native-code

The larger image is no surprise.  For the 117 replaced does-xts, the
threaded code grows by 2 cells each, and the meta-data grows
correspondingly.
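
The arithmetic behind this can be checked at the Forth prompt.  The
sketch below assumes 8-byte cells (AMD64); how the rest of the growth
divides up among meta-data and anything else is not broken down here.

```forth
129 12 - .                      \ replaced does-xts: 117
2193264 2189364 - .             \ total image growth in bytes: 3900
129 12 - 2 * 8 * .              \ threaded-code growth: 117*2*8 = 1872
2193264 2189364 -  129 12 - 2 * 8 * -  .  \ remainder: 2028
```

So the two extra cells per replaced does-xt account for a bit less
than half of the image growth.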

For the native code, the growth is less expected.  Let's see how the
code looks:

does-xt               lit call
add rbx,$10           mov $00[r13],r8 
mov $00[r13],r8       sub r13,$08     
mov r8,-$08[rbx]      mov r8,$08[rbx] 
sub r13,$08           mov rax,$18[rbx]
sub rbx,$08           sub r14,$08     
mov rax,-$08[r8]      add rbx,$20     
mov rdx,$18[rax]      mov [r14],rbx   
mov rax,-$10[rdx]     mov rbx,rax     
jmp eax               mov rax,[rbx]   
                      jmp eax         

34 bytes              35 bytes

Ok, it's one byte larger per occurrence, but that explains only 117
of the 368 extra bytes.  Maybe the interaction with other
optimizations explains the rest.
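
For reference, the native-code numbers work out as follows, assuming
each of the 117 replaced does-xts turns into exactly one lit call
sequence of the size shown above:

```forth
448659 448291 - .               \ native-code growth in bytes: 368
129 12 -  35 34 -  * .          \ explained by the size difference: 117
448659 448291 -  129 12 - 35 34 - * -  .  \ unexplained remainder: 251
```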

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
   EuroForth 2024: https://euro.theforth.net