From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Subject: Code generation for DOES> in Gforth
Date: Sat, 21 Sep 2024 17:25:51 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Sep21.192551@mips.complang.tuwien.ac.at>

I recently noticed that Gforth still used the following COMPILE,
implementation for words defined with CREATE...SET-DOES> (and
consequently also for words defined with CREATE...DOES>):

: does, ( xt -- )
    does-check ['] does-xt peephole-compile, , ;

Ignore DOES-CHECK (it has to do with stack-depth checking, still
incomplete).  The rest means that it compiles the primitive DOES-XT
with the xt of the COMPILE,d word as immediate argument.  DOES-XT
pushes the body of the word and then EXECUTEs the xt that SET-DOES>
has registered for this word.  In most cases this is a colon
definition (always if DOES> is used), so the next thing that happens
is DOCOL, and then the code for the colon definition is run.

I have now replaced this with

: does, ( xt -- )
    does-check dup >body lit, >extra @ compile, ;

What this does is to compile the body as a literal, and then it
COMPILE,s the xt that DOES-XT would EXECUTE.  In the common case of a
colon definition this compiles a call to the colon definition.  This
saves the overhead of accessing the doesfield and of dispatching on
its contents at run-time; all that is now done during compilation.

Let us first look at the generated code.
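
Before the actual Gforth output, the effect of the new DOES, can be
pictured by hand in standard Forth.  This is only a rough sketch, not
what Gforth does internally: FETCH-BODY, FOO-OLD and FOO-NEW are
hypothetical names made up for the illustration, and FETCH-BODY stands
in for the anonymous colon definition that DOES> registers.

```forth
\ Sketch in standard Forth; the names FETCH-BODY, FOO-OLD and FOO-NEW
\ are assumptions for illustration, not Gforth internals.
: fetch-body ( addr -- n )  @ ;      \ stands in for the code after DOES>
: myconst ( n "name" -- )  create , does> fetch-body ;
5 myconst five

: foo-old ( -- n )  five ;           \ compiled via the system's COMPILE,
: foo-new ( -- n )                   \ hand-expanded: body literal + call
    [ ' five >body ] literal fetch-body ;

foo-old .  foo-new .                 \ both print 5
```

FOO-NEW resolves the body address and the word to call while
compiling, which is the work the new DOES, moves from run-time to
compile time.
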
Consider the example:

: myconst create , does> @ ;
5 myconst five
: foo five ;

SIMPLE-SEE FOO shows:

old                              new
$7F6F5CAE6BC8 does-xt    1->1    $7F46A7EA92B8 lit    1->1
$7F6F5CAE6BD0 five               $7F46A7EA92C0 five
$7F6F5CAE6BD8 ;s    1->1  ok     $7F46A7EA92C8 call    1->1
                                 $7F46A7EA92D0 $7F46A7C0A168
                                 $7F46A7EA92D8 ;s    1->1

For the following microbenchmark:

: d1 ( "name" -- )
    create 0 ,
  does> ( -- addr )
    ;
\ yes, an empty DOES> exists in an application program

d1 z1

: bench-z1-comp ( -- )
    iterations 0 ?do
        1 z1 +!
    loop ;

I see the following results per iteration (startup overhead included)
on a Rocket Lake:

 old   new
 8.2   7.5  cycles:u
34.0  29.0  instructions:u
 5.2   4.2  branches:u

So five instructions fewer (including one branch), resulting in a
small speedup for this microbenchmark.

The Gforth image contained 129 occurrences of does-xt and after the
change it contains 12 (a part of the image is created with the
cross-compiler, which still compiles to DOES-XT).  As a result, the
image size and gforth-fast (AMD64) native-code size in bytes are as
follows:

    old      new
2189364  2193264  image
 448291   448659  native-code

The larger image is no surprise.  For the 117 replaced does-xts, the
threaded code grows by 2 cells each, and the meta-data grows
correspondingly.

For the native code, the growth is less expected.  Let's see how the
code looks:

does-xt                  lit call
add   rbx,$10            mov   $00[r13],r8
mov   $00[r13],r8        sub   r13,$08
mov   r8,-$08[rbx]       mov   r8,$08[rbx]
sub   r13,$08            mov   rax,$18[rbx]
sub   rbx,$08            sub   r14,$08
mov   rax,-$08[r8]       add   rbx,$20
mov   rdx,$18[rax]       mov   [r14],rbx
mov   rax,-$10[rdx]      mov   rbx,rax
jmp   eax                mov   rax,[rbx]
                         jmp   eax
34 bytes                 35 bytes

Ok, it's larger, but that explains only 117 extra bytes.  Maybe the
interaction with other optimizations explains the rest.

- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
   EuroForth 2024: https://euro.theforth.net