Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail From: anton@mips.complang.tuwien.ac.at (Anton Ertl) Newsgroups: comp.lang.forth Subject: Re: Efficient implementation of Finite State Machines (FSM) Date: Sun, 09 Mar 2025 16:29:23 GMT Organization: Institut fuer Computersprachen, Technische Universitaet Wien Lines: 224 Message-ID: <2025Mar9.172923@mips.complang.tuwien.ac.at> References: <2025Mar5.175013@mips.complang.tuwien.ac.at> <2025Mar5.200303@mips.complang.tuwien.ac.at> <001c29937cf8440ee5cc4a26135820af@www.novabbs.com> <2025Mar6.190930@mips.complang.tuwien.ac.at> Injection-Date: Sun, 09 Mar 2025 19:53:29 +0100 (CET) Injection-Info: dont-email.me; posting-host="7a5dffd29d7f11526c0b291691b25b9d"; logging-data="935922"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/BqGf+hp5Fyxlz40nzQjyM" Cancel-Lock: sha1:pp2E40z0F/T8BjWnH9cjTmQGqKE= X-newsreader: xrn 10.11 Bytes: 10490 Gerry Jackson writes: >Use of a wordlist, whose wid is held in an immediate constant, enables >easy linkage between states at compile time, eg a typical state action >in outline is: > >:noname > if state-x [ >order ] next-state [ previous ] > else state-y [ >order ] next-state [ previous ] >; this-state >order to next-state previous It means that Gforth will have to improve its SEE in order to point out the differences between the different NEXT-STATEs. Currently I get: /2 >order ok next-state xt-see noname : dup @ #1 and IF next-state ELSE next-state THEN ; ok >Disadvantages are: >- the Forth code doesn't give much idea of the overall operation of the >FSM (probably true for FSMs in general) That's definitely the case here. IIRC for Michael Gassanenko it was a demonstration of filtering and backtracking, but it's unclear to me how that transfers to FSMs. Anyway, let's look at the core: >: run-fsm ( ad xt -- ) begin dup while execute repeat 2drop ; This means that every state has to return to this loop to dispatch the next one. Now Gforth (development) has EXECUTE-EXIT, which allows tail-calling the next state for better efficiency. I have worked out an example: https://www.complang.tuwien.ac.at/forth/programs/fsm-ae.4th Let's first look at the example: The example recognizes and prints numbers in a text and ignores everything else. It terminates when it sees '$'. It has two states, one for being inside a number and one for outside: state outside-num state inside-num (Note that this is not the standar Forth word STATE). Then we define transitions: : out->out ( c-addr -- c-addr1 ) count outside-num transition ; ' out->out outside-num all-transitions The out->out transition is the simplest one: It fetches the next char (with COUNT) and switches to OUTSIDE-NUM. TRANSITION already starts the dispatch for that state and the next char; this (and maybe also COUNT) could be put in the general FSM interpreter (START-DFA), but by having TRANSITION in the individual transition actions (e.g., OUT->OUT), the implementation is more flexible, as we will see. At first OUT->OUT is put in transitions from OUTSIDE-NUM for all characters using ALL-TANSITIONS; later the transitions of various characters are overwritten: ' out->in '9' 1+ '0' outside-num some-transitions ' out->stop '$' outside-num one-transition Note that the stack effect comment for out->out is from the start of the word to the start of the next state-transition word; the actual stack effect depends on the implementation of transition. For more state transitions and the corresponding transition words see the source code. Example usage: s" 123 abc 456 df$" drop outside-num start-dfa \ prints "123 456" Now for the implementations: States are just arrays of xts, indexed by the character, and the xt is that of the transition from the state with that character. The implementation without EXECUTE-EXIT looks as follows: : transition ( c addr -- xt ) \ addr specifies the next state ]] swap th @ [[ ; immediate : stop ( c-addr -- 0 ) drop 0 ; : start-dfa ( c-addr addr -- ) swap count rot transition begin ( ... xt ) execute dup 0= until drop ; TRANSITION could be a plain colon definition here, but it is a macro in order to make it more competetive in Gforth with the EXECUTE-EXIT variant. Here the termination is performed by replacing the next c-addr with 0 and testing for 0 in the loop. An alternative implementation is to use EXECUTE-EXIT to tail-call the next transition: : transition ( c addr -- ) ]] swap th @ execute-exit [[ ; immediate : stop ( -- ) \ let the ";" behind the STOP do the stopping ]] drop [[ ; immediate : start-dfa ( c-addr addr -- ) \ let the dfa work on the string starting at c-addr, with initial \ state addr swap count rot transition ; Here TRANSITION contains the EXECUTE in the form of EXECUTE-EXIT, and so each transition word directly calls the next one, and no loop is necessary; with EXECUTE this would fill the return stack after a few thousand transitions, but EXECUTE-EXIT takes the return address off the return stack before EXECUTEing the next word and therefore can perform an indefinite number of state transitions. So how do we get out of the state machine? By not performing a transition; instead we simply return to the caller of START-DFA. I looked at the generated code and thought that we can get rid of the SWAP in the transition code by using the generalized constant folding feature of Gforth. This replaces the definition of TRANSITION with: : noopt-transition-compile, ( xt -- ) \ basic compile, implementation for TRANSITION drop ]] swap th @ execute-exit [[ ; : 1opt-transition-compile, ( xt -- ) drop lits> ]] cells literal + @ execute-exit [[ ; `noopt-transition-compile, 1 foldn: transition-compile, `1opt-transition-compile, 1 set-fold# : transition ( c addr -- ) \ dispatches the transition code for char C in state ADDR. The \ stack effect describes the state of the stack when the xt has \ been consumed, not when the EXECUTE-EXIT returns, if at all; \ either compile, or execute-exit this word in order to achieve \ tail-call optimization [ 0 noopt-transition-compile, ] ; ' transition-compile, set-optimizer The NOOPT-TRANSITION-COMPILE, implementation is used when TRANSITION is compiled without a preceding constant, 1OPT-TRANSITION-COMPILE, is used when 1 or more constants precede the compilation of TRANSITION. In the latter case code without the SWAP is generated. How fast are the variants. I have a benchmark that performs 1M iterations of processing a string of 100 non-digits (last char is '$', of course), and one where the processed string contains numbers. The latter just takes more instructions and cycles, but the former just performs 99 OUT->OUT transitions in each iteration and shows the basic performance of the whole scheme. Here are the results on a Zen4: loop plain optimized implementation variant 1_278_763_454 1_175_241_846 1_175_505_964 cycles 3_651_376_357 2_441_376_030 2_045_832_844 instructions "loop" is the version that has a loop; the others are the EXECUTE-EXIT variants, "plain" uses the macro, "optimized" tries to do better by recognizing that there is a constant in play. While "optimized" saves 4 instructions per transition compared to "plain", it does not save cycles. Overall the number of cycles is quite high given the number of instructions and the capabilities of the CPU. I'll take a look at it below. The loop variant costs 12 instructions and 1 cycle per transition more than the plain variant. Here's the code for OUT->OUT: ========== REMAINDER OF ARTICLE TRUNCATED ==========