From: Ruvim
Newsgroups: comp.lang.forth
Subject: Recognizers and postponing (was: Progressing Matthias Trute's recognizer proposal)
Date: Mon, 29 Jun 2020 15:33:41 +0300
References: <2020Jun28.173040@mips.complang.tuwien.ac.at>

On 2020-06-28 18:30, Anton Ertl wrote:
> Ruvim writes:
>> By the Standard, in the glossary entry for POSTPONE
>> https://forth-standard.org/standard/core/POSTPONE
>>
>>   Skip leading space delimiters. Parse _name_ delimited by a space.
>>   Find _name_. Append the compilation semantics of _name_
>>   to the current definition.
>>
>>
>> If the Recognizers word set is provided, this specification can be
>> updated to something like the following:
>>
>>   Skip leading space delimiters. Parse _lexeme_ delimited by a space.
>>   Recognize _lexeme_. Append the compilation semantics for _lexeme_
>>   to the current definition.
>
> Sounds ok (apart from the bikeshedded terminology).
>
>> It should append compilations semantics for *the same* _lexeme_ that was
>> parsed.
>>
>> Let _lexeme_ is "foo{". So it should append compilation semantics for
>> lexeme "foo{". Not for lexeme "foo{ bar }" or anything else.
>>
>>
>> The same for string literals.
>> A code like
>>   POSTPONE "foo bar"
>> should be incorrect. It should not be exclusion from the general rule.
>
> That does not make sense.  Your "recognize _lexeme_" invokes the
> string recognizer, which parses ' bar"'.

Definitions of the terms matter - https://git.io/JfhaI
So it does make sense.

  to *recognize* a /lexeme/: to determine the interpretation semantics
  and the compilation semantics for the /lexeme/ in the current
  dynamic context.

  *recognizer*: a Forth definition that tries to recognize a lexeme,
  producing a fully qualified token.

Determining the semantics for a given lexeme cannot involve parsing the
input buffer. In principle, determining semantics cannot have any side
effects.

If some recognizer violates this specification, that "recognizer" is
simply outside the scope of the specification.

If you want to add further parsing, the proper way is a separate step
after "parse lexeme delimited by a space".

But the problem of postponing can also be solved in another way.

> It's also more useful.
>
> : foo POSTPONE "foo bar" POSTPONE type ; immediate
> : bar foo ;
> bar \ prints "foo bar"

Actually, it's more useful to have a proper way to compile fragments of
code. Phrases like
  POSTPONE foo POSTPONE bar POSTPONE baz
are a pain.

Also, if you need to compile a single string literal or a number, you
don't need POSTPONE at all.
  "foo bar" slit,
or
  123 lit,
is enough.

> certainly works in Gforth.
> It also makes more sense to those who know the string recognizer.

But it makes far less sense when you don't know whether a recognizer
does parsing or not.

I provided several examples of a recognizer for string literals that
doesn't do parsing.

I use a "recognizer" for the "p{ ... }p" construct that does parsing
(and even compilation).

Some "recognizer" may even parse when interpreting and avoid parsing
when compiling, or vice versa (and I have used such a one too).
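
To make the earlier point about single literals concrete, here is a
minimal sketch in standard Forth 2012 with no recognizers at all. The
definitions of lit, and slit, below are my assumption of what those
names mean (thin wrappers over LITERAL and SLITERAL); the names
greet-inline and demo are hypothetical, and S" stands in for the
"foo bar" syntax, which needs a string recognizer.

```forth
\ Assumed helper definitions; not taken from any particular system.
: lit,  ( x -- )         postpone literal ;    \ compile x as a literal
: slit, ( c-addr u -- )  postpone sliteral ;   \ needs the String word set

\ An immediate word that compiles "push the string, then TYPE it"
\ into the definition being built, without POSTPONE-ing a literal.
: greet-inline ( -- )
  s" foo bar" slit,   \ string literal goes into the current definition
  postpone type       \ append the compilation semantics of TYPE
; immediate

: demo ( -- ) greet-inline ;
demo   \ types: foo bar
```

Only TYPE goes through POSTPONE here; the literal is compiled by an
ordinary word that takes its data from the stack, so nothing beyond one
space-delimited lexeme is ever taken from the input buffer.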

NB: any tool will be used in every unexpected way that is possible. So
we should carefully limit the possible ways, to guarantee that all the
various parts can work together.

And it is a very good practical rule that POSTPONE always extracts only
one space-delimited lexeme from the input buffer, and nothing more.

Let POSTPONE be a last resort for backward compatibility. Let's not
touch it.

>> If we want to postpone (compile) fragments of code, the right ways is
>> something like c{ ... }c construct that properly works for *any* code.
>>
>>   : postpone-my-fancy-code
>>     c{ "foo bar" ( c-addr u ) type }c
>>
>>     c{
>>       foo{ bar } [defined] x [if] x [then]
>>     }c
>>   ;
>>
>>   : foo [ c{ "test passed" }c ] type ; foo
>
> Gforth has ]] ... [[, and
>
> : foo ]] "foo bar" type [[ ; immediate
> : bar foo ;
> bar
>
> works exactly like the version using POSTPONE above.
>
> It would take extra code to make it work differently.

Yes. But it is worth it.

The following implementation of the "]] ... [[" construct covers the
most practical cases of your implementation and does not use the
"postpone action" from a token descriptor at all! Do we really need
this action?

  : xt, compile, ;

  : compile-compile-token? ( k*x td -- true | k*x td false )
    td-nt over = if drop name>compile swap lit, xt, true exit then
    token>xt? if lit, 'xt, xt, true exit then
    postpone [: compile-token postpone ;] 'xt, xt, true
    \ There is a possibility to return false
    \ for some implementation defined cases
  ;

  : next-lexeme ( -- c-addr u|0 )
    begin parse-name ?ET ( addr ) refill ?E0 drop again
  ;

  : ]]
    begin next-lexeme 2dup "[[" equals 0= while
      recognize-any dup ?nf compile-compile-token?
      0= -32 and throw
    repeat 2drop
  ;

Where

  compile-token ( i*x k*x td -- j*x )
    \ perform the compilation semantics that are determined
    \ by the given fully qualified token ( k*x td )

  recognize-any ( c-addr u -- k*x td | 0 )
    \ recognize a lexeme using the recognizer
    \ that the Forth text interpreter currently uses

========== REMAINDER OF ARTICLE TRUNCATED ==========