Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Ruvim <ruvim.pinka@gmail.com>
Newsgroups: comp.lang.forth
Subject: Re: Alternative for long parsing words
Date: Fri, 9 Aug 2024 17:37:20 +0400
Organization: A noiseless patient Spider
Lines: 209
Message-ID: <v9562h$jq4q$1@dont-email.me>
References: <a1aab44ee3b1b56c2f54f2606e98d040@www.novabbs.com>
 <23a44aa0445a30c0fc782819f48463f9@www.novabbs.com>
 <nnd$685d2a62$7d072f38@c2a291bc8c0eb1cd> <v8nrb0$3vbpv$3@dont-email.me>
 <nnd$4ced6d91$68fcde58@19347b2874c81786> <v8qnhv$n39d$1@dont-email.me>
 <2024Aug5.163310@mips.complang.tuwien.ac.at> <v8t4cj$1hulc$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 09 Aug 2024 15:37:23 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e6355d34d82e1aafe1103e03dda1d08b";
	logging-data="649370"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/6WlO5ZPBEdYpoDnPHGYoh"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:woHkOgk9DH+bRkT6zPY2Hq2dMUE=
Content-Language: en-US
In-Reply-To: <v8t4cj$1hulc$1@dont-email.me>
Bytes: 8386

On 2024-08-06 16:19, Ruvim wrote:
> On 2024-08-05 18:33, Anton Ertl wrote:
>> Ruvim <ruvim.pinka@gmail.com> writes:
>>> On 2024-08-05 14:04, albert@spenarnc.xs4all.nl wrote:
>>>> In article <v8nrb0$3vbpv$3@dont-email.me>,
>>>> Ruvim  <ruvim.pinka@gmail.com> wrote:
>>>>> But if you hate parsing words in principle (just because they do
>>>>> parsing), why not hate such long parsing words like `[if]`, `\`, the
>>>>> construct "]]...[[", etc? What is an alternative for them?
>> ...
>>> I meant the word `[IF]` by itself, without connection with `WANT`.
>>
>> Not necessarily a parsing word.  Could also be treated as something
>> like another state (i.e., the text interpreter does the parsing, but
>> does something different with the words than interpretation state or
>> compile state).
> [...]
>>
>> \ parses, but apart from interactions like above it looks fine to me.
>>
>>> The word `]]` is also a parsing word (in a standard-compliant
>>> implementation).
>>
>> You mean that implementing ]] as a standard program requires parsing.
>> That's true, but the usual implementation in systems is as another
>> state-like thing.
[...]
>>
>>> How to implement such functionality without active parsing the input 
>>> stream?
>>
>> How does :NONAME or ] implement its functionality?  Do you also
>> consider it a parsing word?  Note that in some Forth, Inc. Forth
>> systems ] parses on its own rather than using a state of the ordinary
>> text interpreter.
>>
> 
> 
> Yes, I mean that in a standard program the only approach available to 
> implement such functionality is the active parsing approach (at the 
> moment).
> 
> 
> The Recognizer API allows replacing short parsing words with 
> syntactically recognized forms (limited to one lexeme). Actually, it is 
> a generalization of numeric literals.
> 
> 
> But to implement string literals or string templates (for string 
> interpolation), we still need active parsing. And we need it even if we 
> implement the beginning of such a literal as a recognizable form.
> 
> An illustration of these two approaches:
> 
> - a parsing word `s"`:
>    s" lorem ipsum dolor"
> 
> - a recognizable form, a lexeme that starts with `"`:
>    "lorem ipsum dolor"
> 
> If a recognizer has no side effects and returns a token translator on 
> success, then, for a string literal, the returned translator *parses* 
> the input buffer (or the input stream) till `"` [1,2].
> 
> So, in general, parsing is *inevitable*.  Whether it is a parsing word 
> or a parsing translator — it does not matter.


One difference between parsing words and recognizable syntactic forms 
is visual: the latter do not require a space before the enclosed 
content (this matters when whitespace is significant, as in string 
literals).
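
To make the whitespace point concrete, here is a minimal sketch in 
standard Forth (not taken from the discussion above; the names 
`no-lead` and `one-lead` are made up for illustration). With the parsing 
word `s"`, the first space after the word name is only a delimiter, and 
any further space already belongs to the string:

```forth
\ The space right after s" is consumed as the delimiter of the word name.
: no-lead  ( -- c-addr u )  s" lorem" ;    \ string is "lorem"
: one-lead ( -- c-addr u )  s"  lorem" ;   \ string is " lorem"

no-lead  nip .   \ prints 5
one-lead nip .   \ prints 6
```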

Another difference is related to the API: `[']` and `postpone` are not 
applicable to the latter (but are applicable to the former).

NB: applying `postpone` to `s"` is not the same as applying `postpone` 
to a particular string literal.

So, a parsing word can be used by itself to reuse its functionality. But 
to reuse the functionality of some recognizable syntactic form, 
additional words should be provided. It's notable that the Recognizer 
API encourages (or even forces) us to provide such additional words.
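
As a rough illustration of reusing a parsing word by itself, the 
following sketch uses only standard words. The name `str"` is 
hypothetical; it simply appends the compilation semantics of `s"` via 
`postpone`, so it is meaningful only while compiling:

```forth
\ str" is a made-up name; it behaves like s" in compilation state.
: str"  ( C: "ccc<quote>" -- ) ( run-time: -- c-addr u )
  postpone s"
; immediate

: greeting ( -- c-addr u )  str" lorem ipsum" ;
greeting type   \ prints: lorem ipsum
```

Here `postpone` is applied to the word `s"` itself, not to any 
particular string literal, which is the distinction made in the NB 
above.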


One of the features that the Recognizer API provides is the *ability* 
to reuse the system's Forth text interpreter loop without nesting.


An example:

   \ Common data types that are related to the Recognizer API:
   \   DataType: recognizer ⇒ xt
   \   DataType: tt ⇒ xt
   \   DataType: token ⇒ ( S: i*x  F: j*k )
   \   DataType: qt ⇒ ( token tt | 0 )
   \ Functional data types qualifications:
   \   DataType: recognizer = ( sd.lexeme -- qt )
   \   DataType: tt = ( i*x token -- j*x )


   wordlist constant foo-wid
   here constant foo-magic

   \ DataType: foo-sys ⇒ ( recognizer.prev x.foo-magic )

   : end-foo ( foo-sys -- )
     foo-magic <> -22 and throw \ "control structure mismatch"
     set-perceptor \ restore the perceptor state
   ;
   : recognize-foo ( sd.lexeme -- tt | 0 )
     2dup "}foo" equals if 2drop ['] end-foo exit then
     foo-wid search-wordlist if exit then 0
   ;
   : begin-foo ( -- foo-sys )
     \ save the perceptor state
     perceptor foo-magic ( recognizer.prev x.foo-magic ) ( foo-sys )
     \ set the system to use our recognizer
     ['] recognize-foo set-perceptor
   ;
   : foo{ ( -- foo-sys )
     begin-foo
   ; immediate
   \ NB: "}foo" is not a word, but just a terminator.

   \ create some test words
   get-current foo-wid set-current
   : test ." (test passed)" ;
   : n1 1 ;  : n2 2 ;  : + + ;  : . . ;
   set-current

   \ run some tests
   foo{ test }foo \ should print "(test passed)"
   foo{ n1 n2 + . }foo \ should print "3"
   t{ foo{ n1 n2 + }foo -> 3 }t

Voilà! The third test fails with error -22. Because foo-sys lies on the 
data stack between `foo{` and `}foo`, we cannot freely consume or 
produce stack parameters from/to outside of the foo{ }foo structure.
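
The same effect can be reproduced without any recognizer, which may 
help to see why the third test fails. This is only a sketch in standard 
Forth; `demo-magic`, `begin-demo`, `end-demo`, `balanced` and `leaky` 
are hypothetical names, and only the marker half of foo-sys is modelled:

```forth
here constant demo-magic            \ an address used as a unique marker

: begin-demo ( -- demo-sys )  demo-magic ;
: end-demo   ( demo-sys -- )
  demo-magic <> -22 and throw       \ "control structure mismatch"
;

\ Leaves nothing above the marker, so end-demo finds it on top.
: balanced ( -- )  begin-demo  1 2 + drop  end-demo ;

\ Leaves 3 above the marker, so end-demo sees 3 and throws -22.
: leaky    ( -- )  begin-demo  1 2 +       end-demo ;

' balanced catch .   \ prints 0
' leaky    catch .   \ prints -22
```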

NB: this example is not intended to show how to implement this dummy 
functionality, but to show how the Recognizer API is used.


This functionality can also be implemented as a parsing word, using the 
same Recognizer API, as follows:

   : recognize-foo ( sd.lexeme -- tt | 0 )
     foo-wid search-wordlist if exit then 0
   ;
   : foo{
     [: ( sd.lexeme -- qt )
       2dup "}foo" equals 0= if recognize-foo exit then
       2drop ['] unnest-translation
     ;] translate-input-with
   ; immediate


Where the following common factors are used:

   : extract-lexeme ( -- sd.lexeme | 0 0 )
     begin parse-name dup if exit then 2drop refill 0= until 0 0
   ;
   : unnest-translation ( -- ⊥ )
     true abort" unnest-translation is not handled"
   ;
   : translate-input-till-unnest ( i*x -- j*x )
     begin extract-lexeme
       dup 0= -39 and throw \ "unexpected end of file"
       perceive dup 0= -13 and throw \ "unrecognized"
       dup ['] unnest-translation <> while execute
     repeat drop
   ;
   : translate-input-with ( i*x recognizer -- j*x )
     perceptor >r  set-perceptor
       ['] translate-input-till-unnest catch
========== REMAINDER OF ARTICLE TRUNCATED ==========