Article <20241122101217.134@kylheku.com>

Path: ...!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Kaz Kylheku <643-408-1753@kylheku.com>
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 18:18:04 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 70
Message-ID: <20241122101217.134@kylheku.com>
References: <uu54la$3su5b$6@dont-email.me> <87edbtz43p.fsf@tudado.org>
 <0d2cnVzOmbD6f4z7nZ2dnZfqnPudnZ2d@brightview.co.uk>
 <uusur7$2hm6p$1@dont-email.me> <vdf096$2c9hb$8@dont-email.me>
 <87a5fdj7f2.fsf@doppelsaurus.mobileactivedefense.com>
 <ve83q2$33dfe$1@dont-email.me> <vgsbrv$sko5$1@dont-email.me>
 <vgtslt$16754$1@dont-email.me> <86frnmmxp7.fsf@red.stonehenge.com>
 <vhk65t$o5i$1@dont-email.me> <vhkev7$29sc$1@dont-email.me>
 <20241121110710.49@kylheku.com> <vhpl9c$14mdr$1@dont-email.me>
Injection-Date: Fri, 22 Nov 2024 19:18:04 +0100 (CET)
Injection-Info: dont-email.me; posting-host="9c13f6f155aa81285f56a101d8a781a7";
	logging-data="1355154"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+1YjAEdL3pBFp3AfMoZy2zKLSkP+/qOCU="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:XoG/QI4HVnBAoehaicgMevaqX3g=
Bytes: 3898

On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
> On Thu, 21 Nov 2024 19:12:03 -0000 (UTC)
> Kaz Kylheku <643-408-1753@kylheku.com> boring babbled:
>>On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>>> I'm curious what you mean by Regexps presented in a "procedural" form.
>>> Can you give some examples?
>>
>>Here is an example: using a regex match to capture a C comment /* ... */
>>in Lex compared to just recognizing the start sequence /* and handling
>>the discarding of the comment in the action.
>>
>>Without non-greedy repetition matching, the regex for a C comment is
>>quite obtuse. The procedural handling is straightforward: read
>>characters until you see a * immediately followed by a /.
>
> Its not that simple I'm afraid since comments can be commented out.

Umm, no.
>
> eg:
>
>    // int i; /*

This /* sequence is inside a // comment, and so the machinery that
recognizes /* as the start of a comment would never see it.

It is just like the text of the string literal "int i;": those characters
are inside a string literal, and so are not recognized as a keyword,
whitespace, identifier and semicolon.

>    int j;
>    /*
>    int k;
>    */
>   ++j;
>
> A C99 and C++ compiler would see "int j" and compile it, a regex would
> simply remove everything from the first /* to */.

No, it won't, because that's not how regexes are used in a lexical
analyzer. At the start of the input, the lexical analyzer faces
the characters "// int i; /*\n".  This will trigger the pattern match
for // comments. Essentially that entire sequence through the newline
is treated as a kind of token, equivalent to a space.

Once a token is recognized and removed from the input, it is gone;
no other regular expression can match into it.
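
To make that concrete, here is a minimal sketch in Python of a scanner loop
of this kind. The pattern set and the names TOKEN_RE, skip_block_comment and
tokenize are made up for illustration; a real C lexer has many more token
kinds, and Lex picks the longest match whereas Python's alternation takes the
first alternative that matches. The /* opener is recognized by a pattern, and
the comment body is discarded procedurally.

```python
import re

# Hypothetical, heavily reduced token set (illustration only).
TOKEN_RE = re.compile(r"""
    (?P<line_comment>//[^\n]*)      # // through end of line
  | (?P<block_open>/\*)             # opener only; body is skipped procedurally
  | (?P<space>\s+)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<punct>\+\+|[;{}()=+\-*/])
""", re.VERBOSE)

def skip_block_comment(text, pos):
    """Read characters until a * immediately followed by a /."""
    while pos < len(text) - 1:
        if text[pos] == "*" and text[pos + 1] == "/":
            return pos + 2          # position just past the closing */
        pos += 1
    raise SyntaxError("unterminated comment")

def tokenize(text):
    """Return the non-comment, non-whitespace tokens of text."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError("unexpected character at offset %d" % pos)
        if m.lastgroup == "block_open":
            # Consumed input is gone; scanning resumes after the comment.
            pos = skip_block_comment(text, m.end())
            continue
        if m.lastgroup not in ("line_comment", "space"):
            tokens.append(m.group())
        pos = m.end()
    return tokens

# The example from the quoted article.
src = "// int i; /*\nint j;\n/*\nint k;\n*/\n++j;\n"
tokens = tokenize(src)
print(tokens)
```

On that input the /* at the end of the first line is swallowed as part of the
// comment, so "int j;" survives and only "int k;" is discarded.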

> Also the same probably applies to #ifdef's.

Lexically analyzing C requires implementing the translation phases
as described in the standard. There are preprocessor phases which
delimit the input into preprocessor tokens (pp-tokens). Comments
are stripped in preprocessing (translation phase 3). But logical lines
(backslash continuations) are spliced in an earlier phase (phase 2),
below the level of comments; i.e. this is one comment:

  // comment \
  split \
  into \
  physical \
  lines

A lexical scanner can have an input routine which transparently handles
this low-level detail, so that it doesn't have to deal with the
line continuations in every token pattern.
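
A rough sketch of such an input routine in Python, under the assumption that
the scanner pulls characters from an iterator. The names logical_chars and
skip_line_comment are hypothetical, and trigraphs and other phase 1 details
are ignored.

```python
def logical_chars(text):
    """Yield the characters of text with backslash-newline pairs removed."""
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] == "\n":
            i += 2                  # splice: drop the backslash and the newline
            continue
        yield text[i]
        i += 1

def skip_line_comment(chars):
    """Consume a // comment body from chars, up to and including the newline."""
    body = []
    for ch in chars:
        if ch == "\n":
            break
        body.append(ch)
    return "".join(body)

src = "// comment \\\nsplit \\\ninto \\\nphysical \\\nlines\nint x;\n"
chars = logical_chars(src)
opener = next(chars) + next(chars)  # the leading //
body = skip_line_comment(chars)     # knows nothing about continuations
rest = "".join(chars)
print(repr(body), repr(rest))
```

The comment routine contains no continuation logic at all; it sees one long
logical line because the splicing happened underneath it.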

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca