Path: ...!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Kaz Kylheku <643-408-1753@kylheku.com>
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 18:18:04 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 70
Message-ID: <20241122101217.134@kylheku.com>
References: <uu54la$3su5b$6@dont-email.me> <87edbtz43p.fsf@tudado.org> <0d2cnVzOmbD6f4z7nZ2dnZfqnPudnZ2d@brightview.co.uk> <uusur7$2hm6p$1@dont-email.me> <vdf096$2c9hb$8@dont-email.me> <87a5fdj7f2.fsf@doppelsaurus.mobileactivedefense.com> <ve83q2$33dfe$1@dont-email.me> <vgsbrv$sko5$1@dont-email.me> <vgtslt$16754$1@dont-email.me> <86frnmmxp7.fsf@red.stonehenge.com> <vhk65t$o5i$1@dont-email.me> <vhkev7$29sc$1@dont-email.me> <20241121110710.49@kylheku.com> <vhpl9c$14mdr$1@dont-email.me>
Injection-Date: Fri, 22 Nov 2024 19:18:04 +0100 (CET)
Injection-Info: dont-email.me; posting-host="9c13f6f155aa81285f56a101d8a781a7"; logging-data="1355154"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+1YjAEdL3pBFp3AfMoZy2zKLSkP+/qOCU="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:XoG/QI4HVnBAoehaicgMevaqX3g=
Bytes: 3898

On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
> On Thu, 21 Nov 2024 19:12:03 -0000 (UTC)
> Kaz Kylheku <643-408-1753@kylheku.com> boring babbled:
>>On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>>> I'm curious what you mean by Regexps presented in a "procedural" form.
>>> Can you give some examples?
>>
>>Here is an example: using a regex match to capture a C comment /* ... */
>>in Lex compared to just recognizing the start sequence /* and handling
>>the discarding of the comment in the action.
>>
>>Without non-greedy repetition matching, the regex for a C comment is
>>quite obtuse.
>>The procedural handling is straightforward: read
>>characters until you see a * immediately followed by a /.
>
> It's not that simple I'm afraid since comments can be commented out.

Umm, no.

>
> eg:
>
> // int i; /*

This /* sequence is inside a // comment, and so the machinery that
recognizes /* as the start of a comment would never see it.

It is just like how "int i;" inside a string literal is not recognized
as a keyword, whitespace, identifier and semicolon.

> int j;
> /*
> int k;
> */
> ++j;
>
> A C99 and C++ compiler would see "int j" and compile it, a regex would
> simply remove everything from the first /* to */.

No, it won't, because that's not how regexes are used in a lexical
analyzer.

At the start of the input, the lexical analyzer faces the characters
"// int i; /*\n". This will trigger the pattern match for // comments.
Essentially that entire sequence through the newline is treated as a
kind of token, equivalent to a space.

Once a token is recognized and removed from the input, it is gone;
no other regular expression can match into it.

> Also the same probably applies to #ifdef's.

Lexically analyzing C requires implementing the translation phases as
described in the standard.

There are preprocessor phases which delimit the input into preprocessor
tokens (pp-tokens). Comments are stripped in preprocessing. But logical
lines (backslash continuations) are recognized below comments (line
splicing is translation phase 2; comments are replaced in phase 3),
i.e. this is one comment:

// comment \
split \
into \
physical \
lines

A lexical scanner can have an input routine which transparently handles
this low-level detail, so that it doesn't have to deal with the line
continuations in every token pattern.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca