Article <vmrbac.3i8.1@stefan.msgid.phost.de>

Path: ...!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: Stefan Reuther <stefan.news@arcor.de>
Newsgroups: comp.arch.embedded
Subject: Re: Static regex for embedded systems
Date: Wed, 22 Jan 2025 17:53:15 +0100
Lines: 42
Message-ID: <vmrbac.3i8.1@stefan.msgid.phost.de>
References: <vmob4o$3ssqn$2@dont-email.me>
 <vmok15.1gs.1@stefan.msgid.phost.de> <vmok1j$3ssqn$3@dont-email.me>
 <9me0pjpctevm2k0vjf07iei0a1isf58tqa@4ax.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Trace: individual.net OSXBdmfOGKOy4ppsJBnLtwQkYUAcdFJmGVOaRkHDygHyqF/ZmF
Cancel-Lock: sha1:uMoUciZg61XLiVJPmwpR67HUhng= sha256:HDuI6cHEG2okQsW766r6zzwuV0duw7FaX0267yJzPws=
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:68.0) Gecko/20100101
 Thunderbird/68.12.1 Hamster/2.1.0.1538
In-Reply-To: <9me0pjpctevm2k0vjf07iei0a1isf58tqa@4ax.com>
Bytes: 2729

Am 22.01.2025 um 01:38 schrieb George Neuner:
> On Tue, 21 Jan 2025 18:03:48 +0100, pozz <pozzugno@gmail.com> wrote:
>>> (Personally, I have no problem with handcrafted parsers.)
> 
> So long as they are correct 8-)

Correctness has an inverse correlation with complexity, so optimize for
non-complexity.

I would implement a two-stage parser: first break the lines into a
buffer, then throw a bunch of statements like

   if (Parser p(str); p.matchString("+")
         && p.matchTextUntil(":", &prefix)
         && p.matchWhitespace() ...)

at this, with Parser being a small C++ class wrapping the individual
matching operations (strncmp, strspn, etc.)

Surely this is more complex than a regex/template, but still easy enough
to be "obviously correct".
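
A minimal sketch of what such a Parser class could look like, assuming
C++17 and std::string_view in place of the strncmp/strspn calls mentioned
above. Only the three method names come from the snippet; their exact
signatures, the rest() accessor and the parseTaggedLine() helper are
assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical Parser: a cursor over one line that has already been split off.
class Parser {
public:
    explicit Parser(std::string_view text) : rest_(text) {}

    // Consume `lit` if the remaining text starts with it.
    bool matchString(std::string_view lit) {
        if (rest_.substr(0, lit.size()) != lit) return false;
        rest_.remove_prefix(lit.size());
        return true;
    }

    // Copy everything before the first `delim` into *out (if non-null),
    // then consume both the text and the delimiter.
    bool matchTextUntil(std::string_view delim, std::string* out) {
        const std::size_t pos = rest_.find(delim);
        if (pos == std::string_view::npos) return false;
        if (out) out->assign(rest_.substr(0, pos));
        rest_.remove_prefix(pos + delim.size());
        return true;
    }

    // Consume one or more blanks or tabs; fail if there are none.
    bool matchWhitespace() {
        const std::size_t n = rest_.find_first_not_of(" \t");
        const std::size_t count = (n == std::string_view::npos) ? rest_.size() : n;
        if (count == 0) return false;
        rest_.remove_prefix(count);
        return true;
    }

    // Whatever has not been consumed yet (an assumed convenience accessor).
    std::string_view rest() const { return rest_; }

private:
    std::string_view rest_;
};

// Hypothetical template "+<prefix>:<whitespace><value>".
// Note: `prefix` may already be overwritten when a later step fails.
bool parseTaggedLine(std::string_view line, std::string& prefix, std::string& value) {
    if (Parser p(line); p.matchString("+")
            && p.matchTextUntil(":", &prefix)
            && p.matchWhitespace()) {
        value.assign(p.rest());
        return true;
    }
    return false;
}
```

Each `if` builds a fresh Parser, so a failed chain does not need to
rewind anything; the next template simply starts again at the beginning
of the line.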

> Lex and Flex create table driven lexers (and driver code for them).
> Under certain circumstances Flex can create far smaller tables than
> Lex, but likely either would be massive overkill for the scenario you
> described.

Maybe, maybe not. I find it hard to extrapolate to the complete task
from the two examples given. If there are hundreds of these templates
that need to be matched bit by bit, I have the impression that lex would
be a quick and easy way to pull them out of a byte stream.

But splitting it into lines first, and then tackling each line on its
own (...using lex, maybe? Or any other tool. Or a parser class.) might
be a good option as well. For example, this lets you answer the question
whether line endings are required to be \r\n, or whether a single \n also
suffices, in one central place. And if you decide that you want to do a
hard connection close when you see a \r or \n outside a \r\n sequence (to
prevent an attack such as SMTP smuggling), that would be easy.
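
A minimal sketch of such a central line splitter, assuming the strict
policy (only \r\n ends a line, any bare \r or \n is a protocol error).
The names LineSplitter, FeedResult and feed() are hypothetical, and a
real embedded version would also want an upper bound on the line length,
which is left out here.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <vector>

enum class FeedResult { Ok, ProtocolError };

// Hypothetical first stage: turns a byte stream into complete lines.
class LineSplitter {
public:
    // Feed one chunk of received bytes. Complete lines are appended to
    // `lines` without their trailing \r\n. A partial line is kept until
    // the next call. On ProtocolError the caller is expected to close
    // the connection and discard this object.
    FeedResult feed(std::string_view chunk, std::vector<std::string>& lines) {
        for (char c : chunk) {
            if (sawCR_) {
                if (c != '\n') return FeedResult::ProtocolError;  // bare \r
                lines.push_back(std::move(current_));
                current_.clear();
                sawCR_ = false;
            } else if (c == '\r') {
                sawCR_ = true;  // may be split across chunks, so remember it
            } else if (c == '\n') {
                return FeedResult::ProtocolError;                 // bare \n
            } else {
                current_.push_back(c);
            }
        }
        return FeedResult::Ok;
    }

private:
    std::string current_;   // bytes of the line currently being assembled
    bool sawCR_ = false;    // true if the previous byte was \r
};
```

Accepting a single \n instead would be a change in this one function,
and the per-line matching code would never see a line terminator at all.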


  Stefan