Path: ...!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: George Neuner <gneuner2@comcast.net>
Newsgroups: comp.arch.embedded
Subject: Re: Static regex for embedded systems
Date: Wed, 22 Jan 2025 18:23:21 -0500
Organization: i2pn2 (i2pn.org)
Message-ID: <sus2pjl5vmtinvp54riq7janhvicaujp98@4ax.com>
References: <vmob4o$3ssqn$2@dont-email.me> <vmok15.1gs.1@stefan.msgid.phost.de> <vmok1j$3ssqn$3@dont-email.me> <9me0pjpctevm2k0vjf07iei0a1isf58tqa@4ax.com> <vmqfh7$uiuc$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Info: i2pn2.org;
	logging-data="675769"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="h5eMH71iFfocGZucc+SnA0y5I+72/ecoTCcIjMd3Uww";
User-Agent: ForteAgent/8.00.32.1272
X-Spam-Checker-Version: SpamAssassin 4.0.0
Bytes: 4482
Lines: 84

On Wed, 22 Jan 2025 10:59:03 +0100, David Brown
<david.brown@hesbynett.no> wrote:

>On 22/01/2025 01:38, George Neuner wrote:
>> On Tue, 21 Jan 2025 18:03:48 +0100, pozz <pozzugno@gmail.com> wrote:
>> 
>>> Il 21/01/2025 17:03, Stefan Reuther ha scritto:
>>>> Am 21.01.2025 um 15:31 schrieb pozz:
>>>>> Many times I need to parse/decode a text string that comes from an
>>>>> external system, over a serial bus, MQTT, and so on.
>>>>>
>>>>> Many times this string has a fixed syntax/layout. In order to parse this
>>>>> string, I create a custom parser every time, which can be tedious,
>>>>> cumbersome and error-prone.
>>>> [...]
>>>>
>>>> I don't see a question in this posting,
>>>
>>> The hidden question was whether there's a better approach than
>>> handcrafted parsers.
>>>
>>>
>>>> but isn't this the task that
>>>> 'lex' is intended to be used for?
>>>
>>> I will look at it.
>>>
>>>
>>>> (Personally, I have no problem with handcrafted parsers.)
>> 
>> So long as they are correct 8-)
>> 
>
>This is vital.  You want a /lot/ of test cases to check the algorithm.
>
>>   
>>>>     Stefan
>> 
>> Lex and Flex create table driven lexers (and driver code for them).
>> Under certain circumstances Flex can create far smaller tables than
>> Lex, but likely either would be massive overkill for the scenario you
>> described.
>> 
>> Minding David's warnings about lexer size, if you really want to try
>> using regex, I would recommend RE2C.  RE2C is a preprocessor that
>> generates simple, direct code (a state machine of switches and gotos)
>> to implement matching of regex patterns in your code. There are
>> versions available for several languages.
>> https://re2c.org/
>> 
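To make the table-driven vs. direct-coded distinction concrete, here is a
hand-written sketch (not actual lex/flex or re2c output) of both styles
recognising the same hypothetical pattern [0-9]+(\.[0-9]+)? against a whole
string. The table-driven version keeps the pattern in data and runs one
generic driver loop; the direct-coded version turns each DFA state into a
label and each transition into a branch, which is roughly the shape of the
code re2c emits.

```c
#include <stdbool.h>

/* Hypothetical example: both functions accept exactly the strings
 * matching [0-9]+(\.[0-9]+)?  (whole string, no partial matches). */

enum { C_DIGIT, C_DOT, C_OTHER, N_CLASSES };
enum { N_STATES = 4, DEAD = -1 };

/* Map a character to its input class (column of the table). */
static int char_class(char c)
{
    if (c >= '0' && c <= '9') return C_DIGIT;
    if (c == '.')             return C_DOT;
    return C_OTHER;
}

/* Table-driven (lex/flex style): the pattern lives in these tables. */
static const signed char next_state[N_STATES][N_CLASSES] = {
    /*          DIGIT  DOT   OTHER */
    /* S0 */ {  1,     DEAD, DEAD },   /* start              */
    /* S1 */ {  1,     2,    DEAD },   /* integer part (acc) */
    /* S2 */ {  3,     DEAD, DEAD },   /* just saw '.'       */
    /* S3 */ {  3,     DEAD, DEAD },   /* fraction (acc)     */
};
static const bool accepting[N_STATES] = { false, true, false, true };

bool match_table(const char *s)
{
    int state = 0;
    for (; *s != '\0'; ++s) {
        state = next_state[state][char_class(*s)];
        if (state == DEAD) return false;
    }
    return accepting[state];
}

/* Direct-coded (re2c style): states are labels, no tables at all. */
bool match_direct(const char *s)
{
    /* state 0: need at least one digit */
    if (*s >= '0' && *s <= '9') { ++s; goto s1; }
    return false;
s1: if (*s >= '0' && *s <= '9') { ++s; goto s1; }
    if (*s == '.') { ++s; goto s2; }
    return *s == '\0';                 /* accepting state */
s2: if (*s >= '0' && *s <= '9') { ++s; goto s3; }
    return false;                      /* '.' must be followed by a digit */
s3: if (*s >= '0' && *s <= '9') { ++s; goto s3; }
    return *s == '\0';                 /* accepting state */
}
```

On a small target the trade-off is roughly table size (states x character
classes) versus code size (one branch per transition).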
>
>The "best" solution depends on the OP's knowledge, the variety of the 
>patterns needed, the resources of the target system, and restrictions on 
>things like programming language support.  For example, the C++ template 
>based project I suggested earlier (which I have not tried myself) should 
>give quite efficient results, but it requires a modern C++ compiler.
>
>I think if the OP is only looking for a few patterns, or styles of 
>pattern, then regexes and powerful code generator systems are overkill. 
>It will take more work to learn and understand them, and code generated 
>by tools like lex and flex is not designed to be human-friendly, nor is 
>it likely to match well with coding standards for small embedded systems.
>
>I'd probably just have a series of matcher functions for different parts 
>(fixed string, numeric field as integer, flag field as boolean, etc.) 
>and have manual parsers for the different types.  As a C++ user I'd be 
>returning std::optional<> types here and using the new C++23 "and_then" 
>methods to give neat chains, but a C programmer might want to pass a 
>pointer to a value variable and return "bool" for success.  If I had a 
>lot of such patterns to match, then I might use templates for generating 
>the higher level matchers - for C, it would be either a macro system or 
>an external Python script.
>
>Or just use sscanf() :-)
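For comparison, a sketch of the sscanf() route for the same hypothetical
"T=<uint>;EN=<0|1>" format. The %n directive stores how many characters have
been consumed (it is not counted in the return value), which is one way to
reject trailing garbage. Caveats: %u skips leading whitespace and accepts a
sign, and a value that does not fit in an unsigned int is undefined
behavior, so this is looser than hand-written matchers.

```c
#include <stdbool.h>
#include <stdio.h>

bool parse_cmd_scanf(const char *s, unsigned *temp, bool *enabled)
{
    unsigned t;
    char flag;
    int end = -1;                       /* set by %n if reached */
    if (sscanf(s, "T=%u;EN=%c%n", &t, &flag, &end) != 2) return false;
    if (end < 0 || s[end] != '\0') return false;   /* trailing input */
    if (flag != '0' && flag != '1') return false;
    *temp = t;
    *enabled = (flag == '1');
    return true;
}
```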

There /used/ to be some very small regex matchers that did not
"compile", but just directly interpreted the contents of the pattern
string.  A page or three of code, reusable by every regex pattern in
the program.

Obviously they were limited to /simple/ matching: no Perl stuff like
counting, looping, etc.  Unfortunately I haven't seen any of these
tiny regex implementations since the late '70s [coincidentally about
when lex was becoming popular].
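One well-known example of the genre is the tiny matcher Rob Pike wrote for
Kernighan and Pike's book "The Practice of Programming". Below is a lightly
adapted version (functions renamed, pointers made const). It interprets the
pattern directly with no compile step, and handles only literal characters,
'.', '^', '$' and '*'.

```c
/* Adapted from the matcher in Kernighan & Pike, "The Practice of
 * Programming". Supports: c (literal), . (any char), ^ (start of text),
 * $ (end of text), * (zero or more of the previous char). */

static int match_star(int c, const char *re, const char *text);

/* match_here: does re match at the beginning of text? */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;                       /* end of pattern: success */
    if (re[1] == '*')
        return match_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';           /* $ anchors only at pattern end */
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* match_star: match c* followed by re, trying the shortest run first. */
static int match_star(int c, const char *re, const char *text)
{
    do {
        if (match_here(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* tiny_match: search for re anywhere in text (only at start if ^). */
int tiny_match(const char *re, const char *text)
{
    if (re[0] == '^')
        return match_here(re + 1, text);
    do {        /* try every position, including the empty tail */
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

Note the recursion: match_here() recurses once per matched character, so
stack depth grows with the length of the text, which needs budgeting on a
small target.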