From: George Neuner
Newsgroups: comp.arch.embedded
Subject: Re: Static regex for embedded systems
Date: Wed, 22 Jan 2025 18:23:21 -0500

On Wed, 22 Jan 2025 10:59:03 +0100, David Brown wrote:

>On 22/01/2025 01:38, George Neuner wrote:
>> On Tue, 21 Jan 2025 18:03:48 +0100, pozz wrote:
>>
>>> On 21/01/2025 17:03, Stefan Reuther wrote:
>>>> On 21.01.2025 at 15:31, pozz wrote:
>>>>> Many times I need to parse/decode a text string that comes from an
>>>>> external system, over a serial bus, MQTT, and so on.
>>>>>
>>>>> Many times this string has a fixed syntax/layout. In order to parse this
>>>>> string, I every time create a custom parser that can be tedious,
>>>>> cumbersome and error prone.
>>>> [...]
>>>>
>>>> I don't see a question in this posting,
>>>
>>> The hidden question was if there's a better approach than handcrafted
>>> parsers.
>>>
>>>
>>>> but isn't this the task that
>>>> 'lex' is intended to be used for?
>>>
>>> I will look at it.
>>>
>>>
>>>> (Personally, I have no problem with handcrafted parsers.)
>>
>> So long as they are correct 8-)
>>
>
>This is vital. You want a /lot/ of test cases to check the algorithm.
>
>>
>>>> Stefan
>>
>> Lex and Flex create table driven lexers (and driver code for them).
>> Under certain circumstances Flex can create far smaller tables than
>> Lex, but likely either would be massive overkill for the scenario you
>> described.
>>
>> Minding David's warnings about lexer size, if you really want to try
>> using regex, I would recommend RE2C. RE2C is a preprocessor that
>> generates simple recursive code to directly implement matching of
>> regex strings in your code. There are versions available for several
>> languages.
>> https://re2c.org/
>>
>
>The "best" solution depends on the OP's knowledge, the variety of the
>patterns needed, the resources of the target system, and restrictions on
>things like programming language support. For example, the C++ template
>based project I suggested earlier (which I have not tried myself) should
>give quite efficient results, but it requires a modern C++ compiler.
>
>I think if the OP is only looking for a few patterns, or styles of
>pattern, then regexes and powerful code generator systems are overkill.
>It will take more work to learn and understand them, and code generated
>by tools like lex and flex is not designed to be human-friendly, nor is
>it likely to match well with coding standards for small embedded systems.
>
>I'd probably just have a series of matcher functions for different parts
>(fixed string, numeric field as integer, flag field as boolean, etc.)
>and have manual parsers for the different types. As a C++ user I'd be
>returning std::optional<> types here and using the new "and_then"
>methods to give neat chains, but a C programmer might want to pass a
>pointer to a value variable and return "bool" for success. If I had a
>lot of such patterns to match, then I might use templates for generating
>the higher level matchers - for C, it would be either a macro system or
>an external Python script.
>
>Or just use sscanf() :-)

There /used/ to be some very small regex matchers that did not
"compile", but just directly interpreted the contents of the pattern
string. A page or three of code, reusable by every regex pattern in
the program. Obviously they were limited to /simple/ matching: no Perl
stuff like counting, looping, etc.
Unfortunately I haven't seen any of these tiny regex implementations since the late '70s [coincidentally about when lex was becoming popular].
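
For illustration only, here is a minimal sketch in C of what such a
direct interpreter can look like. It is not one of the '70s
implementations; it closely follows the small matcher published in
Kernighan and Pike's "The Practice of Programming". The function names
(tiny_match, match_here, match_star) are hypothetical, and the supported
syntax is an assumption chosen to keep it small: literal characters, '.'
for any character, '*' after a single character, and the '^' and '$'
anchors. No character classes, no alternation, no counting.

```c
#include <stdbool.h>

/* Hypothetical names; a sketch, not a vetted library. */
static bool match_here(const char *re, const char *text);

/* Match zero or more occurrences of c, then the rest of the pattern.
 * Tries the shortest repetition first and extends one char at a time. */
static bool match_star(char c, const char *re, const char *text)
{
    do {
        if (match_here(re, text))
            return true;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return false;
}

/* Match the pattern re at exactly the current position of text. */
static bool match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return true;                    /* pattern exhausted: success */
    if (re[1] == '*')
        return match_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';           /* '$' only matches at the end */
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return false;
}

/* Search for the pattern re anywhere in text (or only at the start
 * if the pattern begins with '^'). The pattern string is interpreted
 * directly; nothing is compiled or allocated. */
bool tiny_match(const char *re, const char *text)
{
    if (re[0] == '^')
        return match_here(re + 1, text);
    do {
        if (match_here(re, text))
            return true;
    } while (*text++ != '\0');
    return false;
}
```

A call such as tiny_match("^T=.*C$", line) would accept a line like
"T=23.5C". Note that the recursion depth grows with the length of the
pattern and the matched text, and that it only reports match/no match:
extracting field values still needs separate code, which may matter on a
small target.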