Path: ...!weretis.net!feeder9.news.weretis.net!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 13:30:34 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhq11q$nq7$1@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <875xohbxre.fsf@doppelsaurus.mobileactivedefense.com> <vhngoi$2p6$1@reader2.panix.com> <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>
Injection-Date: Fri, 22 Nov 2024 13:30:34 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80"; logging-data="24391"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)
Bytes: 5251
Lines: 124

In article <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>
>>>[...]
>>>
>>>> Personally I think that writing bulky procedural stuff for something
>>>> like [0-9]+ can only be much worse, and that further abbreviations
>>>> like \d+ are the better direction to go if targeting a good interface.
>>>> YMMV.
>>>
>>>Assuming that p is a pointer to the current position in a string, e is a
>>>pointer to the end of it (ie, point just past the last byte) and -
>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>C equivalent of [0-9]+ is
>>>
>>>while (p < e && *p - '0' < 10) ++p;
>>>
>>>That's not too bad. And it's really a hell lot faster than a
>>>general-purpose automaton programmed to recognize the same pattern
>>>(which might not matter most of the time, but sometimes, it does).
>>
>> It's also not exactly right.  `[0-9]+` would match one or more
>> characters; this possibly matches 0 (ie, if `p` pointed to
>> something that wasn't a digit).
>
>The regex won't match any digits if there aren't any. In this case, the
>match will fail. I didn't include the code for handling that because it
>seemed pretty pointless for the example.

That's rather the point though, isn't it?  The program snippet
(modulo the promotion to signed int via the "usual arithmetic
conversions" before the subtraction and comparison giving you
unexpected values; nothing to do with whether `char` is signed
or not) is a snippet that advances a pointer while it points to
a digit, starting at the current pointer position; that is, it
just increments a pointer over a run of digits.

But that's not the same as a regex matcher, which has a semantic
notion of success or failure.  I could run your snippet against
a string such as, say, "ZZZZZZ" and it would "succeed" just as
it would against an empty string or a string of one or more
digits.

And then there are other matters of context; does the user
intend for the regexp to match the _whole_ string?  Or any
portion of the string (a la `grep`)?  So, for example, does the
string "aaa1234aaa" match `[0-9]+`?  As written, the above
snippet is actually closer to advancing `p` over `^[0-9]*`.  One
might differentiate between `*` and `+` after the fact, by
examining `p` against some (presumably saved) source value, but
that's more code.

These are just not equivalent.  That's not to say that your
snippet is not _useful_ in context, but to pretend that it's the
same as the regular expression is pointlessly reductive.

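To make the `*` versus `+` distinction concrete, here is a minimal
sketch of the "saved source value" approach, plus an unanchored,
`grep`-style search.  The names `is_digit`, `skip_digits`,
`match_digits`, and `find_digits` are hypothetical and appear nowhere
in the quoted posts; the sketch assumes ASCII-compatible digits and a
half-open `[p, e)` range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if c is a decimal digit.  The unsigned
   parameter makes the subtraction wrap for anything below '0'. */
static bool is_digit(unsigned int c)
{
    return c - '0' < 10;
}

/* Roughly `^[0-9]*`: advance over a possibly empty run of digits and
   return the new position.  This scan has no notion of failure. */
static const char *skip_digits(const char *p, const char *e)
{
    assert(p != NULL && e != NULL && p <= e);
    while (p < e && is_digit((unsigned char)*p))
        p++;
    return p;
}

/* Roughly `^[0-9]+`: the same scan, but compared against the saved
   start so that an empty run is reported as failure (NULL). */
static const char *match_digits(const char *p, const char *e)
{
    const char *end = skip_digits(p, e);
    return end != p ? end : NULL;
}

/* Roughly unanchored `[0-9]+` (a la grep): try each start position
   and return where the first run begins, or NULL if there is none. */
static const char *find_digits(const char *p, const char *e)
{
    for (; p < e; p++)
        if (match_digits(p, e) != NULL)
            return p;
    return NULL;
}
```

Against "ZZZZZZ", `skip_digits` returns its argument unchanged while
`match_digits` returns NULL; against "aaa1234aaa", only `find_digits`
reports a match, at offset 3.  Three functions for three readings of
one five-character pattern.
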
By the way, something that _would_ match `^[0-9]+$` might be:

term% cat mdp.c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static bool
mdigit(unsigned int c)
{
    return c - '0' < 10;
}

bool
mdp(const char *str, const char *estr)
{
    if (str == NULL || estr == NULL || str == estr)
        return false;
    if (!mdigit(*str))
        return false;
    while (str < estr && mdigit(*str))
        str++;
    return str == estr;
}

bool
probe(const char *s, bool expected)
{
    if (mdp(s, s + strlen(s)) != expected) {
        fprintf(stderr, "test failure: `%s` (expected %s)\n",
            s, expected ? "true" : "false");
        return false;
    }
    return true;
}

int
main(void)
{
    bool success = true;

    success = probe("1234", true) && success;
    success = probe("", false) && success;
    success = probe("ab", false) && success;
    success = probe("0", true) && success;
    success = probe("0123456789", true) && success;
    success = probe("a0123456", false) && success;
    success = probe("0123456b", false) && success;
    success = probe("0123c456", false) && success;
    success = probe("0123#456", false) && success;

    return success ? EXIT_SUCCESS : EXIT_FAILURE;
}
term% cc -Wall -Wextra -Werror -pedantic -std=c11 mdp.c -o mdp
term% ./mdp
term% echo $?
0
term%

Granted the test scaffolding and `#include` boilerplate make
this appear rather longer than it would be in context, but it's
still not nearly as succinct.

    - Dan C.