Path: ...!weretis.net!feeder9.news.weretis.net!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 18:30:31 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqik7$nn0$1@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com> <sS30P.4663$YSkc.427@fx40.iad> <VZ30P.4664$YSkc.1894@fx40.iad>
Injection-Date: Fri, 22 Nov 2024 18:30:31 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80"; logging-data="24288"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)
Bytes: 5333
Lines: 126

In article <VZ30P.4664$YSkc.1894@fx40.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:
>scott@slp53.sl.home (Scott Lurndal) writes:
>>Rainer Weikusat <rweikusat@talktalk.net> writes:
>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>>>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>>>>>Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>>>>
>>>>>>>[...]
>>>>>>>
>>>>>>>> Personally I think that writing bulky procedural stuff for something
>>>>>>>> like [0-9]+ can only be much worse, and that further abbreviations
>>>>>>>> like \d+ are the better direction to go if targeting a good interface.
>>>>>>>> YMMV.
>>>>>>>
>>>>>>>Assuming that p is a pointer to the current position in a string, e is a
>>>>>>>pointer to the end of it (ie, point just past the last byte) and -
>>>>>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>>>>>C equivalent of [0-9]+ is
>>>>>>>
>>>>>>>while (p < e && *p - '0' < 10) ++p;
>>>>>>>
>>>>>>>That's not too bad.
>>>>>>>And it's really a hell of a lot faster than a
>>>>>>>general-purpose automaton programmed to recognize the same pattern
>>>>>>>(which might not matter most of the time, but sometimes, it does).
>>>>>>
>>>>>> It's also not exactly right.  `[0-9]+` would match one or more
>>>>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>>>>> something that wasn't a digit).
>>>>>
>>>>>The regex won't match any digits if there aren't any. In this case, the
>>>>>match will fail. I didn't include the code for handling that because it
>>>>>seemed pretty pointless for the example.
>>>>
>>>> That's rather the point though, isn't it?  The program snippet
>>>> (modulo the promotion to signed int via the "usual arithmetic
>>>> conversions" before the subtraction and comparison giving you
>>>> unexpected values; nothing to do with whether `char` is signed
>>>> or not) is a snippet that advances a pointer while it points to
>>>> a digit, starting at the current pointer position; that is, it
>>>> just increments a pointer over a run of digits.
>>>
>>>That's the core part of matching something equivalent to the regex [0-9]+
>>>and the only part of it which is at least remotely interesting.
>>>
>>>> But that's not the same as a regex matcher, which has a semantic
>>>> notion of success or failure.  I could run your snippet against
>>>> a string such as, say, "ZZZZZZ" and it would "succeed" just as
>>>> it would against an empty string or a string of one or more
>>>> digits.
>>>
>>>Why do you believe that p being equivalent to the starting position
>>>would be considered a "successful match", considering that this
>>>obviously doesn't make any sense?
>>>
>>>[...]
>>>
>>>> By the way, something that _would_ match `^[0-9]+$` might be:
>>>
>>>[too much code]
>>>
>>>Something which would match [0-9]+ in its first argument (if any) would
>>>be:
>>>
>>>#include "string.h"
>>>#include "stdlib.h"
>>>
>>>int main(int argc, char **argv)
>>>{
>>>    char *p;
>>>    unsigned c;
>>>
>>>    p = argv[1];
>>>    if (!p) exit(1);
>>>    while (c = *p, c && c - '0' > 9) ++p;
>>>    if (!c) exit(1);
>>>    return 0;
>>>}
>>>
>>>but that's 14 lines of text, 13 of which have absolutely no relation to
>>>the problem of recognizing a digit.
>>
>>Personally, I'd use:
>
>Albeit this is limited to strings of digits that sum to less than
>ULONG_MAX...

It's not quite equivalent to his program, which just exits
with success if it sees any input string with a digit in it;
yours is closer to what I wrote, which matches `^[0-9]+$`.

His is not an interesting program and certainly not a
recognizable equivalent of a regular expression matcher in any
reasonable sense, but I think the cognitive dissonance is too
strong to get that across.

	- Dan C.

>>$ cat /tmp/a.c
>>#include <stdint.h>
>>#include <stdlib.h>
>>#include <string.h>
>>
>>int
>>main(int argc, const char **argv)
>>{
>>    char *cp;
>>    uint64_t value;
>>
>>    if (argc < 2) return 1;
>>
>>    value = strtoull(argv[1], &cp, 10);
>>    if ((cp == argv[1])
>>        || (*cp != '\0')) {
>>        return 1;
>>    }
>>    return 0;
>>}
>>$ cc -o /tmp/a /tmp/a.c
>>$ /tmp/a 13254
>>$ echo $?
>>0
>>$ /tmp/a 23v23
>>$ echo $?
>>1
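
For readers trying to keep the three behaviors in this thread apart, here is a minimal sketch in C that separates them: stepping over a run of digits (which can be empty and so cannot "fail"), testing whether `[0-9]+` matches somewhere in a string, and testing whether `^[0-9]+$` matches the whole string. The names `is_ascii_digit`, `skip_digits`, `contains_digit` and `is_all_digits` are hypothetical and appear in none of the posts. The sketch assumes NUL-terminated ASCII input, and it converts to unsigned explicitly so that the single-comparison range check does not rely on the operand types after integer promotion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: single-comparison range check for '0'..'9'.
 * The difference is computed as int after promotion, then converted
 * to unsigned, so anything below '0' wraps to a large value and fails. */
static bool is_ascii_digit(char c)
{
    return (unsigned)((unsigned char)c - '0') < 10u;
}

/* Hypothetical helper: advance over a possibly empty run of digits.
 * On its own this has no notion of success or failure. */
static const char *skip_digits(const char *p)
{
    while (is_ascii_digit(*p))
        ++p;
    return p;
}

/* Unanchored [0-9]+ : succeeds if any digit occurs anywhere in s. */
static bool contains_digit(const char *s)
{
    for (; *s != '\0'; ++s) {
        if (is_ascii_digit(*s))
            return true;
    }
    return false;
}

/* Anchored ^[0-9]+$ : succeeds only if s is one or more digits
 * and nothing else. */
static bool is_all_digits(const char *s)
{
    const char *end = skip_digits(s);

    return end != s && *end == '\0';
}
```

With these definitions, `is_all_digits` accepts "13254" and rejects "23v23", "ZZZZZZ" and the empty string, while `contains_digit` accepts "23v23" and still rejects "ZZZZZZ". Unlike the `strtoull` approach, `is_all_digits` has no upper bound on the length of the digit string, and it does not accept leading whitespace or a sign.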