From: Rainer Weikusat
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 15:41:09 +0000
Message-ID: <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
References: <875xohbxre.fsf@doppelsaurus.mobileactivedefense.com> <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>

cross@spitfire.i.gajendra.net (Dan Cross) writes:
> Rainer Weikusat wrote:
>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>> Rainer Weikusat wrote:
>>>>Janis Papanagnou writes:
>>>>
>>>>[...]
>>>>
>>>>> Personally I think that writing bulky procedural stuff for something
>>>>> like [0-9]+ can only be much worse, and that further abbreviations
>>>>> like \d+ are the better direction to go if targeting a good interface.
>>>>> YMMV.
>>>>
>>>>Assuming that p is a pointer to the current position in a string, e is a
>>>>pointer to the end of it (ie, point just past the last byte) and -
>>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>>C equivalent of [0-9]+ is
>>>>
>>>>while (p < e && *p - '0' < 10) ++p;
>>>>
>>>>That's not too bad. And it's really a hell lot faster than a
>>>>general-purpose automaton programmed to recognize the same pattern
>>>>(which might not matter most of the time, but sometimes, it does).
>>>
>>> It's also not exactly right. `[0-9]+` would match one or more
>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>> something that wasn't a digit).
>>
>>The regex won't match any digits if there aren't any. In this case, the
>>match will fail. I didn't include the code for handling that because it
>>seemed pretty pointless for the example.
>
> That's rather the point though, isn't it? The program snippet
> (modulo the promotion to signed int via the "usual arithmetic
> conversions" before the subtraction and comparison giving you
> unexpected values; nothing to do with whether `char` is signed
> or not) is a snippet that advances a pointer while it points to
> a digit, starting at the current pointer position; that is, it
> just increments a pointer over a run of digits.

That's the core part of matching something equivalent to the regex
[0-9]+ and the only part of it which is at least remotely interesting.

> But that's not the same as a regex matcher, which has a semantic
> notion of success or failure. I could run your snippet against
> a string such as, say, "ZZZZZZ" and it would "succeed" just as
> it would against an empty string or a string of one or more
> digits.

Why do you believe that p being equivalent to the starting position
would be considered a "successful match", considering that this
obviously doesn't make any sense?

[...]

> By the way, something that _would_ match `^[0-9]+$` might be:

[too much code]

Something which would match [0-9]+ in its first argument (if any) would
be:

#include "string.h"
#include "stdlib.h"

int main(int argc, char **argv)
{
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 9) ++p;
    if (!c) exit(1);
    return 0;
}

but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
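
For the anchored case discussed further up (p at the current position, e
one past the last byte), a minimal sketch of the pointer loop with the
success test spelled out might look as follows. The names skip_digits and
match_digits are made up for illustration. The cast to unsigned is an
addition to the one-liner quoted above: with pointers to unsigned char,
*p - '0' is evaluated as an int and would be negative for bytes below '0'.

```c
#include <assert.h>
#include <stddef.h>

/* Advance p over a run of ASCII digits, never going past e (one past the
   last byte). *p - '0' is an int after promotion and may be negative;
   converting it to unsigned turns the range check into one comparison. */
static const unsigned char *skip_digits(const unsigned char *p,
                                        const unsigned char *e)
{
    assert(p <= e);
    while (p < e && (unsigned)(*p - '0') < 10) ++p;
    return p;
}

/* Hypothetical helper: length of the [0-9]+ match anchored at p.
   A return value of 0 means the pointer did not move, ie, no match. */
static size_t match_digits(const unsigned char *p, const unsigned char *e)
{
    return (size_t)(skip_digits(p, e) - p);
}
```

Treating "the pointer did not move" as failure is the whole of the
success/failure handling: a caller compares the result with 0 and is done.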