Article <vhqik7$nn0$1@reader2.panix.com>

Path: ...!weretis.net!feeder9.news.weretis.net!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 18:30:31 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqik7$nn0$1@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com> <sS30P.4663$YSkc.427@fx40.iad> <VZ30P.4664$YSkc.1894@fx40.iad>
Injection-Date: Fri, 22 Nov 2024 18:30:31 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
	logging-data="24288"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)
Bytes: 5333
Lines: 126

In article <VZ30P.4664$YSkc.1894@fx40.iad>,
Scott Lurndal <slp53@pacbell.net> wrote:
>scott@slp53.sl.home (Scott Lurndal) writes:
>>Rainer Weikusat <rweikusat@talktalk.net> writes:
>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>> Rainer Weikusat  <rweikusat@talktalk.net> wrote:
>>>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>>>> Rainer Weikusat  <rweikusat@talktalk.net> wrote:
>>>>>>>Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>>>>
>>>>>>>[...]
>>>>>>>
>>>>>>>> Personally I think that writing bulky procedural stuff for something
>>>>>>>> like [0-9]+ can only be much worse, and that further abbreviations
>>>>>>>> like \d+ are the better direction to go if targeting a good interface.
>>>>>>>> YMMV.
>>>>>>>
>>>>>>>Assuming that p is a pointer to the current position in a string, e is a
>>>>>>>pointer to the end of it (i.e., pointing just past the last byte) and -
>>>>>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>>>>>C equivalent of [0-9]+ is
>>>>>>>
>>>>>>>while (p < e && *p - '0' < 10) ++p;
>>>>>>>
>>>>>>>That's not too bad. And it's really a hell of a lot faster than a
>>>>>>>general-purpose automaton programmed to recognize the same pattern
>>>>>>>(which might not matter most of the time, but sometimes, it does).
>>>>>>
>>>>>> It's also not exactly right.  `[0-9]+` would match one or more
>>>>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>>>>> something that wasn't a digit).
>>>>>
>>>>>The regex won't match any digits if there aren't any. In this case, the
>>>>>match will fail. I didn't include the code for handling that because it
>>>>>seemed pretty pointless for the example.
>>>>
>>>> That's rather the point though, isn't it?  The program snippet
>>>> (modulo the promotion to signed int via the "usual arithmetic
>>>> conversions" before the subtraction and comparison giving you
>>>> unexpected values; nothing to do with whether `char` is signed
>>>> or not) is a snippet that advances a pointer while it points to
>>>> a digit, starting at the current pointer position; that is, it
>>>> just increments a pointer over a run of digits.
>>>
>>>That's the core part of matching something equivalent to the regex [0-9]+
>>>and the only part of it which is at least remotely interesting. 
>>>
>>>> But that's not the same as a regex matcher, which has a semantic
>>>> notion of success or failure.  I could run your snippet against
>>>> a string such as, say, "ZZZZZZ" and it would "succeed" just as
>>>> it would against an empty string or a string of one or more
>>>> digits.
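The two objections above, the arithmetic conversions in the comparison and the absence of any success/failure result, can be made concrete with a minimal sketch. It assumes the same `p`/`e` convention as the quoted snippet (unsigned-char pointers, `e` one past the end); `skip_digits` and `match_digits` are hypothetical names, not anything from the thread. Because `*p` is promoted to `int` before the subtraction, `'/' - '0'` evaluates to -1, which compares less than 10; converting the difference to `unsigned` first makes it wrap to a large value instead. The C standard guarantees that '0' through '9' are contiguous, so the subtraction itself is portable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Advance over a run of decimal digits in [p, e) and return the first
 * non-digit position.  The cast to unsigned is the important part:
 * without it, *p - '0' is computed in signed int after promotion, and
 * any character below '0' would yield a negative value "< 10". */
static const unsigned char *skip_digits(const unsigned char *p,
                                        const unsigned char *e)
{
    while (p < e && (unsigned)(*p - '0') < 10u)
        ++p;
    return p;
}

/* Rough analogue of matching [0-9]+ anchored at p: succeeds only if at
 * least one digit was consumed, and reports where the match ended. */
static bool match_digits(const unsigned char *p, const unsigned char *e,
                         const unsigned char **end)
{
    const unsigned char *q = skip_digits(p, e);

    if (q == p)
        return false;           /* zero digits: the match fails */
    *end = q;
    return true;
}
```

With this split, running `match_digits` against "ZZZZZZ" or an empty range reports failure instead of silently leaving the pointer where it started.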
>>>
>>>Why do you believe that p being equivalent to the starting position
>>>would be considered a "successful match", considering that this
>>>obviously doesn't make any sense?
>>>
>>>[...]
>>>
>>>> By the way, something that _would_ match `^[0-9]+$` might be:
>>>
>>>[too much code]
>>>
>>>Something which would match [0-9]+ in its first argument (if any) would
>>>be:
>>>
>>>#include "string.h"
>>>#include "stdlib.h"
>>>
>>>int main(int argc, char **argv)
>>>{
>>>    char *p;
>>>    unsigned c;
>>>
>>>    p = argv[1];
>>>    if (!p) exit(1);
>>>    while (c = *p, c && c - '0' > 10) ++p;
>>>    if (!c) exit(1);
>>>    return 0;
>>>}
>>>
>>>but that's 14 lines of text, 13 of which have absolutely no relation to
>>>the problem of recognizing a digit.
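The loop in the program quoted above has an off-by-one: with `c` unsigned, `c - '0' > 10` skips every non-digit except the single character whose code is `'0' + 10` (`':'` in ASCII), so an argument such as `a:` also exits 0. A minimal sketch of the same unanchored test written as a function, with the bound corrected to 9; `has_digit` is a hypothetical name, and it returns true exactly when the corrected program would exit 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if s contains at least one decimal digit, i.e. if an unanchored
 * search for [0-9]+ would succeed.  c is unsigned, so c - '0' wraps to a
 * large value for characters below '0'; the bound is 9, not 10, so that
 * the character after '9' is not mistaken for a digit. */
static bool has_digit(const char *s)
{
    if (s == NULL)
        return false;
    for (; *s != '\0'; ++s) {
        unsigned c = (unsigned char)*s;

        if (c - '0' <= 9)
            return true;
    }
    return false;
}
```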
>>
>>Personally, I'd use:
>
>Albeit this is limited to strings of digits that sum to less than
>ULONG_MAX...

It's not quite equivalent to his program, which just exits with
success if it sees any input string with a digit in it; yours
is closer to what I wrote, which matches `^[0-9]+$`.  His is not
an interesting program and certainly not a recognizable
equivalent of a regular expression matcher in any reasonable
sense, but I think the cognitive dissonance is too strong to get
that across.
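For comparison, the general-purpose route with real success/failure semantics might look like the following minimal sketch, which uses the POSIX `regcomp()`/`regexec()` interface (so it assumes a POSIX system; `re_matches` is a hypothetical helper name). `regexec()` searches rather than anchoring, so `[0-9]+` succeeds on any string containing a digit, while `^[0-9]+$` requires the whole string to be digits.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the POSIX extended regular expression `pattern` matches
 * somewhere in `s`.  Returns false both on "no match" and if the
 * pattern fails to compile. */
static bool re_matches(const char *pattern, const char *s)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    rc = regexec(&re, s, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&re);
    return rc == 0;
}
```

Compiling the pattern on every call keeps the sketch short; a real program would compile once and reuse the `regex_t`.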

	- Dan C.

>>$ cat /tmp/a.c
>>#include <stdint.h>
>>#include <stdlib.h>
>>#include <string.h>
>>
>>int
>>main(int argc, const char **argv)
>>{
>>    char *cp;
>>    uint64_t value;
>>
>>    if (argc < 2) return 1;
>>
>>    value = strtoull(argv[1], &cp, 10);
>>    if ((cp == argv[1])
>>     || (*cp != '\0')) {
>>        return 1;
>>    }
>>    return 0;
>>}
>>$ cc -o /tmp/a /tmp/a.c
>>$ /tmp/a 13254
>>$ echo $?
>>0
>>$ /tmp/a 23v23
>>$ echo $?
>>1
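One caveat about the `strtoull()` approach: it skips leading white space and accepts an optional sign, so the program as quoted also exits 0 for arguments such as `" 42"` or `-5` (the latter is negated in the unsigned return type rather than rejected). The limit is on the numeric value, not a sum of digits, and it is `ULLONG_MAX` for `strtoull()`. On overflow it returns `ULLONG_MAX` and sets `errno` to `ERANGE`, but the end pointer still lands after the last digit, so an over-long run of digits is accepted unless `errno` is checked. A minimal sketch of a strict `^[0-9]+$` test, plus a parse that also rejects out-of-range values; `all_digits` and `parse_ull` are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* True iff s matches ^[0-9]+$: non-empty and every character a decimal
 * digit.  Unlike strtoull(), this rejects leading white space and a
 * sign, and it places no limit on length. */
static bool all_digits(const char *s)
{
    if (s == NULL || *s == '\0')
        return false;
    for (; *s != '\0'; ++s)
        if ((unsigned)((unsigned char)*s - '0') > 9u)
            return false;
    return true;
}

/* Parse s as an unsigned decimal value only if it is all digits and
 * fits in unsigned long long; otherwise return false. */
static bool parse_ull(const char *s, unsigned long long *out)
{
    char *end;
    unsigned long long v;

    if (!all_digits(s))
        return false;
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno == ERANGE || *end != '\0')
        return false;           /* out of range, or trailing junk */
    *out = v;
    return true;
}
```

Keeping the syntactic check separate from the conversion means the recognizer answers the `^[0-9]+$` question on its own, and the range check only matters if the value is actually needed.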