Path: ...!3.eu.feeder.erje.net!2.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: Rainer Weikusat <rweikusat@talktalk.net>
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 15:41:09 +0000
Lines: 79
Message-ID: <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
References: <uu54la$3su5b$6@dont-email.me>
	<875xohbxre.fsf@doppelsaurus.mobileactivedefense.com>
	<vhngoi$2p6$1@reader2.panix.com>
	<874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>
	<vhq11q$nq7$1@reader2.panix.com>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net aYUphB09PBy6lcoaSj34Swm3i0lwPf03FmwSb7ble34V6ED2M=
Cancel-Lock: sha1:h9PE3JBvHCiqY64ZP/coYDEUmXk= sha1:CEuBJr0uRnd6WR1Cq99Ai8M9hxU= sha256:+AdCiP/4Fm8VtwatV2sOxScePGqkkSde0Lk3PYua570=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Bytes: 4006

cross@spitfire.i.gajendra.net (Dan Cross) writes:
> Rainer Weikusat  <rweikusat@talktalk.net> wrote:
>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>> Rainer Weikusat  <rweikusat@talktalk.net> wrote:
>>>>Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>
>>>>[...]
>>>>
>>>>> Personally I think that writing bulky procedural stuff for something
>>>>> like [0-9]+ can only be much worse, and that further abbreviations
>>>>> like \d+ are the better direction to go if targeting a good interface.
>>>>> YMMV.
>>>>
>>>>Assuming that p is a pointer to the current position in a string, e is a
>>>>pointer to the end of it (ie, point just past the last byte) and -
>>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>>C equivalent of [0-9]+ is
>>>>
>>>>while (p < e && *p - '0' < 10) ++p;
>>>>
>>>>That's not too bad. And it's really a hell lot faster than a
>>>>general-purpose automaton programmed to recognize the same pattern
>>>>(which might not matter most of the time, but sometimes, it does).
>>>
>>> It's also not exactly right.  `[0-9]+` would match one or more
>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>> something that wasn't a digit).
>>
>>The regex won't match any digits if there aren't any. In this case, the
>>match will fail. I didn't include the code for handling that because it
>>seemed pretty pointless for the example.
>
> That's rather the point though, isn't it?  The program snippet
> (modulo the promotion to signed int via the "usual arithmetic
> conversions" before the subtraction and comparison giving you
> unexpected values; nothing to do with whether `char` is signed
> or not) is a snippet that advances a pointer while it points to
> a digit, starting at the current pointer position; that is, it
> just increments a pointer over a run of digits.

That's the core part of matching something equivalent to the regex [0-9]+
and the only part of it which is at least remotely interesting.

> But that's not the same as a regex matcher, which has a semantic
> notion of success or failure.  I could run your snippet against
> a string such as, say, "ZZZZZZ" and it would "succeed" just as
> it would against an empty string or a string of one or more
> digits.

Why do you believe that p being equivalent to the starting position
would be considered a "successful match", considering that this
obviously doesn't make any sense?
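
As a minimal sketch of that point, assuming p and e as described for the
one-line loop (match_digits is a made-up name for illustration, not code
from this thread): failure is just "p did not move". The cast to unsigned
is an addition here to keep the subtraction unsigned despite the integer
promotions.

```c
#include <stddef.h>

/*
 * Hypothetical helper: anchored equivalent of [0-9]+.
 * Returns a pointer just past the run of digits starting at p,
 * or NULL if there is no digit at p, ie, if the match failed.
 */
static const unsigned char *match_digits(const unsigned char *p,
                                         const unsigned char *e)
{
    const unsigned char *s = p;

    /* the cast keeps the subtraction unsigned after integer promotion */
    while (p < e && (unsigned)*p - '0' < 10) ++p;

    /* p still at the starting position means no match */
    return p == s ? NULL : p;
}
```

Run against "ZZZZZZ" or an empty string, this returns NULL; run against
"123abc", it returns a pointer to the 'a'.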

[...]

> By the way, something that _would_ match `^[0-9]+$` might be:

[too much code]

Something which would match [0-9]+ in its first argument (if any) would
be:

#include <string.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 9) ++p; /* skip anything that's not a digit */
    if (!c) exit(1);
    return 0;
}

but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
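
For illustration, the same search can be sketched as a function, which
separates the one interesting line from the scaffolding. This is only a
sketch: find_digit is a hypothetical name, and it assumes a NUL-terminated
string and the usual contiguous encoding of '0' to '9'.

```c
#include <stddef.h>

/*
 * Hypothetical helper: unanchored search for a digit.
 * Returns a pointer to the first digit in the NUL-terminated
 * string s, or NULL if s contains no digit.
 */
static const char *find_digit(const char *s)
{
    unsigned c;

    /* c - '0' is unsigned, so anything below '0' wraps to a huge value */
    while (c = (unsigned char)*s, c && c - '0' > 9) ++s;

    /* c is 0 if the end of the string was reached without a digit */
    return c ? s : NULL;
}
```

A caller which only cares about success or failure can test the return
value against NULL, much like the exit status of the program above.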