Path: news.eternal-september.org!eternal-september.org!news.eternal-september.org!feeder2.eternal-september.org!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: Rainer Weikusat <rweikusat@talktalk.net>
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 15:41:09 +0000
Lines: 79
Message-ID: <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
References: <uu54la$3su5b$6@dont-email.me>
<875xohbxre.fsf@doppelsaurus.mobileactivedefense.com>
<vhngoi$2p6$1@reader2.panix.com>
<874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>
<vhq11q$nq7$1@reader2.panix.com>
Mime-Version: 1.0
Content-Type: text/plain
X-Trace: individual.net aYUphB09PBy6lcoaSj34Swm3i0lwPf03FmwSb7ble34V6ED2M=
Cancel-Lock: sha1:h9PE3JBvHCiqY64ZP/coYDEUmXk= sha1:CEuBJr0uRnd6WR1Cq99Ai8M9hxU= sha256:+AdCiP/4Fm8VtwatV2sOxScePGqkkSde0Lk3PYua570=
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
cross@spitfire.i.gajendra.net (Dan Cross) writes:
> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>> Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>>Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>
>>>>[...]
>>>>
>>>>> Personally I think that writing bulky procedural stuff for something
>>>>> like [0-9]+ can only be much worse, and that further abbreviations
>>>>> like \d+ are the better direction to go if targeting a good interface.
>>>>> YMMV.
>>>>
>>>>Assuming that p is a pointer to the current position in a string, e is a
>>>>pointer to the end of it (ie, point just past the last byte) and -
>>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>>C equivalent of [0-9]+ is
>>>>
>>>>while (p < e && *p - '0' < 10) ++p;
>>>>
>>>>That's not too bad. And it's really a hell lot faster than a
>>>>general-purpose automaton programmed to recognize the same pattern
>>>>(which might not matter most of the time, but sometimes, it does).
>>>
>>> It's also not exactly right. `[0-9]+` would match one or more
>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>> something that wasn't a digit).
>>
>>The regex won't match any digits if there aren't any. In this case, the
>>match will fail. I didn't include the code for handling that because it
>>seemed pretty pointless for the example.
>
> That's rather the point though, isn't it? The program snippet
> (modulo the promotion to signed int via the "usual arithmetic
> conversions" before the subtraction and comparison giving you
> unexpected values; nothing to do with whether `char` is signed
> or not) is a snippet that advances a pointer while it points to
> a digit, starting at the current pointer position; that is, it
> just increments a pointer over a run of digits.
That's the core part of matching something equivalent to the regex [0-9]+
and the only part of it which is at least remotely interesting.
> But that's not the same as a regex matcher, which has a semantic
> notion of success or failure. I could run your snippet against
> a string such as, say, "ZZZZZZ" and it would "succeed" just as
> it would against an empty string or a string of one or more
> digits.
Why do you believe that p being equivalent to the starting position
would be considered a "successful match", considering that this
obviously doesn't make any sense?
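One way to make that explicit is to count the scan as a match only when the pointer has actually moved. What follows is a minimal sketch, not code from this thread: the names skip_digits and match_digits are made up for illustration, and the cast to unsigned is just one possible way to sidestep the integer-promotion issue mentioned above.

```c
#include <stddef.h>

/* Hypothetical helper: advance over a run of ASCII digits in [p, e).
 * The cast to unsigned makes the subtraction wrap around instead of
 * going negative, so bytes below '0' become large values and fail
 * the < 10 test. */
static const unsigned char *skip_digits(const unsigned char *p,
                                        const unsigned char *e)
{
    while (p < e && (unsigned)*p - '0' < 10) ++p;
    return p;
}

/* [0-9]+ anchored at p: returns the number of digits consumed.
 * A result of 0 means the pointer did not move, i.e. no match. */
static size_t match_digits(const unsigned char *p, const unsigned char *e)
{
    return (size_t)(skip_digits(p, e) - p);
}
```

With this, "ZZZZZZ" and the empty string both yield 0, which the caller treats as a failed match, while "123abc" yields 3.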
[...]
> By the way, something that _would_ match `^[0-9]+$` might be:
[too much code]
Something which would match [0-9]+ in its first argument (if any) would
be:
#include <string.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);                        /* no argument given */
    while (c = *p, c && c - '0' > 9) ++p;   /* skip non-digits */
    if (!c) exit(1);                        /* end of string, no digit found */
    return 0;
}
but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
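
If the boilerplate is the objection, the interesting part can be pulled out into a function that also reports where the run of digits is. This is only a sketch under assumptions: find_digits is a hypothetical name, the input is assumed to be a NUL-terminated string, and only ASCII digits are considered.

```c
#include <stddef.h>

/* Hypothetical helper: find the first run of ASCII digits in the
 * NUL-terminated string s. Returns a pointer to the start of the run
 * and stores its length in *len, or returns NULL if s has no digit. */
static const char *find_digits(const char *s, size_t *len)
{
    const char *b;
    unsigned c;

    /* Skip everything that is not a digit. Because c is unsigned,
     * c - '0' > 9 is true for every non-digit, including ':'. */
    while (c = (unsigned char)*s, c && c - '0' > 9) ++s;
    if (!c) return NULL;

    /* s points at a digit now; consume the rest of the run. The
     * terminating NUL wraps to a large value and ends the loop. */
    b = s;
    while (c = (unsigned char)*s, c - '0' < 10) ++s;

    *len = (size_t)(s - b);
    return b;
}
```

Called on "ab42:7" this returns a pointer to the '4' and sets the length to 2; called on "ZZZZZZ" or on an empty string it returns NULL.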