Article <vhqebq$c71$1@reader2.panix.com>

Path: ...!news.misty.com!weretis.net!feeder9.news.weretis.net!panix!.POSTED.spitfire.i.gajendra.net!not-for-mail
From: cross@spitfire.i.gajendra.net (Dan Cross)
Newsgroups: comp.unix.shell,comp.unix.programmer,comp.lang.misc
Subject: Re: Command Languages Versus Programming Languages
Date: Fri, 22 Nov 2024 17:17:46 -0000 (UTC)
Organization: PANIX Public Access Internet and UNIX, NYC
Message-ID: <vhqebq$c71$1@reader2.panix.com>
References: <uu54la$3su5b$6@dont-email.me> <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com> <vhq11q$nq7$1@reader2.panix.com> <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
Injection-Date: Fri, 22 Nov 2024 17:17:46 -0000 (UTC)
Injection-Info: reader2.panix.com; posting-host="spitfire.i.gajendra.net:166.84.136.80";
	logging-data="12513"; mail-complaints-to="abuse@panix.com"
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
Originator: cross@spitfire.i.gajendra.net (Dan Cross)
Bytes: 4796
Lines: 117

In article <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat  <rweikusat@talktalk.net> wrote:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>> Rainer Weikusat  <rweikusat@talktalk.net> wrote:
>>>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>>> [snip]
>>>> It's also not exactly right.  `[0-9]+` would match one or more
>>>> characters; this possibly matches 0 (ie, if `p` pointed to
>>>> something that wasn't a digit).
>>>
>>>The regex won't match any digits if there aren't any. In this case, the
>>>match will fail. I didn't include the code for handling that because it
>>>seemed pretty pointless for the example.
>>
>> That's rather the point though, isn't it?  The program snippet
>> (modulo the promotion to signed int via the "usual arithmetic
>> conversions" before the subtraction and comparison giving you
>> unexpected values; nothing to do with whether `char` is signed
>> or not) is a snippet that advances a pointer while it points to
>> a digit, starting at the current pointer position; that is, it
>> just increments a pointer over a run of digits.
>
>That's the core part of matching something equivalent to the regex [0-9]+
>and the only part of it which is at least remotely interesting. 

Not really, no.  The interesting thing in this case appears to
be knowing whether or not the match succeeded, but you omitted
that part.

>> But that's not the same as a regex matcher, which has a semantic
>> notion of success or failure.  I could run your snippet against
>> a string such as, say, "ZZZZZZ" and it would "succeed" just as
>> it would against an empty string or a string of one or more
>> digits.
>
>Why do you believe that p being equivalent to the starting position
>would be considered a "successful match", considering that this
>obviously doesn't make any sense?

Because absent any surrounding context, there's no indication
that the starting position is even saved.  You'll note that I did
mention saving it as a means to tell success from failure later
on, but that's not the snippet you posted.
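To make the distinction concrete, here is a minimal sketch of the pointer-advancing loop that does save its starting position and so has a notion of success: it succeeds only if at least one digit was consumed, which is what the `+` in `[0-9]+` requires. `match_digits` is a hypothetical name used only for illustration; the range test relies on the decimal digits being contiguous, which the C standard guarantees.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: advance over a run of decimal digits starting
 * at s.  The position just past the run is stored in *end.  Returns
 * true only if at least one digit was consumed, i.e. the "+" in
 * [0-9]+ was satisfied; false means the match failed.
 */
static bool
match_digits(const char *s, const char **end)
{
	const char *p = s;	/* keep s as the saved starting position */

	while (*p >= '0' && *p <= '9')
		++p;
	*end = p;
	return p != s;		/* no progress means no match */
}
```

Run against "ZZZZZZ" or an empty string, this returns false, whereas the bare loop gives no way to distinguish those inputs from a real match.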

>[...]
>
>> By the way, something that _would_ match `^[0-9]+$` might be:
>
>[too much code]
>
>Something which would match [0-9]+ in its first argument (if any) would
>be:
>
>#include "string.h"
>#include "stdlib.h"
>
>int main(int argc, char **argv)
>{
>    char *p;
>    unsigned c;
>
>    p = argv[1];
>    if (!p) exit(1);
>    while (c = *p, c && c - '0' > 10) ++p;
>    if (!c) exit(1);
>    return 0;
>}
>
>but that's 14 lines of text, 13 of which have absolutely no relation to
>the problem of recognizing a digit.

This is wrong in many ways.  Did you actually test that program?

First of all, why `"string.h"` and not `<string.h>`?  Ok, that's
not technically an error, but it's certainly unconventional, and
raises questions that are ultimately a distraction.

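Second, suppose that `argc==0` (yes, this can happen under
POSIX).  Then `argv[0]` is the terminating null pointer, `argv[1]`
lies past the end of the array, and `p = argv[1]` is undefined
behavior before the `!p` test ever runs.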

Third, the loop: why `> 10`? Don't you mean `< 10`?  You are
trying to match digits, not non-digits.

Fourth, you exit with failure (`exit(1)`) if `!p` *and* if `!c`
at the end, but `!c` there means you've reached the end of the
string; which should be success.

Fifth and finally, in the failure case you `return 0;`, which,
like `EXIT_SUCCESS`, reports success.
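For reference, a sketch of a hand-written check that addresses the points above, assuming the intended semantics are the anchored `^[0-9]+$`: it rejects a null or empty argument, skips over digits rather than non-digits, and treats reaching the terminating NUL as success. `all_digits` is a hypothetical helper name, not anything from the original post.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true iff s is non-NULL, non-empty, and consists
 * only of decimal digits, i.e. s matches ^[0-9]+$.
 */
static bool
all_digits(const char *s)
{
	const char *p;

	if (s == NULL || *s == '\0')
		return false;	/* [0-9]+ needs at least one digit */
	for (p = s; *p >= '0' && *p <= '9'; ++p)
		;		/* skip over the run of digits */
	return *p == '\0';	/* anchored: nothing may follow the digits */
}
```

A `main` would then guard `argc` before touching `argv[1]`, for example `return argc == 2 && all_digits(argv[1]) ? EXIT_SUCCESS : EXIT_FAILURE;`, where the short-circuit `&&` keeps `argv[1]` from being read when it does not exist.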

Compare:

#include <regex.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
	regex_t reprog;
	int ret;

	if (argc != 2) {
		fprintf(stderr, "Usage: regexp string\n");
		return(EXIT_FAILURE);
	}
	/* Anchored: the whole argument must be one or more digits. */
	(void)regcomp(&reprog, "^[0-9]+$", REG_EXTENDED | REG_NOSUB);
	ret = regexec(&reprog, argv[1], 0, NULL, 0);
	regfree(&reprog);

	return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}

This is only marginally longer, but is correct.
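Part of the disagreement is about anchoring: the quoted program is described as matching `[0-9]+` anywhere in its first argument, while the regex program above requires the whole argument to match `^[0-9]+$`. A minimal sketch, assuming a POSIX `<regex.h>`, of a small wrapper that makes the difference easy to try; `ere_match` is a hypothetical name, and unlike the program above it checks the `regcomp` result and reports compile errors via `regerror`.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if s matches the POSIX extended
 * regular expression pat, 0 on no match (or a regexec error), and
 * -1 if pat fails to compile (the message is printed to stderr).
 */
static int
ere_match(const char *pat, const char *s)
{
	regex_t re;
	char msg[128];
	int rc;

	rc = regcomp(&re, pat, REG_EXTENDED | REG_NOSUB);
	if (rc != 0) {
		/* On failure re holds nothing to free; just report why. */
		regerror(rc, &re, msg, sizeof msg);
		fprintf(stderr, "regcomp: %s\n", msg);
		return -1;
	}
	rc = regexec(&re, s, 0, NULL, 0);	/* no submatches needed */
	regfree(&re);
	return rc == 0;
}
```

With this, `ere_match("[0-9]+", "ab1c")` succeeds because a digit appears somewhere, while `ere_match("^[0-9]+$", "ab1c")` fails because the whole string is not digits.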

	- Dan C.