Deutsch   English   Français   Italiano  
<87msm65vjz.fsf@bsb.me.uk>

View for Bookmarking (what is this?)
Look up another Usenet article

Path: ...!weretis.net!feeder9.news.weretis.net!2.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Ben Bacarisse <ben@bsb.me.uk>
Newsgroups: comp.unix.shell
Subject: Re: bash aesthetics question: special characters in reg exp in [[ ... =~~ ... ]]
Date: Wed, 24 Jul 2024 22:28:32 +0100
Organization: A noiseless patient Spider
Lines: 78
Message-ID: <87msm65vjz.fsf@bsb.me.uk>
References: <v7mknf$3plab$1@news.xmission.com>
	<v7n9s1$2p39$1@nnrp.usenet.blueworldhosting.com>
	<v7nfbb$3q3of$1@news.xmission.com> <20240723112050.105@kylheku.com>
	<87y15r650v.fsf@bsb.me.uk> <20240723202055.122@kylheku.com>
	<87sevz53qd.fsf@bsb.me.uk> <20240724112619.254@kylheku.com>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Wed, 24 Jul 2024 23:28:32 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="cecbc031db9d83bdacabd1935f084c00";
	logging-data="2011422"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18apQeZDw0Yg9JzT2JeL1iMPZBTRZv+j4Q="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:uXbT38DXzsoOsNWngBnJReZXESA=
	sha1:HfA35nOzaYo7LgglexX8GLZffr8=
X-BSB-Auth: 1.0864bdfbc636e95ba92f.20240724222832BST.87msm65vjz.fsf@bsb.me.uk
Bytes: 4744

Kaz Kylheku <643-408-1753@kylheku.com> writes:

> On 2024-07-24, Ben Bacarisse <ben@bsb.me.uk> wrote:
>> Kaz Kylheku <643-408-1753@kylheku.com> writes:
>>
>>> On 2024-07-23, Ben Bacarisse <ben@bsb.me.uk> wrote:
>>>> Kaz Kylheku <643-408-1753@kylheku.com> writes:
>>>>> This matters when regexes are used for matching a prefix of the input;
>>>>> if the regex is interpreted according to the theory should match
>>>>> the longest possible prefix; it cannot ignore R3, which matches
>>>>> thousands of symbols, because R2 matched three symbols.
>>>>
>>>> This is more a consequence of the different views. The in the formal
>>>> theory there is no notion of "matching".  Regular expressions define
>>>> languages (i.e. sets of sequences of symbols) according to a recursive
>>>> set of rules.  The whole idea of an RE matching a string is from their
>>>> use in practical applications.
>>>
>>> Under the set view, we can ask, what is the longest prefix of
>>> the input which belongs to the language R1|R2. The answer is the
>>> same for R2|R1, which denote the same set, since | corresponds
>>> to set union.
>>
>> What is "the input" in the set view.  The set view is simply a recursive
>> definition of the language.
>
> It is a separate string under consideration.
>
> We have a set, and are asking the question "what is the longest prefix
> of the given string which is a member of the set".

It's better, then, (as in the latter wording) not to use a term from the
"implementation" view of REs.

>>> Broken regular expressions identify the longest prefix, except
>>> when the | operator is used; then they just identify a prefix,
>>> not necessarily longest.
>>
>> What is a "broken" RE in the set view?
>
> Inconsistency in being able to answer the question "what is the longest
> prefix of the string which is a member of the set".
>
> Broken regexes contain a pitfall: they deliver the right answer
> for expressions like ab*. If the input is "abbbbbbbc",
>
> they identify the entire "abbbbbbb" prefix. But if the branch
> operator is used, as in "a|ab*", oops, they short-circuit.
> The "a" matches a prefix of the input, and so that's done; no need
> to match the "ab*" part of the branch.

I don't see any "pitfall".  The answer to you question "what is the
longest prefix of the given string which is a member of the set" is not
"a" and nothing in the either the formal definition of the language
"a|ab*" nor in the wording of the question is a pitfall.  The longest
prefix of "abbbbbbbc" that is in the language "a|ab*" is, unambiguously,
"abbbbbbb".

> The "a" prefix is in the language described from the language; a
> set element has been identified. But it's not the longest one.

Yes.  But there is no "pitfall" and the RE is not "broken" in any formal
sense at all.

An implementation might be broken and there are pitfalls to look out for
when viewing REs as patterns to match, but that's my whole point.  This
is all about the "other" view, not the view of REs as defining formal
languages.

> It is an inconsistency. If the longest match is not required, why
> bother finding one for "ab*"; for that expression, the "a" prefix could
> also just be returned.

We could, of course, ask about other prefixes of "abbbbbbbc" that are in
the language "a|ab*".  I don't see anything inconsistent here at all.

-- 
Ben.