<20240724112619.254@kylheku.com>


Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Kaz Kylheku <643-408-1753@kylheku.com>
Newsgroups: comp.unix.shell
Subject: Re: bash aesthetics question: special characters in reg exp in [[
 ... =~~ ... ]]
Date: Wed, 24 Jul 2024 18:35:51 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 56
Message-ID: <20240724112619.254@kylheku.com>
References: <v7mknf$3plab$1@news.xmission.com>
 <v7n9s1$2p39$1@nnrp.usenet.blueworldhosting.com>
 <v7nfbb$3q3of$1@news.xmission.com> <20240723112050.105@kylheku.com>
 <87y15r650v.fsf@bsb.me.uk> <20240723202055.122@kylheku.com>
 <87sevz53qd.fsf@bsb.me.uk>
Injection-Date: Wed, 24 Jul 2024 20:35:51 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="22dbec4c97aef40d7cd38abdf24b02a3";
	logging-data="1949461"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+uvBPjY44snRs1zIMCrxKMwN5oeeURQAE="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:xhgEIIU6fH3RycG8i55z/vj5q6A=
Bytes: 3579

On 2024-07-24, Ben Bacarisse <ben@bsb.me.uk> wrote:
> Kaz Kylheku <643-408-1753@kylheku.com> writes:
>
>> On 2024-07-23, Ben Bacarisse <ben@bsb.me.uk> wrote:
>>> Kaz Kylheku <643-408-1753@kylheku.com> writes:
>>>> This matters when regexes are used for matching a prefix of the input;
>>>> if the regex is interpreted according to the theory, it should match
>>>> the longest possible prefix; it cannot ignore R3, which matches
>>>> thousands of symbols, because R2 matched three symbols.
>>>
>>> This is more a consequence of the different views. In the formal
>>> theory there is no notion of "matching".  Regular expressions define
>>> languages (i.e. sets of sequences of symbols) according to a recursive
>>> set of rules.  The whole idea of an RE matching a string is from their
>>> use in practical applications.
>>
>> Under the set view, we can ask, what is the longest prefix of
>> the input which belongs to the language R1|R2. The answer is the
>> same for R2|R1, which denote the same set, since | corresponds
>> to set union.
>
> What is "the input" in the set view?  The set view is simply a recursive
> definition of the language.

It is a separate string under consideration.

We have a set, and are asking the question "what is the longest prefix
of the given string which is a member of the set".
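
As a rough sketch of that question, here it is in Python. The function
name longest_prefix_in_language is made up for illustration, and Python's
re module is only a convenient membership test here (fullmatch asks
whether a whole candidate string is in the language); nothing in this
thread depends on Python.

```python
import re

def longest_prefix_in_language(pattern, text):
    """Return the longest prefix of text that belongs to the language
    denoted by pattern, or None if no prefix (not even the empty one)
    belongs to it."""
    compiled = re.compile(pattern)
    # Try candidate prefixes from longest to shortest; the first one
    # that is a member of the set is the answer.
    for end in range(len(text), -1, -1):
        if compiled.fullmatch(text, 0, end):
            return text[:end]
    return None

# The answer does not depend on the order of the branches, because
# R1|R2 and R2|R1 denote the same set.
print(longest_prefix_in_language(r"a|ab*", "abbbbbbbc"))
print(longest_prefix_in_language(r"ab*|a", "abbbbbbbc"))
```

This is quadratic at best and only meant to pin down the definition; a
real matcher would get the same answer in one pass over the input.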

>> Broken regular expressions identify the longest prefix, except
>> when the | operator is used; then they just identify a prefix,
>> not necessarily longest.
>
> What is a "broken" RE in the set view?

Inconsistency in being able to answer the question "what is the longest
prefix of the string which is a member of the set".

Broken regexes contain a pitfall: they deliver the right answer
for expressions like ab*. If the input is "abbbbbbbc",
they identify the entire "abbbbbbb" prefix. But if the branch
operator is used, as in "a|ab*", oops, they short-circuit.
The "a" matches a prefix of the input, and so that's done; no need
to match the "ab*" part of the branch.

The "a" prefix is in the language described by the regex; a
set element has been identified. But it's not the longest one.
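
For a concrete case, Python's re module is one engine that behaves this
way (it is picked here only as a handy illustration; it is not the
engine under discussion in this thread). A small sketch:

```python
import re

text = "abbbbbbbc"

# No branch operator: the star is greedy, so the whole run of b's is taken.
star_only = re.match(r"ab*", text).group()

# Branch with the short alternative first: the engine stops at "a".
short_first = re.match(r"a|ab*", text).group()

# Same language, branches swapped: now the long prefix is reported.
long_first = re.match(r"ab*|a", text).group()

print(star_only)    # abbbbbbb
print(short_first)  # a
print(long_first)   # abbbbbbb
```

The two branch expressions denote the same set, yet the reported prefix
changes with the order of the branches.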

It is an inconsistency. If the longest match is not required, why
bother finding one for "ab*"? For that expression, the "a" prefix could
just as well be returned.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca