| Deutsch English Français Italiano |
|
<87msm65vjz.fsf@bsb.me.uk> View for Bookmarking (what is this?) Look up another Usenet article |
Path: ...!weretis.net!feeder9.news.weretis.net!2.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Ben Bacarisse <ben@bsb.me.uk> Newsgroups: comp.unix.shell Subject: Re: bash aesthetics question: special characters in reg exp in [[ ... =~~ ... ]] Date: Wed, 24 Jul 2024 22:28:32 +0100 Organization: A noiseless patient Spider Lines: 78 Message-ID: <87msm65vjz.fsf@bsb.me.uk> References: <v7mknf$3plab$1@news.xmission.com> <v7n9s1$2p39$1@nnrp.usenet.blueworldhosting.com> <v7nfbb$3q3of$1@news.xmission.com> <20240723112050.105@kylheku.com> <87y15r650v.fsf@bsb.me.uk> <20240723202055.122@kylheku.com> <87sevz53qd.fsf@bsb.me.uk> <20240724112619.254@kylheku.com> MIME-Version: 1.0 Content-Type: text/plain Injection-Date: Wed, 24 Jul 2024 23:28:32 +0200 (CEST) Injection-Info: dont-email.me; posting-host="cecbc031db9d83bdacabd1935f084c00"; logging-data="2011422"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18apQeZDw0Yg9JzT2JeL1iMPZBTRZv+j4Q=" User-Agent: Gnus/5.13 (Gnus v5.13) Cancel-Lock: sha1:uXbT38DXzsoOsNWngBnJReZXESA= sha1:HfA35nOzaYo7LgglexX8GLZffr8= X-BSB-Auth: 1.0864bdfbc636e95ba92f.20240724222832BST.87msm65vjz.fsf@bsb.me.uk Bytes: 4744 Kaz Kylheku <643-408-1753@kylheku.com> writes: > On 2024-07-24, Ben Bacarisse <ben@bsb.me.uk> wrote: >> Kaz Kylheku <643-408-1753@kylheku.com> writes: >> >>> On 2024-07-23, Ben Bacarisse <ben@bsb.me.uk> wrote: >>>> Kaz Kylheku <643-408-1753@kylheku.com> writes: >>>>> This matters when regexes are used for matching a prefix of the input; >>>>> if the regex is interpreted according to the theory should match >>>>> the longest possible prefix; it cannot ignore R3, which matches >>>>> thousands of symbols, because R2 matched three symbols. >>>> >>>> This is more a consequence of the different views. The in the formal >>>> theory there is no notion of "matching". Regular expressions define >>>> languages (i.e. sets of sequences of symbols) according to a recursive >>>> set of rules. The whole idea of an RE matching a string is from their >>>> use in practical applications. >>> >>> Under the set view, we can ask, what is the longest prefix of >>> the input which belongs to the language R1|R2. The answer is the >>> same for R2|R1, which denote the same set, since | corresponds >>> to set union. >> >> What is "the input" in the set view. The set view is simply a recursive >> definition of the language. > > It is a separate string under consideration. > > We have a set, and are asking the question "what is the longest prefix > of the given string which is a member of the set". It's better, then, (as in the latter wording) not to use a term from the "implementation" view of REs. >>> Broken regular expressions identify the longest prefix, except >>> when the | operator is used; then they just identify a prefix, >>> not necessarily longest. >> >> What is a "broken" RE in the set view? > > Inconsistency in being able to answer the question "what is the longest > prefix of the string which is a member of the set". > > Broken regexes contain a pitfall: they deliver the right answer > for expressions like ab*. If the input is "abbbbbbbc", > > they identify the entire "abbbbbbb" prefix. But if the branch > operator is used, as in "a|ab*", oops, they short-circuit. > The "a" matches a prefix of the input, and so that's done; no need > to match the "ab*" part of the branch. I don't see any "pitfall". The answer to you question "what is the longest prefix of the given string which is a member of the set" is not "a" and nothing in the either the formal definition of the language "a|ab*" nor in the wording of the question is a pitfall. The longest prefix of "abbbbbbbc" that is in the language "a|ab*" is, unambiguously, "abbbbbbb". > The "a" prefix is in the language described from the language; a > set element has been identified. But it's not the longest one. Yes. But there is no "pitfall" and the RE is not "broken" in any formal sense at all. An implementation might be broken and there are pitfalls to look out for when viewing REs as patterns to match, but that's my whole point. This is all about the "other" view, not the view of REs as defining formal languages. > It is an inconsistency. If the longest match is not required, why > bother finding one for "ab*"; for that expression, the "a" prefix could > also just be returned. We could, of course, ask about other prefixes of "abbbbbbbc" that are in the language "a|ab*". I don't see anything inconsistent here at all. -- Ben.