Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Kaz Kylheku <643-408-1753@kylheku.com>
Newsgroups: comp.unix.shell
Subject: Re: bash aesthetics question: special characters in reg exp in [[ ... =~~ ... ]]
Date: Wed, 24 Jul 2024 18:35:51 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 56
Message-ID: <20240724112619.254@kylheku.com>
References: <v7mknf$3plab$1@news.xmission.com>
 <v7n9s1$2p39$1@nnrp.usenet.blueworldhosting.com>
 <v7nfbb$3q3of$1@news.xmission.com> <20240723112050.105@kylheku.com>
 <87y15r650v.fsf@bsb.me.uk> <20240723202055.122@kylheku.com>
 <87sevz53qd.fsf@bsb.me.uk>
Injection-Date: Wed, 24 Jul 2024 20:35:51 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="22dbec4c97aef40d7cd38abdf24b02a3";
 logging-data="1949461"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX1+uvBPjY44snRs1zIMCrxKMwN5oeeURQAE="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:xhgEIIU6fH3RycG8i55z/vj5q6A=
Bytes: 3579

On 2024-07-24, Ben Bacarisse <ben@bsb.me.uk> wrote:
> Kaz Kylheku <643-408-1753@kylheku.com> writes:
>
>> On 2024-07-23, Ben Bacarisse <ben@bsb.me.uk> wrote:
>>> Kaz Kylheku <643-408-1753@kylheku.com> writes:
>>>> This matters when regexes are used for matching a prefix of the input;
>>>> if the regex is interpreted according to the theory, it should match
>>>> the longest possible prefix; it cannot ignore R3, which matches
>>>> thousands of symbols, because R2 matched three symbols.
>>>
>>> This is more a consequence of the different views.  In the formal
>>> theory there is no notion of "matching".  Regular expressions define
>>> languages (i.e. sets of sequences of symbols) according to a recursive
>>> set of rules.  The whole idea of an RE matching a string is from their
>>> use in practical applications.
>>
>> Under the set view, we can ask, what is the longest prefix of
>> the input which belongs to the language R1|R2. The answer is the
>> same for R2|R1, which denote the same set, since | corresponds
>> to set union.
>
> What is "the input" in the set view?  The set view is simply a recursive
> definition of the language.

It is a separate string under consideration. We have a set, and
are asking the question "what is the longest prefix of the given string
which is a member of the set".

>> Broken regular expressions identify the longest prefix, except
>> when the | operator is used; then they just identify a prefix,
>> not necessarily longest.
>
> What is a "broken" RE in the set view?

Inconsistency in answering the question "what is the longest
prefix of the string which is a member of the set".

Broken regexes contain a pitfall: they deliver the right answer for
expressions like ab*. If the input is "abbbbbbbc", they identify the
entire "abbbbbbb" prefix.

But if the branch operator is used, as in "a|ab*", oops, they
short-circuit. The "a" matches a prefix of the input, and so that's
done; no need to match the "ab*" part of the branch.

The "a" prefix is in the language described by the expression; a set
element has been identified. But it's not the longest one.

It is an inconsistency. If the longest match is not required, why bother
finding one for "ab*"? For that expression, the "a" prefix could
also just be returned.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
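
The short-circuit described above can be reproduced with any matcher that tries branches in order. The sketch below uses Python's `re` module only as a convenient example of such a matcher; it is not the engine the thread is discussing. The helper names `first_match_prefix` and `longest_prefix_in_language` are made up for illustration. The second helper answers the set-view question by brute force: it tests each prefix of the input for membership in the language, longest first.

```python
import re


def first_match_prefix(pattern, text):
    """Prefix reported by an ordered-alternation matcher (Python's re)."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


def longest_prefix_in_language(pattern, text):
    """Longest prefix of text that is a member of the pattern's language.

    Brute force for illustration: test every prefix, longest first.
    """
    for end in range(len(text), -1, -1):
        if re.fullmatch(pattern, text[:end]):
            return text[:end]
    return None


text = "abbbbbbbc"

# Without the branch operator, the whole run of b's is consumed.
print(first_match_prefix("ab*", text))            # abbbbbbb

# With the branch operator, the result depends on branch order.
print(first_match_prefix("a|ab*", text))          # a
print(first_match_prefix("ab*|a", text))          # abbbbbbb

# The set-view answer is the same for both orderings.
print(longest_prefix_in_language("a|ab*", text))  # abbbbbbb
print(longest_prefix_in_language("ab*|a", text))  # abbbbbbb
```

Swapping the branches changes what the ordered matcher reports but not the set-view answer, which is the inconsistency the post points at.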