From: Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups: comp.lang.awk
Subject: GNU Awk's types of regular expressions
Date: Thu, 28 Nov 2024 19:18:29 +0100
Organization: A noiseless patient Spider
Message-ID: <viac5m$l8oh$1@dont-email.me>

In GNU Awk there are currently three types of regular expressions: in
addition to the standard regexp constants (/regex/) and the dynamic
regexps ("regex", or variables containing "regex"), newer versions
also support first-class regexp objects (@/regex/, "Strongly Typed
Regexp Constants").

One principal advantage of regexp constants is that the engine to
parse the regexp can be created in advance, while a dynamic regexp
may be constructed dynamically (from strings) and needs an explicit
runtime step to create the engine before the matching can be done.

Now I assumed that @/regex-const/ would in that respect behave as
/regex-const/ ... - until I found this text in the GNU Awk manual:

|
| Thus, if you have something like this:
|
|     re = @/don't panic/
|     sub(/don't/, "do", re)
|     print typeof(re), re
|
| then re retains its type, but now attempts to match the string ‘do
| panic’. This provides a (very indirect) way to create regexp-typed
| variables at runtime.
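
For anyone who wants to try the three forms side by side, here is a
minimal sketch. It assumes a gawk recent enough to support typeof() and
@/.../ (4.2 or later); the file name regex_types.awk and the variable
names are purely illustrative. What the last line prints may depend on
the gawk version; the manual passage quoted above suggests the type
stays "regexp".

```shell
#!/bin/sh
# Sketch only: assumes a gawk with typeof() and @/.../ support (4.2 or later).
# The file name regex_types.awk and all variable names are illustrative.
cat > regex_types.awk <<'EOF'
BEGIN {
    s = "don't panic"

    if (s ~ /panic/) print "regexp constant matches"    # /regex/

    dyn = "pa" "nic"                  # dynamic regexp, assembled from strings
    if (s ~ dyn)     print "dynamic regexp matches"

    typed = @/panic/                  # strongly typed regexp constant
    if (s ~ typed)   print "typed regexp matches"

    print typeof(dyn), typeof(typed)  # string regexp

    # The manual's snippet; the result may differ on older gawk versions.
    re = @/don't panic/
    sub(/don't/, "do", re)
    print typeof(re), re
}
EOF
out=$(gawk -f regex_types.awk)
printf '%s\n' "$out"
```

Note that this only shows the observable types and matching behaviour;
it says nothing about whether or when gawk compiles any of the three
forms internally.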

(I'm astonished that first-class regexp objects can be dynamically
changed. But that is not my point here; I'm interested in potential
pre-compilation of regexp constants...)

This would imply that the first-class regexp constants can be changed
like dynamic regexps and that there's no regexp pre-compilation
involved. This would also raise the suspicion that the "normal" regexp
constants are probably not precompiled either.

So constant regexps (both forms) have (only?) the advantage that the
regexp syntax can be checked initially, during awk parsing, e.g.,

    re = @/don't panic[/
         ^ unterminated regexp

And dynamic regexps and first-class regexps that got changed (e.g. by
code like  sub(/don't/, "do[", re)  in the sample snippet above) would
both create runtime errors, e.g.

    error: Unmatched [, [^, [:, [., or [=: /do[ panic/
    fatal: could not make typed regex

(as all ill-formed regexp types will produce a runtime error).

Janis