From: Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups: comp.lang.awk
Subject: GNU Awk's types of regular expressions
Date: Thu, 28 Nov 2024 19:18:29 +0100
Organization: A noiseless patient Spider
Message-ID: <viac5m$l8oh$1@dont-email.me>

GNU Awk currently supports three types of regular expressions. In
addition to the standard regexp constants (/regex/) and the dynamic
regexps ("regex", or variables containing "regex"), newer versions
also support first-class regexp objects (@/regex/, "Strongly Typed
Regexp Constants").

One principal advantage of regexp constants is that the matching engine
for the regexp can be built in advance. A dynamic regexp, by contrast,
may be assembled at runtime (from strings) and therefore needs an
explicit runtime step to build the engine before any matching can be
done. I had assumed that  @/regex-const/  would behave like
 /regex-const/  in that respect, until I found this text in the GNU Awk
manual:

|
| Thus, if you have something like this:
|
|   re = @/don't panic/
|   sub(/don't/, "do", re)
|   print typeof(re), re
|
| then re retains its type, but now attempts to match the string ‘do
| panic’. This provides a (very indirect) way to create regexp-typed
| variables at runtime.
|

(I'm astonished that first-class regexp objects can be changed at
runtime. But that is not my point here; I'm interested in the potential
precompilation of regexp constants...)

This would imply that first-class regexp constants can be changed
like dynamic regexps, and that no regexp precompilation is involved.
It would also raise the suspicion that the "normal" regexp constants
are probably not precompiled either.

So regexp constants (both forms) would have (only?) the advantage that
their syntax can be checked up front, while awk parses the program, e.g.,

  re = @/don't panic[/
       ^ unterminated regexp

Dynamic regexps and first-class regexps that were changed (e.g.
by code like

  sub(/don't/, "do[", re)

in the sample snippet above) would both produce runtime errors, e.g.

  error: Unmatched [, [^, [:, [., or [=: /do[ panic/
  fatal: could not make typed regex

(as every ill-formed regexp of these kinds produces a runtime error).

Janis