Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Ben Bacarisse <ben@bsb.me.uk>
Newsgroups: comp.lang.awk
Subject: Re: substr() - copying or not copying, that is here the question.
Date: Sun, 01 Jun 2025 11:42:21 +0100
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <87h60zrbea.fsf@bsb.me.uk>
References: <101f9oo$18edp$1@dont-email.me> <683b5389$0$683$14726298@news.sunsite.dk> <101fv4s$1g5c8$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Sun, 01 Jun 2025 12:42:21 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="bc033858cb262d68d13e5a0a4dd4670f"; logging-data="2113755"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+DKYeJxguhE+0LI5Ng8sWHsJW1hFuD7P8="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:TLA65JGQfRKE/eVtyePRBW/pW98= sha1:88jZpU6B+YV6Nh3AV7Bz0pz6Tyg=
X-BSB-Auth: 1.d7ec601ee1e20a799fab.20250601114221BST.87h60zrbea.fsf@bsb.me.uk

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

> On 31.05.2025 21:07, Mack The Knife wrote:
>> In article <101f9oo$18edp$1@dont-email.me>,
>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>>> In the context p=index(substr(t,s),r)
>>> it would not be necessary to copy the substr(t,s),
>>> the index() function could operate on the original
>>> using some access "descriptor" (say, a pointer and
>>> a length) in read-only mode.
>>>
>>> Will (GNU) Awk do a copy of the data value or does
>>> it use a read-only descriptor access to the already
>>> existing substring of variable "t"?
>>>
>>> Currently I'm playing with some huge data and copies
>>> of MB sized data is costly (if it's repeatedly done
>>> with various substr() subscripts).
>>
>> substr() makes a copy. This is clear in the code.
>
> Okay. Thanks for checking that!
....
> Okay, maybe I could write an extension to work on memory
> mapped files - the data originally stems from a file -
> and seek/read through "C" mechanisms. (But that's huge
> effort compared to some natively available function. And
> then I'd probably better implement that straightly in "C"
> instead of using Awk, in the first place, since I'd have
> to implement the GNU Awk Extension anyway in "C".)

An alternative (depending on the context) would be to consider an
extension that provides an index function with a third argument giving
the initial offset. I've not looked at how extensions get access to
GAWK strings, so this may not be as easy as it sounds, but I would
guess that it might be relatively simple to do.

-- 
Ben.
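
To make the suggestion concrete, here is a minimal sketch in plain C of the search such an extension might wrap. It is an illustration only: the name index_from and its parameters are hypothetical, it is not gawk code, and it deliberately leaves out the gawk extension API glue (how the string pointer and length are obtained from gawk is not shown). It works on bytes, so it is not multibyte-aware, and treating an empty needle as "not found" is a choice of this sketch rather than a statement about what any awk does.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of gawk): search for needle in hay,
 * beginning at the 1-based position start, without copying hay.
 *
 * Returns the 1-based position of the first match, counted from the
 * beginning of hay (not from start), or 0 if there is no match.
 */
size_t index_from(const char *hay, size_t hay_len,
                  const char *needle, size_t needle_len,
                  size_t start)
{
    if (start < 1)
        start = 1;                  /* clamp a too-small start to the first byte */

    size_t off = start - 1;         /* 0-based offset of the first candidate */

    if (needle_len == 0)            /* sketch's choice: empty needle is "not found" */
        return 0;
    if (off >= hay_len || needle_len > hay_len - off)
        return 0;                   /* nothing left that could hold the needle */

    size_t last = hay_len - needle_len;   /* last offset where a match can begin */

    for (size_t i = off; i <= last; i++) {
        /* cheap first-byte check before comparing the whole needle */
        if (hay[i] == needle[0] && memcmp(hay + i, needle, needle_len) == 0)
            return i + 1;           /* convert back to a 1-based position */
    }
    return 0;
}
```

Under these assumptions, the result relates to the original expression as follows: when p=index(substr(t,s),r) finds a match (with s at least 1), the same p is index_from(...) - s + 1, because the sketch reports positions relative to the whole of t rather than to the substring.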