Path: ...!weretis.net!feeder9.news.weretis.net!news.quux.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups: comp.lang.awk
Subject: Re: substr() - copying or not copying, that is here the question.
Date: Sun, 1 Jun 2025 00:16:58 +0200
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <101fv4s$1g5c8$1@dont-email.me>
References: <101f9oo$18edp$1@dont-email.me> <683b5389$0$683$14726298@news.sunsite.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 01 Jun 2025 00:17:00 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="5efe03dbd7af97f43c3764a2772b692a"; logging-data="1578376"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+Y9hgbcaxVsT7y3eJ/bl4u"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
Cancel-Lock: sha1:qIosECmLx/g2wrsQFrb3dx/A5fg=
In-Reply-To: <683b5389$0$683$14726298@news.sunsite.dk>
X-Enigmail-Draft-Status: N1110
Bytes: 3288

On 31.05.2025 21:07, Mack The Knife wrote:
> In article <101f9oo$18edp$1@dont-email.me>,
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> In the context p=index(substr(t,s),r)
>> it would not be necessary to copy the substr(t,s),
>> the index() function could operate on the original
>> using some access "descriptor" (say, a pointer and
>> a length) in read-only mode.
>>
>> Will (GNU) Awk do a copy of the data value or does
>> it use a read-only descriptor access to the already
>> existing substring of variable "t"?
>>
>> Currently I'm playing with some huge data and copies
>> of MB sized data is costly (if it's repeatedly done
>> with various substr() subscripts).
>
> substr() makes a copy. This is clear in the code.

Okay. Thanks for checking that!

>
> It's almost impossible to do this via read-only descriptor.
> Consider something like
>
>     x = substr($0, 10, 15)
>     getline
>     print x

Well, it would be possible to do that with a descriptor if GNU Awk
implemented delayed/lazy evaluation. (Before 'getline' invalidates
$0 a copy is necessary, of course.)

(It's been reported that there are some optimizations implemented
in GNU Awk, so that could also have been the case here. That's why
I'm asking.)

>
> Gawk manages the storage such that for something like
> your example the copy will be released after index()
> returns a value.

As said, I'm working on a huge string of data. What other options
are there to work efficiently on substring parts of the data?

Given the result of your code check I don't see a chance to
achieve that with GNU Awk, or maybe any Awk, using only standard
functionality.

Okay, maybe I could write an extension to work on memory-mapped
files - the data originally stems from a file - and seek/read
through "C" mechanisms. (But that's a huge effort compared to some
natively available function. And then I'd probably be better off
implementing it directly in "C" in the first place instead of using
Awk, since I'd have to implement the GNU Awk extension in "C"
anyway.)

Janis
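
Since substr() copies, one way to limit the cost with standard functions only is to copy a bounded window of the string per index() call rather than the whole remaining tail. What follows is only a minimal sketch under assumptions, not something proposed in the thread: the names find_all, scan and the window size win are hypothetical, the search string is treated as a literal, and matches are reported non-overlapping. Whether this is actually faster on MB sized data would have to be measured.

```sh
#!/bin/sh
# Hypothetical helper (not from the thread): print the 1-based start
# positions of all non-overlapping occurrences of a literal string.
# Note: awk -v interprets backslash escapes in the assigned values.
find_all() {
    awk -v text="$1" -v pat="$2" -v win="$3" '
    # Scan t in windows, so that each substr() copy is bounded by
    # w + length(r) - 1 characters instead of the whole remaining tail.
    function scan(t, r, w, pos,    n, s, rl, tl, p) {
        rl = length(r); tl = length(t); n = 0
        if (rl == 0 || rl > tl) return 0
        if (w < 1) w = 1
        s = 1
        while (s <= tl - rl + 1) {
            # Every match starting in s .. s+w-1 lies fully inside this window.
            p = index(substr(t, s, w + rl - 1), r)
            if (p > 0) {
                pos[++n] = s + p - 1
                s += p - 1 + rl        # resume right after the match
            } else {
                s += w                 # no match starts in this window
            }
        }
        return n
    }
    BEGIN {
        cnt = scan(text, pat, win + 0, found)
        for (i = 1; i <= cnt; i++)
            printf "%s%d", (i > 1 ? " " : ""), found[i]
        print ""
    }'
}

find_all "xxabxxxxab" "ab" 3    # prints: 3 9
```

The window size trades the number of substr() calls against the amount copied per call. With many matches each match still costs one window copy, so a small window suits dense matches and a larger one suits sparse matches.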