<101fv4s$1g5c8$1@dont-email.me>

Path: ...!weretis.net!feeder9.news.weretis.net!news.quux.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Janis Papanagnou <janis_papanagnou+ng@hotmail.com>
Newsgroups: comp.lang.awk
Subject: Re: substr() - copying or not copying, that is here the question.
Date: Sun, 1 Jun 2025 00:16:58 +0200
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <101fv4s$1g5c8$1@dont-email.me>
References: <101f9oo$18edp$1@dont-email.me>
 <683b5389$0$683$14726298@news.sunsite.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 01 Jun 2025 00:17:00 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="5efe03dbd7af97f43c3764a2772b692a";
	logging-data="1578376"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+Y9hgbcaxVsT7y3eJ/bl4u"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.8.0
Cancel-Lock: sha1:qIosECmLx/g2wrsQFrb3dx/A5fg=
In-Reply-To: <683b5389$0$683$14726298@news.sunsite.dk>
X-Enigmail-Draft-Status: N1110
Bytes: 3288

On 31.05.2025 21:07, Mack The Knife wrote:
> In article <101f9oo$18edp$1@dont-email.me>,
> Janis Papanagnou  <janis_papanagnou+ng@hotmail.com> wrote:
>> In the context   p=index(substr(t,s),r)
>> it would not be necessary to copy the substr(t,s),
>> the index() function could operate on the original
>> using some access "descriptor" (say, a pointer and
>> a length) in read-only mode.
>>
>> Will (GNU) Awk do a copy of the data value or does
>> it use a read-only descriptor access to the already
>> existing substring of variable "t"?
>>
>> Currently I'm playing with some huge data and copies
>> of MB sized data is costly (if it's repeatedly done
>> with various substr() subscripts).
> 
> substr() makes a copy. This is clear in the code.

Okay. Thanks for checking that!
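To illustrate the read-only "descriptor" idea quoted above, here is a minimal C sketch: a pointer plus a length into the existing string, and an index() that searches through it without copying anything. The names str_view, view_from() and view_index() are hypothetical and not taken from gawk's sources. The sketch is byte-oriented, ignores multibyte characters, and does not reproduce awk's exact substr() rules for unusual arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical read-only "descriptor": a pointer into an existing
   string plus a length. Creating one never copies any characters. */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

/* View equivalent to awk's substr(t, s) (1-based start, to the end
   of t). Simplified: starts below 1 are clamped to 1 and starts past
   the end give an empty view. */
static str_view view_from(const char *t, size_t tlen, long s)
{
    str_view v;
    assert(t != NULL);
    if (s < 1)
        s = 1;
    if ((size_t)(s - 1) >= tlen) {
        v.ptr = t + tlen;
        v.len = 0;
    } else {
        v.ptr = t + (s - 1);
        v.len = tlen - (size_t)(s - 1);
    }
    return v;
}

/* index() on a view: 1-based position of the first occurrence of
   needle, or 0 if it is absent (this sketch also returns 0 for an
   empty needle). Byte-oriented, no multibyte handling. */
static size_t view_index(str_view v, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0 || nlen > v.len)
        return 0;
    for (size_t i = 0; i + nlen <= v.len; i++)
        if (memcmp(v.ptr + i, needle, nlen) == 0)
            return i + 1;
    return 0;
}
```

With t = "abcabcXYZabc", view_index(view_from(t, strlen(t), 4), "XYZ") yields 4, the same value as index(substr(t,4),"XYZ") in awk; the absolute position in t is then s+p-1 = 7.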

> 
> It's almost impossible to do this via read-only descriptor.
> Consider something like
> 
> 	x = substr($0, 10, 15)
> 	getline
> 	print x

Well, it would be possible to do that with a descriptor if
GNU Awk implemented delayed (lazy) evaluation. (Of course,
a copy would be necessary before 'getline' invalidates $0.)
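The delayed evaluation could look roughly like the following hypothetical C sketch: the substring keeps referring into the record buffer until that buffer is about to be overwritten, and only then takes a private copy. All names here are made up for illustration. A real interpreter would also have to locate every outstanding view of $0 before getline reuses the buffer; this sketch leaves that bookkeeping to the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lazy substring: borrows the record buffer until it is
   detached, then owns a private copy. */
typedef struct {
    const char *base;   /* borrowed buffer; NULL once detached */
    size_t off, len;
    char *own;          /* private copy after detach, else NULL */
} lazy_sub;

/* Like x = substr($0, off + 1, len), but nothing is copied yet. */
static lazy_sub lazy_substr(const char *base, size_t off, size_t len)
{
    assert(base != NULL);
    lazy_sub ls = { base, off, len, NULL };
    return ls;
}

/* Caller must invoke this before the buffer behind ls->base changes
   (the "getline" moment). Returns 0 on success, -1 if out of memory. */
static int lazy_detach(lazy_sub *ls)
{
    if (ls->own != NULL)
        return 0;                        /* already detached */
    ls->own = malloc(ls->len + 1);
    if (ls->own == NULL)
        return -1;
    memcpy(ls->own, ls->base + ls->off, ls->len);
    ls->own[ls->len] = '\0';
    ls->base = NULL;                     /* no longer borrowing */
    return 0;
}

/* Pointer to the characters, wherever they currently live. */
static const char *lazy_data(const lazy_sub *ls)
{
    return ls->own != NULL ? ls->own : ls->base + ls->off;
}

static void lazy_free(lazy_sub *ls)
{
    free(ls->own);
    ls->own = NULL;
}
```

If the record is never overwritten while x is alive, the copy in lazy_detach() simply never happens; that is the whole saving such a scheme would offer.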

(It's been reported that GNU Awk implements some
optimizations, so one of them could also have applied
here. That's why I'm asking.)

> 
> Gawk manages the storage such that for something like
> your example the copy will be released after index()
> returns a value.
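To make the cost concrete, here is a C sketch of what "copy, search, release" amounts to for p=index(substr(t,s),r). It is not gawk's code, and index_of_substr_copy() is a hypothetical name; it only models the behavior described above. The temporary is freed right after the search, but the full copy of the suffix is still paid for on every call, even when r occurs near the start.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of p = index(substr(t, s), r) done by copying.
   Returns the 1-based position relative to s, or 0 if r is absent
   (also 0 for an empty r in this sketch). */
static size_t index_of_substr_copy(const char *t, long s, const char *r)
{
    assert(t != NULL && r != NULL);
    size_t tlen = strlen(t);
    if (s < 1)
        s = 1;
    size_t start = (size_t)(s - 1);
    if (start > tlen)
        start = tlen;
    size_t n = tlen - start;

    char *tmp = malloc(n + 1);        /* the copy made by substr() */
    if (tmp == NULL)
        abort();                      /* an interpreter would report a fatal error */
    memcpy(tmp, t + start, n);        /* O(n) work regardless of where r is */
    tmp[n] = '\0';

    size_t p = 0;
    if (*r != '\0') {
        const char *hit = strstr(tmp, r);
        if (hit != NULL)
            p = (size_t)(hit - tmp) + 1;  /* awk positions are 1-based */
    }
    free(tmp);                        /* temporary released once index() is done */
    return p;
}
```

Repeated with many different values of s over a multi-megabyte t, the memcpy() dominates, which is exactly the cost the original question was about.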

As I said, I'm working on a huge string of data. What other
options are there to work efficiently on substrings of it?
Given the result of your code check, I don't see a way to
achieve that with GNU Awk, or maybe with any Awk, using
only standard functionality.

Okay, maybe I could write an extension that works on memory
mapped files (the data originally comes from a file) and
seeks/reads through "C" mechanisms. (But that's a huge
effort compared to some natively available function. And
since I'd have to write the GNU Awk extension in "C" anyway,
I'd probably better implement the whole thing directly in
"C" instead of using Awk in the first place.)
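For the memory-mapped variant, the core of such a C helper might look like the following POSIX-only sketch (open(), fstat(), mmap()). The wrapper names map_file(), unmap_file() and mapped_index() are hypothetical, and none of the gawk extension API glue is shown. The file is mapped once, and index(substr(data,start),needle)-style queries run directly on the mapping, so no substring is ever copied.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    const char *data;  /* read-only mapping of the whole file */
    size_t size;
} mapped_file;

/* Map the file read-only. Returns 0 on success, -1 on error.
   Empty files are rejected because mmap() of length 0 fails. */
static int map_file(const char *path, mapped_file *mf)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                 /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    mf->data = p;
    mf->size = (size_t)st.st_size;
    return 0;
}

static void unmap_file(mapped_file *mf)
{
    munmap((void *)mf->data, mf->size);
    mf->data = NULL;
    mf->size = 0;
}

/* Like index(substr(data, start), needle) on the mapping: 1-based
   position relative to start, or 0 if absent (0 for an empty needle).
   Reads the mapped pages in place; nothing is copied. */
static size_t mapped_index(const mapped_file *mf, long start, const char *needle)
{
    assert(mf != NULL && mf->data != NULL && needle != NULL);
    size_t nlen = strlen(needle);
    size_t off = start < 1 ? 0 : (size_t)(start - 1);
    if (nlen == 0)
        return 0;
    for (size_t i = off; i + nlen <= mf->size; i++)
        if (memcmp(mf->data + i, needle, nlen) == 0)
            return i - off + 1;
    return 0;
}
```

Mapping once and reusing the mapped_file for many queries is the point of the design; re-mapping per query would add system-call overhead on every call, though still without copying the data.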

Janis