<87h60zrbea.fsf@bsb.me.uk>

Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: Ben Bacarisse <ben@bsb.me.uk>
Newsgroups: comp.lang.awk
Subject: Re: substr() - copying or not copying, that is here the question.
Date: Sun, 01 Jun 2025 11:42:21 +0100
Organization: A noiseless patient Spider
Lines: 39
Message-ID: <87h60zrbea.fsf@bsb.me.uk>
References: <101f9oo$18edp$1@dont-email.me>
	<683b5389$0$683$14726298@news.sunsite.dk>
	<101fv4s$1g5c8$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Date: Sun, 01 Jun 2025 12:42:21 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="bc033858cb262d68d13e5a0a4dd4670f";
	logging-data="2113755"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+DKYeJxguhE+0LI5Ng8sWHsJW1hFuD7P8="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:TLA65JGQfRKE/eVtyePRBW/pW98=
	sha1:88jZpU6B+YV6Nh3AV7Bz0pz6Tyg=
X-BSB-Auth: 1.d7ec601ee1e20a799fab.20250601114221BST.87h60zrbea.fsf@bsb.me.uk

Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:

> On 31.05.2025 21:07, Mack The Knife wrote:
>> In article <101f9oo$18edp$1@dont-email.me>,
>> Janis Papanagnou  <janis_papanagnou+ng@hotmail.com> wrote:
>>> In the context   p=index(substr(t,s),r)
>>> it would not be necessary to copy the substr(t,s),
>>> the index() function could operate on the original
>>> using some access "descriptor" (say, a pointer and
>>> a length) in read-only mode.
>>>
>>> Will (GNU) Awk do a copy of the data value or does
>>> it use a read-only descriptor access to the already
>>> existing substring of variable "t"?
>>>
>>> Currently I'm playing with some huge data and copies
>>> of MB sized data is costly (if it's repeatedly done
>>> with various substr() subscripts).
>> 
>> substr() makes a copy. This is clear in the code.
>
> Okay. Thanks for checking that!
....
> Okay, maybe I could write an extension to work on memory
> mapped files - the data originally stems from a file -
> and seek/read through "C" mechanisms. (But that's huge
> effort compared to some natively available function. And
> then I'd probably better implement that straightly in "C"
> instead of using Awk, in the first place, since I'd have
> to implement the GNU Awk Extension anyway in "C".)

An alternative (depending on the context) would be to consider an
extension that provides an index function with a third argument giving
the initial offset.  I've not looked at how extensions get access to
GAWK strings, so this may not be as easy as it sounds, but I would
guess that it might be relatively simple to do.
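
To make the idea concrete, here is a minimal sketch in plain C of the
search such an extension might wrap.  It assumes the extension can get
at a pointer and a length for each string argument; the glue code that
registers the function with GAWK is deliberately left out.  The name
index_from and its parameters are hypothetical, and it counts bytes,
not characters, so it is not multibyte-aware.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of GAWK or its extension API.
 * Searches hay[0..hay_len) for needle[0..needle_len), starting at the
 * 1-based byte offset `start`, without copying any part of `hay`.
 * Returns the 1-based position of the match in the ORIGINAL string,
 * or 0 if there is no match.  An empty needle yields 0 by choice.
 */
size_t index_from(const char *hay, size_t hay_len,
                  const char *needle, size_t needle_len,
                  size_t start)
{
    size_t off, last, i;

    if (start < 1)              /* treat offsets below 1 as 1 */
        start = 1;
    off = start - 1;            /* convert to a 0-based offset */

    if (needle_len == 0 || off >= hay_len)
        return 0;
    if (needle_len > hay_len - off)
        return 0;               /* needle cannot fit in the remainder */

    last = hay_len - needle_len;    /* last 0-based position to try */
    for (i = off; i <= last; i++) {
        /* cheap first-byte check before the full comparison */
        if (hay[i] == needle[0] &&
            memcmp(hay + i, needle, needle_len) == 0)
            return i + 1;
    }
    return 0;
}
```

Note that the result is relative to the whole string.  To get the same
value as p=index(substr(t,s),r), subtract s-1 from a non-zero result.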

-- 
Ben.