From: David Brown
Newsgroups: comp.lang.c
Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
Date: Mon, 17 Jun 2024 20:11:48 +0200
Organization: A noiseless patient Spider
Message-ID: 
References: <875xu8vsen.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Mozilla Thunderbird
In-Reply-To: 
Content-Language: en-GB

On 17/06/2024 11:31, Malcolm McLean wrote:
> On 17/06/2024 10:18, Ben Bacarisse wrote:
>> Janis Papanagnou writes:
>>
>>> In a recent thread realloc() was a substantial part of the discussion.
>>> "Occasionally" the increased data storage will be relocated along
>>> with the previously stored data. On huge data sets that might be a
>>> performance factor. Is there any experience or are there any concrete
>>> factors about the conditions when this relocation happens? - I could
>>> imagine that it's no issue as long as you're in some kB buffer range,
>>> but if, say, we're using realloc() to substantially increase buffers
>>> often it might be an issue to consider. It would be good to get some
>>> feeling about that internal.
>>
>> There is obviously a cost, but there is (usually) no alternative if
>> contiguous storage is required.
>> In practice, the cost is usually moderate and can be very effectively
>> managed by using an exponential allocation scheme: at every
>> reallocation multiply the storage space by some factor greater than 1
>> (I often use 3/2, but doubling is often used as well).  This results
>> in O(log(N)) rather than O(N) allocations as in your code that added
>> a constant to the size.  Of course, some storage is wasted (that
>> /might/ be retrieved by a final realloc down to the final size) but
>> that's rarely significant.
>>
> So can we work it out?
>
> Let's assume for the moment that the allocations have a semi-normal
> distribution, with negative values disallowed. Now ignoring the first
> few values, if we have allocated, say, 1K, we ought to be able to
> predict the value by integrating the distribution from 1k to infinity
> and taking the mean.
>
First, there is no reason to assume such a distribution, other than 
saying "lots of things are roughly normal".

Secondly, knowing the distribution gives you /no/ information about any 
particular case. You know the distribution of the results of rolling 
two dice - does that mean you can predict the next roll?

Thirdly, not all distributions have a mean (look up the Cauchy 
distribution if you like).

Fourthly, even if you know the mean, it tells you nothing of use.

Knowing a bit about the distribution of file sizes can be useful, but 
not in the way you describe here. If you know that the files are rarely 
or never bigger than 10 MB, malloc 10 MB and forget the realloc. If you 
know they are often bigger than that, mmap the file and forget the 
realloc.