Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Anton Shepelev Newsgroups: comp.lang.c,sci.stat.math Subject: Re: realloc() - frequency, conditions, or experiences about relocation? Date: Thu, 20 Jun 2024 01:53:47 +0300 Organization: A noiseless patient Spider Lines: 44 Message-ID: <20240620015347.9bdcc4df03ab63d096375450@gmail.moc> References: <875xu8vsen.fsf@bsb.me.uk> <87zfrjvqp6.fsf@bsb.me.uk> <20240617180249.96dfaafa89392827aa162434@g{oogle}mail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Injection-Date: Thu, 20 Jun 2024 00:53:49 +0200 (CEST) Injection-Info: dont-email.me; posting-host="b394b4f438bd83bf46fe16e962fd5edb"; logging-data="2318502"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/r1HnWRqgy24DdnETNx71smiLltbSz8lM=" Cancel-Lock: sha1:LE8/zn1c9L7SSeuTU5M4augVWi0= X-Newsreader: Sylpheed 3.7.0 (GTK+ 2.24.30; i686-pc-mingw32) Bytes: 3000 Malcolm McLean: > We have to have some knowledge. And what we probaby know > is that the input is a file stored on someone's personal > computer. And someone has published on the statistical > distribution of such files And they have a log-normal > distribution with a mean and a median which he gives. So > with that informaton, we can work out, given that a file > is at least N characters, what is the prbablity that an > allocation of any size will contain the whole file, and > how many bytes, on average will be wasted. Observe that the standard algorithm of exponential growth is memoryless and self-similar in that in does not depend on context, or the history of previous reallocations. These properties belong to (or even identify?) the exponential distribution. We can therefore assume that exponential- growth strategy is ideal for exponentially distributed buffer sizes, and under that assumption determine the relation between the CDF values (p) corresponding to consequent re-allcoations: p = e^x/L , p0 = 1-e^(L*x0) , p1 = 1-e^(L*x1) , x1 = k*x0 (by our strategy), => p1 = 1-(1-p0)^k . which does not depend on the distribution and lets us generalise this approach for any distribution: x1 = Q( 1 - ( 1 - CDF(x0) )^k ) where: x0 : the required size x1 : the new recommended capacity Q(p) : the p-Quantile of the given distribution CDF(x): the CDF of the given distribution k>1 : balance between speed and space efficiency -- () ascii ribbon campaign -- against html e-mail /\ www.asciiribbon.org -- against proprietary attachments