From: David Brown
Newsgroups: comp.lang.c,sci.stat.math
Subject: Re: realloc() - frequency, conditions, or experiences about relocation?
Date: Wed, 19 Jun 2024 19:41:49 +0200

On 19/06/2024 17:36, Ben Bacarisse wrote:
> Malcolm McLean writes:
>
>> No. We have to have some knowledge. And what we probably know is that the
>> input is a file stored on someone's personal computer. And someone has
>> published on the statistical distribution of such files
>
> That's not the case that matters (to me at least). If the input is a
> file, we have a much better way of "guessing" the size than guessing and
> growing -- just ask for the size. Sure, we might need to make
> adjustments if the file is changing, but there is always a better
> measure than any statistical analysis.
>
> To some extent this seems like a solution in search of a problem.

It seems more like a solution that doesn't exist in search of a problem with absurdly unrealistic requirements.
And even if Malcolm's solution existed, and the problem existed, it /still/ wouldn't work - knowing the distribution of file sizes tells us nothing about the size of any given file.

> Growing the buffer exponentially is simple and effective.
>

Yes, that's the general way to handle buffers when you don't know what size they should be.

A better solution for this sort of program is usually, as you say, asking the OS for the file size (there is no standard library function for getting the file size, but it's not hard to do for any realistic target OS). And then for big files, prefer mmap to reading the file into a buffer.

It's only really for unsized "files" such as piped input that you have no way of getting the size, and then exponential growth is the way to go.

Personally, I'd start with a big size (perhaps 10 MB) that is bigger than you are likely to need in practice, but small enough that it is negligible on even vaguely modern computers. Then the realloc code is unlikely to be used (but it can still be there for completeness).
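
For the unsized case, something along these lines is roughly what I have in mind. It is only a sketch in standard C: the name read_all and its parameters are made up for illustration, and the doubling factor is one choice among several reasonable ones.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything from `in` into a malloc'd buffer.
 * On success, returns the buffer and stores the byte count in *len_out.
 * On failure, returns NULL. The caller frees the result. */
char *read_all(FILE *in, size_t initial_cap, size_t *len_out)
{
    assert(in != NULL && len_out != NULL);

    size_t cap = initial_cap ? initial_cap : 1;
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Fill whatever space is left in the buffer. */
        size_t n = fread(buf + len, 1, cap - len, in);
        len += n;
        if (len < cap)
            break;                  /* short read: end of input or error */

        /* Buffer is full: double it, guarding against size_t overflow. */
        if (cap > SIZE_MAX / 2) {
            free(buf);
            return NULL;
        }
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL) {
            free(buf);              /* on failure realloc leaves buf valid */
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }

    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    *len_out = len;
    return buf;
}
```

Called as read_all(stdin, (size_t)10 << 20, &len), the initial malloc covers the 10 MB case and the realloc branch only runs for larger input. When the input is a regular file on a POSIX system, the initial capacity could instead come from fstat's st_size, which is the "ask for the size" approach.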