Deutsch English Français Italiano |
<20240606133534.00005c5d@yahoo.com> View for Bookmarking (what is this?) Look up another Usenet article |
Path: ...!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Michael S <already5chosen@yahoo.com> Newsgroups: comp.lang.c Subject: Re: Good hash for pointers Date: Thu, 6 Jun 2024 13:35:34 +0300 Organization: A noiseless patient Spider Lines: 49 Message-ID: <20240606133534.00005c5d@yahoo.com> References: <v2n88p$1nlcc$1@dont-email.me> <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org> <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me> <86fru6gsqr.fsf@linuxsc.com> <v2sudq$2trh1$1@raubtier-asyl.eternal-september.org> <8634q5hjsp.fsf@linuxsc.com> <v2vmhr$3ffjk$1@raubtier-asyl.eternal-september.org> <86le3wfsmd.fsf@linuxsc.com> <v2voe7$3fr50$1@raubtier-asyl.eternal-september.org> <86ed9ofq14.fsf@linuxsc.com> <20240605005916.00001b33@yahoo.com> <86a5jzmle1.fsf@linuxsc.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Injection-Date: Thu, 06 Jun 2024 12:35:21 +0200 (CEST) Injection-Info: dont-email.me; posting-host="e60a8e6a76916644fa1bcd4273b02481"; logging-data="1497433"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/26Njf03ZJhK/F5pZLKB39K5XtX890HM4=" Cancel-Lock: sha1:8c37C7HEFCVd7Fhig4m9jU8QM8o= X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32) Bytes: 4147 On Wed, 05 Jun 2024 08:58:46 -0700 Tim Rentsch <tr.17687@z991.linuxsc.com> wrote: > > If we're trying to write a general hash function, it should work > well across a wide range of input key sets, and also for different > kinds of hash table management. Let's look at the output side > first. There are two independent axes for hash tables: table > size, and direct mapping versus using rehashing. For table size, > the two prominent choices are some prime in the appropriate range, > or else a power of two. Clearly any size could be chosen, but > primes have the nice property that they divide up the space of hash > values very evenly, whereas powers of two allow a cheaper way of > reducing a hash value to the range of table indices (anding versus a > modulo operation). A good hash function should work well with both. > The choice of direct mapping versus rehashing depends on expected > occupancy rates (which I am used to calling "load factor"). For a > load factor of 75 to 80 percent or less, generally rehashing is > better. Direct mapping has the advantage that it can handle load > factors above 100%, at a cost of dealing with whatever subsidiary > data structure (usually linked lists) is used when two or more keys > map to the same index in the hash table. (I expect you know all > that, I wrote it mostly just to clarify my own thinking.) > I never had interest in implementation of hash methods, typically just took what language or library provides and used it. All my knowledge of internals are 50 y.o. news (TAoCP, and no I did not read it 50 years ago, but I did read it first time slightly less than 40 years ago and not sure if I re-read this particular topic since then). So, I am not sure that I understood everything above. In particular, w.r.t. rehashing, IIRC, Knuth came to conclusion that simple circular increment of the index is superior to more elaborate methods. I don't know if conclusion was changed in the following years, not even sure that I recollect it correctly. Despite (or due to?) my relative ignorance of the topic, I'd dare to disagree with couple of your points: 1. I don't think that hash function should be evaluated in isolation from the rest of algorithm. IMHO, they should be co-developed. There is nothing wrong with hash function that works well with one particular "back end" and poorly with all others as long as limitations are clearly stated. 2. I don't like "modulo" method of reduction of hash value to range. I don't like it for any chosen size of the bucket, not just for power of two, but even for prime. Of course, for power of two size my dislike to it is deeper.