Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Michael S Newsgroups: comp.lang.c Subject: Re: Good hash for pointers Date: Thu, 6 Jun 2024 11:00:09 +0300 Organization: A noiseless patient Spider Lines: 53 Message-ID: <20240606110009.00001096@yahoo.com> References: <86fru6gsqr.fsf@linuxsc.com> <8634q5hjsp.fsf@linuxsc.com> <86le3wfsmd.fsf@linuxsc.com> <86ed9ofq14.fsf@linuxsc.com> <20240605005916.00001b33@yahoo.com> <86a5jzmle1.fsf@linuxsc.com> <20240605195905.00002484@yahoo.com> <86y17ilm4k.fsf@linuxsc.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Injection-Date: Thu, 06 Jun 2024 09:59:57 +0200 (CEST) Injection-Info: dont-email.me; posting-host="e60a8e6a76916644fa1bcd4273b02481"; logging-data="1497433"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/k2BpY+R/zmGyY1VPyGNxZMGJ492/PFzg=" Cancel-Lock: sha1:FvsBYaSBftVPG0ce5ZAYms2ZnUQ= X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32) Bytes: 3494 On Wed, 05 Jun 2024 21:40:27 -0700 Tim Rentsch wrote: > Michael S writes: > > > On Wed, 05 Jun 2024 08:58:46 -0700 > > Tim Rentsch wrote: > > > >> I did get your own hash function out and put it into my little > >> test rig. There may be summary comments below. > > > > > > > > As you probably already paid attention, I like bottom lines. > > What is a bottom line here? > > The bottom line is both the multiply-by-a-large-prime hashes > should be avoided. > I am not convinced. So far I had seen no disadvantages for particular application mentioned by Malcolm. Of course, it is no good for Malcolm, because it can't be coded in what Malcolm erroneously call "ANSI C". It is not easy to be coded even in ISO C11. But in my own practice it does not matter. I happily use language extensions, like gnu __int128 or Microsoft's __umulh() when they are the best tool for a job. > > Did you you encounter cases in which almost-bonita's-but-not-quite > > hash function performed badly? > > Yes. A clear example is using directly mapped hash table that > has a power of two elements, with inputs all from malloc( 48 ) > calls. All keys map to one of only 1024 entries, in a hash > table that has 65536 slots. I don't see it. $ ./a 0x10000 0xC000 48 ref: 49152 keys, 65536 slots, 34590 occupied 11295 collisions. Worst collision: 7 uut: 49152 keys, 65536 slots, 35791 occupied 11872 collisions. Worst collision: 3 It sounds like in your tests you don't use 'linear scaling to table size' for translation of hash value to table index. IMHO, you should. May be, for some particular hash functions other methods of translation also work, but I see no reasons to use other methods. This one is quite fast, robust and looks to me like the most logical. For power-of-2 sized tables it degenerates into simple right shift.