Deutsch English Français Italiano |
<86zfrkj93b.fsf@linuxsc.com> View for Bookmarking (what is this?) Look up another Usenet article |
Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Tim Rentsch <tr.17687@z991.linuxsc.com> Newsgroups: comp.lang.c Subject: Re: Good hash for pointers Date: Mon, 17 Jun 2024 00:56:40 -0700 Organization: A noiseless patient Spider Lines: 94 Message-ID: <86zfrkj93b.fsf@linuxsc.com> References: <v2n88p$1nlcc$1@dont-email.me> <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org> <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me> <86fru6gsqr.fsf@linuxsc.com> <v2sudq$2trh1$1@raubtier-asyl.eternal-september.org> <8634q5hjsp.fsf@linuxsc.com> <v2vmhr$3ffjk$1@raubtier-asyl.eternal-september.org> <86le3wfsmd.fsf@linuxsc.com> <v2voe7$3fr50$1@raubtier-asyl.eternal-september.org> <86ed9ofq14.fsf@linuxsc.com> <20240605005916.00001b33@yahoo.com> <86a5jzmle1.fsf@linuxsc.com> <20240605195905.00002484@yahoo.com> <86y17ilm4k.fsf@linuxsc.com> <20240606110009.00001096@yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Date: Mon, 17 Jun 2024 09:56:42 +0200 (CEST) Injection-Info: dont-email.me; posting-host="692163eb8eeb93c0cba3302737714278"; logging-data="596259"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18M5nNwGzVrdBUdcb1TxRCl8dzZQcq9bvY=" User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux) Cancel-Lock: sha1:qyH83AFsvFRNcKn7YlmB64D2Cls= sha1:47+vZEcT4mVuLw5clMV8JVz2L/s= Bytes: 4954 Michael S <already5chosen@yahoo.com> writes: > On Wed, 05 Jun 2024 21:40:27 -0700 > Tim Rentsch <tr.17687@z991.linuxsc.com> wrote: > >> Michael S <already5chosen@yahoo.com> writes: >> >>> On Wed, 05 Jun 2024 08:58:46 -0700 >>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote: >>> >>>> I did get your own hash function out and put it into my little >>>> test rig. There may be summary comments below. >>> >>> <snip> >>> >>> As you probably already paid attention, I like bottom lines. >>> What is a bottom line here? >> >> The bottom line is both the multiply-by-a-large-prime hashes >> should be avoided. > > I am not convinced. Let me amend my earlier statement. A hash function that simply multiplies its input by an integer constant should be avoided if the goal is to produce a hash function of decent quaility rather than one of mediocre quality. > Of course, it is no good for Malcolm, because it can't be coded > in what Malcolm erroneously call "ANSI C". I don't know why you say that. C was an ANSI standard before it was an ISO standard. Or is it that you think that the language Malcolm is intent on using does not conform to C90/C89/ANSI C? >>> Did you you encounter cases in which almost-bonita's-but-not-quite >>> hash function performed badly? >> >> Yes. A clear example is using directly mapped hash table that >> has a power of two elements, with inputs all from malloc( 48 ) >> calls. All keys map to one of only 1024 entries, in a hash >> table that has 65536 slots. > > I don't see it. > $ ./a 0x10000 0xC000 48 > ref: 49152 keys, 65536 slots, 34590 occupied 11295 collisions. > Worst collision: 7 > uut: 49152 keys, 65536 slots, 35791 occupied 11872 collisions. > Worst collision: 3 You have run a poor test. The point of quality evaluation testing is to look for potential problems, not to disguise them. > It sounds like in your tests you don't use 'linear scaling to > table size' for translation of hash value to table index. > IMHO, you should. Like I said in another response, the question of how hash tables should make use of hash values is a separate topic. The point of the tests I ran was to evaluate the quality of hash functions, not to look at how hash tables should be coded. Here are some statistics I gathered from a more comprehensive set of tests for several hash functions. The tests look at aggregates of measured values for all output bits (considered eight bits at a time) as a function of all input bits (considered sixteen bits at a time. The first line in each pair summarizes the low values in each set, and the second line summarizes the high values in each set, in each case showing the range, average, and standard deviation. The first set, hash 0, uses the aes code you posted. I think it's easy to see the quality get worse going from top to bottom. hash 0 186 .. 201 avg: 195.39 dev: 3.71 312 .. 339 avg: 322.31 dev: 5.85 hash 1 186 .. 203 avg: 195.48 dev: 4.01 310 .. 344 avg: 322.45 dev: 6.31 hash 2 160 .. 202 avg: 193.56 dev: 7.70 311 .. 348 avg: 322.22 dev: 7.91 hash 3 143 .. 202 avg: 185.52 dev: 10.15 311 .. 387 avg: 329.52 dev: 14.10 hash 4 0 .. 212 avg: 9.94 dev: 45.16 270 .. 65536 avg: 44546.56 dev: 28809.47