Article <86zfrkj93b.fsf@linuxsc.com>

Deutsch English Français Italiano
<86zfrkj93b.fsf@linuxsc.com>

View for Bookmarking (what is this?)
Look up another Usenet article
Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups: comp.lang.c
Subject: Re: Good hash for pointers
Date: Mon, 17 Jun 2024 00:56:40 -0700
Organization: A noiseless patient Spider
Lines: 94
Message-ID: <86zfrkj93b.fsf@linuxsc.com>
References: <v2n88p$1nlcc$1@dont-email.me> <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org> <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me> <86fru6gsqr.fsf@linuxsc.com> <v2sudq$2trh1$1@raubtier-asyl.eternal-september.org> <8634q5hjsp.fsf@linuxsc.com> <v2vmhr$3ffjk$1@raubtier-asyl.eternal-september.org> <86le3wfsmd.fsf@linuxsc.com> <v2voe7$3fr50$1@raubtier-asyl.eternal-september.org> <86ed9ofq14.fsf@linuxsc.com> <20240605005916.00001b33@yahoo.com> <86a5jzmle1.fsf@linuxsc.com> <20240605195905.00002484@yahoo.com> <86y17ilm4k.fsf@linuxsc.com> <20240606110009.00001096@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Mon, 17 Jun 2024 09:56:42 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="692163eb8eeb93c0cba3302737714278";
	logging-data="596259"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18M5nNwGzVrdBUdcb1TxRCl8dzZQcq9bvY="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:qyH83AFsvFRNcKn7YlmB64D2Cls=
	sha1:47+vZEcT4mVuLw5clMV8JVz2L/s=
Bytes: 4954

Michael S <already5chosen@yahoo.com> writes:

> On Wed, 05 Jun 2024 21:40:27 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
>> Michael S <already5chosen@yahoo.com> writes:
>>
>>> On Wed, 05 Jun 2024 08:58:46 -0700
>>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>>>
>>>> I did get your own hash function out and put it into my little
>>>> test rig.  There may be summary comments below.
>>>
>>> <snip>
>>>
>>> As you probably already paid attention, I like bottom lines.
>>> What is a bottom line here?
>>
>> The bottom line is both the multiply-by-a-large-prime hashes
>> should be avoided.
>
> I am not convinced.

Let me amend my earlier statement.  A hash function that simply
multiplies its input by an integer constant should be avoided
if the goal is to produce a hash function of decent quaility
rather than one of mediocre quality.

> Of course, it is no good for Malcolm, because it can't be coded
> in what Malcolm erroneously call "ANSI C".

I don't know why you say that.  C was an ANSI standard before it
was an ISO standard.  Or is it that you think that the language
Malcolm is intent on using does not conform to C90/C89/ANSI C?


>>> Did you you encounter cases in which almost-bonita's-but-not-quite
>>> hash function performed badly?
>>
>> Yes.  A clear example is using directly mapped hash table that
>> has a power of two elements, with inputs all from malloc( 48 )
>> calls.  All keys map to one of only 1024 entries, in a hash
>> table that has 65536 slots.
>
> I don't see it.
> $ ./a 0x10000 0xC000 48
> ref: 49152 keys, 65536 slots, 34590 occupied 11295 collisions.
>    Worst collision: 7
> uut: 49152 keys, 65536 slots, 35791 occupied 11872 collisions.
>    Worst collision: 3

You have run a poor test.  The point of quality evaluation
testing is to look for potential problems, not to disguise
them.

> It sounds like in your tests you don't use 'linear scaling to
> table size' for translation of hash value to table index.
> IMHO, you should.

Like I said in another response, the question of how hash tables
should make use of hash values is a separate topic.  The point
of the tests I ran was to evaluate the quality of hash functions,
not to look at how hash tables should be coded.

Here are some statistics I gathered from a more comprehensive
set of tests for several hash functions.  The tests look at
aggregates of measured values for all output bits (considered
eight bits at a time) as a function of all input bits (considered
sixteen bits at a time.  The first line in each pair summarizes
the low values in each set, and the second line summarizes the
high values in each set, in each case showing the range, average,
and standard deviation.  The first set, hash 0, uses the aes
code you posted.  I think it's easy to see the quality get
worse going from top to bottom.

   hash 0
       186 ..   201   avg:   195.39   dev:  3.71
       312 ..   339   avg:   322.31   dev:  5.85

   hash 1
       186 ..   203   avg:   195.48   dev:  4.01
       310 ..   344   avg:   322.45   dev:  6.31

   hash 2
       160 ..   202   avg:   193.56   dev:  7.70
       311 ..   348   avg:   322.22   dev:  7.91

   hash 3
       143 ..   202   avg:   185.52   dev: 10.15
       311 ..   387   avg:   329.52   dev: 14.10

   hash 4
         0 ..   212   avg:     9.94   dev:    45.16
       270 .. 65536   avg: 44546.56   dev: 28809.47