Article <86r0ddmsf6.fsf@linuxsc.com>

Warning: mysqli::__construct(): (HY000/1203): User howardkn already has more than 'max_user_connections' active connections in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\includes\artfuncs.php on line 21
Failed to connect to MySQL: (1203) User howardkn already has more than 'max_user_connections' active connections
Warning: mysqli::query(): Couldn't fetch mysqli in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\index.php on line 66
Article <86r0ddmsf6.fsf@linuxsc.com>

Deutsch English Français Italiano

<86r0ddmsf6.fsf@linuxsc.com>

View for Bookmarking (what is this?)
Look up another Usenet article

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups: comp.lang.c
Subject: Re: Good hash for pointers
Date: Mon, 03 Jun 2024 18:02:21 -0700
Organization: A noiseless patient Spider
Lines: 101
Message-ID: <86r0ddmsf6.fsf@linuxsc.com>
References: <v2n88p$1nlcc$1@dont-email.me> <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org> <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me> <86fru6gsqr.fsf@linuxsc.com> <v2sudq$2trh1$1@raubtier-asyl.eternal-september.org> <8634q5hjsp.fsf@linuxsc.com> <v2vmhr$3ffjk$1@raubtier-asyl.eternal-september.org> <86le3wfsmd.fsf@linuxsc.com> <v2voe7$3fr50$1@raubtier-asyl.eternal-september.org> <86ed9ofq14.fsf@linuxsc.com> <v2vs40$3gflh$1@raubtier-asyl.eternal-september.org> <86sexypvff.fsf@linuxsc.com> <20240602104506.000072e4@yahoo.com> <86le3nne36.fsf@linuxsc.com> <20240603105005.0000091f@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Tue, 04 Jun 2024 03:02:23 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2e3473276f70e973f0e20911b7f6b99d";
	logging-data="136753"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18K3YfQ994DrKXk9nyjq/yrR7V86UI4IeE="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:T0NMVEU/0WkrtKdbN657sxGlb7k=
	sha1:ITmaDoXZw2c5jOU8hMrwVVS3vpY=
Bytes: 6116

Michael S <already5chosen@yahoo.com> writes:

> On Sun, 02 Jun 2024 16:02:05 -0700
> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
>
>> Michael S <already5chosen@yahoo.com> writes:
>>
>>> On Thu, 30 May 2024 19:27:48 -0700
>>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

[... concerning quality of hash functions ...]

>>> So, what were your conclusions?
>>> Ignoring the speed of computation, would something like
>>> cryptographic hash scaled to bucket size be a best hash for
>>> this type of application?  Or some sort of less thorough
>>> grinding of the bits is better?
>>
>> Factors that matter:
>
> <snip>
>
>> Probably much of the above was obvious.  I make no apologies for
>> that.  Also it may seem disjointed or unorganized.  If so then
>> sorry about that chief, it's a big topic and there's a lot of
>> ground to cover, and I tried to hit the most important
>> highlights.
>
> It sounds like you *postulate* that crypto is an ideal.

I think it's more accurate to say that I _expect_ an established
crypto-quality hash function such as MD5 to be good approximation
to an ideal hash function.  I expect that because (a) the design
criteria for such hash functions resembles or includes the design
criteria for a near-ideal hash function, (b) the people who have
done these are smart people with (usually) a fair amount of
experience doing such things, and (c) the established ones have
gone through a lot of use and testing by the user community, who
likely would have found any serious weaknesses.

> I am less in axioms and more interested in your experimental
> findings.

I'm not sure what you're looking for here.  The testing I did for
the various recently posted hash functions was simply running a
few ad hoc test scenarios and comparing the results against each
other and also against the results of hash function of my own
devising likely to produce high quality results.  One piece of
empirical evidence:  in all of the tests I ran, that hash function
never gave results that varied by more than 10% or so, not matter
what test was done.  By contrast, most or all of the other hash
functions sometimes gave results that varied by a factor of 4 or
more, depending on what scenario was being tested.  Let me say
again that the test scenarios were ad hoc, and I was not being
systematic but just eyeballing the numbers.

I have a fair amount of experience (at least hundreds of hours)
experimenting with various random number generators, including
several of my own devising.  The experience includes systematic
testing using extensive test suites (done by people who really
know what they are doing, which in this arena does not include
myself).  Random number generators and hash functions have a lot
of principles in common.  Suppose we have a hash function that
takes a 64-bit key and produces a 32-bit hash value.  If we use
that hash function with simply successive 64-bit integer values,
and the outputs pass the statistical tests of a good random
number generator test suite, you can be sure that hash function
has very high quality.  A key observation about random number
generators is this:  if the output values are N bits, generally
the amount of internal state should be at least 2N bits.  The
same thing is true of hash functions.  If a hash function
produces, say, 32-bit values, internally there should be at least
64 bits of state that is mushed together in various ways, finally
being combined in the final step to produce the 32-bit result.
Another design principle is to replicate information.  For
example, if we are hashing a character string to produce a 32-bit
hash value, we might combine each character in the string with
two different 32-bit "accumulators", perhaps adding into one and
xoring into the other, and rotating them by different amounts
after each character scanned.  Composing code on the fly (not
compiled, and certainly not tested):

    unsigned
    string_hash( const char *s ){
        unsigned a = 12345, b = 0xabcdef;
        char c;
        while(  c = *s++  ){
            a += c,  b ^= c;
            a = a << 23 | a >> 32-23;
            b = b << 10 | b >> 32-10;
            a -= b;  // why not?  ;)
        }
        return  a ^ b;
    }

This code has a fair chance of producing acceptable hash values
for strings that have at least three characters.  (Let me say
again that I have done no testing at all.)

So I hope this posting has some of what you're looking for.  And
if it doesn't, well, then I am as much in the dark as ever.  :)