From: Michael S
Newsgroups: comp.lang.c
Subject: Re: Good hash for pointers
Date: Mon, 3 Jun 2024 20:16:46 +0300
Message-ID: <20240603201646.0000319d@yahoo.com>

On Mon, 3 Jun 2024 17:24:36 +0100
bart wrote:

> On 03/06/2024 16:54, Bonita Montero wrote:
> > On 03.06.2024 at 16:46, Michael S wrote:
> >> On Mon, 3 Jun 2024 16:34:37 +0200
> >> Bonita Montero wrote:
> >>
> >>> On 02.06.2024 at 09:45, Michael S wrote:
> >>>
> >>>> So, what were your conclusions?
> >>>> Ignoring the speed of computation, would something like
> >>>> cryptographic hash scaled to bucket size be a best hash for this
> >>>> type of application? Or some sort of less thorough grinding of
> >>>> the bits is better?
> >>>
> >>> There's no need for a crypto-hash here.
> >>>
> >>
> >> Do you think I don't know?
> >> Crypto hash is just an example of near-ideal pseudo-random
> >> uniformity.
> >
> > As I've shown for pointers you get a perfect equal distribution with
> > multiplying by an appropriate prime.
> >
>
> A pointer with 8-byte or 16-byte alignment will have the bottom 3-4
> bits zero.
>
> No matter what number you multiply them by, prime or otherwise, those
> 3-4 bits will always be zero.
>
> If you mask the result to fit a table of size power-of-two, then the
> resulting index can only ever refer to every 8th or every 16th
> slot; there will be 8-16x as many clashes as there ought to be.
>
> So some extra work is needed to get around that, for example
> right-shifting before masking as some here have done, something you
> have never mentioned.
>

According to my understanding, Bonita and Tim are discussing a hash
generator whose output is not used as is. They assume that the index of
the slot will be calculated as (Hash(key)*bucket_size)/(Hash_max+1).
For a big enough Hash_max (Bonita suggests 2**63-1), the poor quality of
the few least significant bits of Hash(key) does not matter.
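
To make the difference concrete, here is a minimal sketch, not anybody's
actual code from this thread. It assumes a 32-bit hash (so Hash_max+1 is
2**32 rather than the 2**63 mentioned above, which keeps the product
inside standard 64-bit arithmetic), an odd multiplier that I picked
arbitrarily, and synthetic 16-byte-aligned addresses instead of real
malloc() results. The names hash_ptr, index_mask, index_scale and
slots_used are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed hash: multiply by an odd 32-bit constant (an arbitrary choice,
   not taken from the thread) and keep the low 32 bits, so that
   Hash_max + 1 == 2^32. */
static uint32_t hash_ptr(uintptr_t p)
{
    return (uint32_t)p * UINT32_C(2654435761);
}

/* Masking keeps the LOW bits of the hash. For aligned pointers those
   bits are zero both before and after the multiplication. */
static size_t index_mask(uint32_t h, size_t nbuckets)
{
    assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
    return h & (nbuckets - 1);
}

/* Scaling computes (h * nbuckets) / (Hash_max + 1), so the HIGH bits of
   the hash select the slot. nbuckets is assumed to fit in 32 bits. */
static size_t index_scale(uint32_t h, size_t nbuckets)
{
    return (size_t)(((uint64_t)h * nbuckets) >> 32);
}

/* Count the distinct slots hit by nkeys synthetic addresses that are
   16 bytes apart. use_scale selects scaling (1) or masking (0). */
static size_t slots_used(size_t nbuckets, size_t nkeys, int use_scale)
{
    static unsigned char seen[1024];
    size_t used = 0;

    assert(nbuckets <= sizeof seen);
    memset(seen, 0, sizeof seen);
    for (size_t i = 0; i < nkeys; i++) {
        uintptr_t p = (uintptr_t)0x10000 + 16 * i;  /* fake aligned pointer */
        uint32_t h = hash_ptr(p);
        size_t idx = use_scale ? index_scale(h, nbuckets)
                               : index_mask(h, nbuckets);
        if (!seen[idx]) {
            seen[idx] = 1;
            used++;
        }
    }
    return used;
}
```

With 1024 buckets and 4096 keys, slots_used(1024, 4096, 0) can reach
only 64 slots, because every masked index is a multiple of 16. That is
the effect bart describes. slots_used(1024, 4096, 1) is not limited in
that way, since the zero low bits never take part in the index.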