Deutsch   English   Français   Italiano  
<v3l35r$ig0$1@dont-email.me>

View for Bookmarking (what is this?)
Look up another Usenet article

Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: Good hash for pointers
Date: Mon, 3 Jun 2024 19:48:29 +0100
Organization: A noiseless patient Spider
Lines: 82
Message-ID: <v3l35r$ig0$1@dont-email.me>
References: <v2n88p$1nlcc$1@dont-email.me>
 <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org>
 <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me>
 <86fru6gsqr.fsf@linuxsc.com>
 <v2sudq$2trh1$1@raubtier-asyl.eternal-september.org>
 <8634q5hjsp.fsf@linuxsc.com>
 <v2vmhr$3ffjk$1@raubtier-asyl.eternal-september.org>
 <86le3wfsmd.fsf@linuxsc.com>
 <v2voe7$3fr50$1@raubtier-asyl.eternal-september.org>
 <86ed9ofq14.fsf@linuxsc.com>
 <v2vs40$3gflh$1@raubtier-asyl.eternal-september.org>
 <86sexypvff.fsf@linuxsc.com> <20240602104506.000072e4@yahoo.com>
 <v3kk9p$3u8qu$1@raubtier-asyl.eternal-september.org>
 <20240603174604.000014d4@yahoo.com>
 <v3kovm$3v2ie$1@raubtier-asyl.eternal-september.org>
 <v3kqo3$3v9m2$1@dont-email.me> <20240603201646.0000319d@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 03 Jun 2024 20:48:28 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="d2cae1aa4235874fd2214b538c145732";
	logging-data="18944"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19skpY3THm8m1Z6xjU8PpiV"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:UuSVV1cMA0JKbGO63FEI2eyrcII=
In-Reply-To: <20240603201646.0000319d@yahoo.com>
Content-Language: en-GB
Bytes: 4674

On 03/06/2024 18:16, Michael S wrote:
> On Mon, 3 Jun 2024 17:24:36 +0100
> bart <bc@freeuk.com> wrote:
> 
>> On 03/06/2024 16:54, Bonita Montero wrote:
>>> Am 03.06.2024 um 16:46 schrieb Michael S:
>>>> On Mon, 3 Jun 2024 16:34:37 +0200
>>>> Bonita Montero <Bonita.Montero@gmail.com> wrote:
>>>>   
>>>>> Am 02.06.2024 um 09:45 schrieb Michael S:
>>>>>   
>>>>>> So, what were your conclusions?
>>>>>> Ignoring the speed of computation, would something like
>>>>>> cryptographic hash scaled to bucket size be a best hash for this
>>>>>> type of application? Or some sort of less thorough grinding of
>>>>>> the bits is better?
>>>>>
>>>>> There's no need for a crypto-hash here.
>>>>>   
>>>>
>>>> Do you think I don't know?
>>>> Crypto hash is just an example of near-ideal pseudo-random
>>>> uniformity.
>>>
>>> As I've shown for pointers you get a perfect equal distribution with
>>> multiplying by an appropriate prime.
>>>    
>>
>> A pointer with 8-byte or 16-byte alignment will have the bottom 3-4
>> bits zero.
>>
>> No matter what number you multiply them by, prime or otherwise, those
>> 3-4 bits will always be zero.
>>
>> If you mask the result to fit a table of size power-of-two, then the
>> resulting index will can only ever refer to every 8th or every 16th
>> slot; there will 8-16x as many clashes as there ought to be.
>>
>> So some extra work is needed to get around that, for example
>> right-shifting before masking as some here have done, something you
>> have never mentioned.
>>
>>
> 
> According to my understanding, Bonita and Tim are discussing hash
> generator which output is not used as is.

> For big enough Hash_max (Bonita suggests 2**63-1), poor quality of
> few LS bits of Hash(key) does not matter.

Tim might, I'm not sure sure about BM.

This is an exchange between Malcolm and BM:

MM:
 > The lower bits of a pointer are often all zeroes. And mutlipying by
 > any  integer will not set them. And that is catastrophic for a hash.
 >

BM:
If you take a large integer they will be scrambled and if the
prime is large enough there will be a good equal distribution.

I've reiterated MM's point, which BM just ignored.

 > They assume that index of
 > the slot will be calculated as (Hash(key)*bucket_size)/(Hash_max+1).

I don't remember seeing that anywhere. But, bucket_size is the size of 
the hash-table?

And Hash_max+1 is likely to be a power of two?

I prefer my hash values to be evaluated independently of table size. 
They usually don't have a meaningful maximum value, other than the 64 
bit of their type, and I'd rather they didn't have sequences of zero 
bits especially at the bottom end.

If hash_max was 2**63-1, say, then dividing by hash_max+1 would probably 
give you zero anyway, especially if you first multiplied by a bucket 
size that was a power of two, as all the interesting bits would 
disappear past the top bit!