Deutsch English Français Italiano |
<v3l35r$ig0$1@dont-email.me> View for Bookmarking (what is this?) Look up another Usenet article |
Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: bart <bc@freeuk.com> Newsgroups: comp.lang.c Subject: Re: Good hash for pointers Date: Mon, 3 Jun 2024 19:48:29 +0100 Organization: A noiseless patient Spider Lines: 82 Message-ID: <v3l35r$ig0$1@dont-email.me> References: <v2n88p$1nlcc$1@dont-email.me> <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org> <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me> <86fru6gsqr.fsf@linuxsc.com> <v2sudq$2trh1$1@raubtier-asyl.eternal-september.org> <8634q5hjsp.fsf@linuxsc.com> <v2vmhr$3ffjk$1@raubtier-asyl.eternal-september.org> <86le3wfsmd.fsf@linuxsc.com> <v2voe7$3fr50$1@raubtier-asyl.eternal-september.org> <86ed9ofq14.fsf@linuxsc.com> <v2vs40$3gflh$1@raubtier-asyl.eternal-september.org> <86sexypvff.fsf@linuxsc.com> <20240602104506.000072e4@yahoo.com> <v3kk9p$3u8qu$1@raubtier-asyl.eternal-september.org> <20240603174604.000014d4@yahoo.com> <v3kovm$3v2ie$1@raubtier-asyl.eternal-september.org> <v3kqo3$3v9m2$1@dont-email.me> <20240603201646.0000319d@yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Injection-Date: Mon, 03 Jun 2024 20:48:28 +0200 (CEST) Injection-Info: dont-email.me; posting-host="d2cae1aa4235874fd2214b538c145732"; logging-data="18944"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19skpY3THm8m1Z6xjU8PpiV" User-Agent: Mozilla Thunderbird Cancel-Lock: sha1:UuSVV1cMA0JKbGO63FEI2eyrcII= In-Reply-To: <20240603201646.0000319d@yahoo.com> Content-Language: en-GB Bytes: 4674 On 03/06/2024 18:16, Michael S wrote: > On Mon, 3 Jun 2024 17:24:36 +0100 > bart <bc@freeuk.com> wrote: > >> On 03/06/2024 16:54, Bonita Montero wrote: >>> Am 03.06.2024 um 16:46 schrieb Michael S: >>>> On Mon, 3 Jun 2024 16:34:37 +0200 >>>> Bonita Montero <Bonita.Montero@gmail.com> wrote: >>>> >>>>> Am 02.06.2024 um 09:45 schrieb Michael S: >>>>> >>>>>> So, what were your conclusions? >>>>>> Ignoring the speed of computation, would something like >>>>>> cryptographic hash scaled to bucket size be a best hash for this >>>>>> type of application? Or some sort of less thorough grinding of >>>>>> the bits is better? >>>>> >>>>> There's no need for a crypto-hash here. >>>>> >>>> >>>> Do you think I don't know? >>>> Crypto hash is just an example of near-ideal pseudo-random >>>> uniformity. >>> >>> As I've shown for pointers you get a perfect equal distribution with >>> multiplying by an appropriate prime. >>> >> >> A pointer with 8-byte or 16-byte alignment will have the bottom 3-4 >> bits zero. >> >> No matter what number you multiply them by, prime or otherwise, those >> 3-4 bits will always be zero. >> >> If you mask the result to fit a table of size power-of-two, then the >> resulting index will can only ever refer to every 8th or every 16th >> slot; there will 8-16x as many clashes as there ought to be. >> >> So some extra work is needed to get around that, for example >> right-shifting before masking as some here have done, something you >> have never mentioned. >> >> > > According to my understanding, Bonita and Tim are discussing hash > generator which output is not used as is. > For big enough Hash_max (Bonita suggests 2**63-1), poor quality of > few LS bits of Hash(key) does not matter. Tim might, I'm not sure sure about BM. This is an exchange between Malcolm and BM: MM: > The lower bits of a pointer are often all zeroes. And mutlipying by > any integer will not set them. And that is catastrophic for a hash. > BM: If you take a large integer they will be scrambled and if the prime is large enough there will be a good equal distribution. I've reiterated MM's point, which BM just ignored. > They assume that index of > the slot will be calculated as (Hash(key)*bucket_size)/(Hash_max+1). I don't remember seeing that anywhere. But, bucket_size is the size of the hash-table? And Hash_max+1 is likely to be a power of two? I prefer my hash values to be evaluated independently of table size. They usually don't have a meaningful maximum value, other than the 64 bit of their type, and I'd rather they didn't have sequences of zero bits especially at the bottom end. If hash_max was 2**63-1, say, then dividing by hash_max+1 would probably give you zero anyway, especially if you first multiplied by a bucket size that was a power of two, as all the interesting bits would disappear past the top bit!