Article <v2si1h$2rs39$1@dont-email.me>

Warning: mysqli::__construct(): (HY000/1203): User howardkn already has more than 'max_user_connections' active connections in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\includes\artfuncs.php on line 21
Failed to connect to MySQL: (1203) User howardkn already has more than 'max_user_connections' active connections
Warning: mysqli::query(): Couldn't fetch mysqli in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\index.php on line 66
Article <v2si1h$2rs39$1@dont-email.me>

Deutsch English Français Italiano

<v2si1h$2rs39$1@dont-email.me>

View for Bookmarking (what is this?)
Look up another Usenet article

Path: ...!3.eu.feeder.erje.net!feeder.erje.net!news2.arglkargh.de!news.in-chemnitz.de!news.swapon.de!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: Good hash for pointers
Date: Sat, 25 May 2024 12:28:49 +0100
Organization: A noiseless patient Spider
Lines: 54
Message-ID: <v2si1h$2rs39$1@dont-email.me>
References: <v2n88p$1nlcc$1@dont-email.me>
 <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org>
 <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me>
 <86fru6gsqr.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 25 May 2024 13:28:49 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a9f17f030c74c0000dda6058d5c2eb70";
	logging-data="3010665"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+dHe+OcZTiyuy+MV0PtE2Q"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:ZBMQ0NKTAjE8PGUrfmRKNiR06X8=
In-Reply-To: <86fru6gsqr.fsf@linuxsc.com>
Content-Language: en-GB
Bytes: 3390

On 25/05/2024 10:12, Tim Rentsch wrote:
> bart <bc@freeuk.com> writes:
> 

> It looks like your hash function was tuned for this testing
> setup.  With different choices for testing it does much
> worse.

It wasn't tuned at all; it was a copy of what I've longed used within my 
lexers.

It is designed to give a good spread when inputs are similar or change 
sequentially. (Well, not exactly designed; more trial and error.)

> The testing is done with malloc() blocks all of the same
> requested size, and that size is 1.  Typical workloads
> are likely to be both larger and more variable.

> When adding a new entry finds a collision with an old
> entry, linear probing is used to find a free slot.  It's
> well understood that linear probing suffers badly as the
> load factor goes up.  Better to take a few high bits of
> the hash value -- as few as five or six is fine -- to
> have the reprobe sequences have different strides.
> 
> Your hash function is expensive to compute, moreso even
> than the "FNV" function shown earlier.

You mean Malcolm's version? That also uses a loop. Neither need to 
process all 8 bytes of a 64-bit pointer; 6 will do, and probably just 4.
And such a loop can be trivially unrolled.

   In a case like
> this one where the compares are cheap, it's better to
> have a dumb-but-fast hash function that might need a
> few more looks to find an open slot, because the cost
> of looking is so cheap compared to computing the hash
> function.

My first thought in this thread was to just make use of the pointer 
value directly. My version was written only to compare against Malcolm's 
FNV code.

If I try the other suggestions, RH's and KK's which simply shift the 
pointer value (similar to my idea), the results I got were interesting: 
sometimes there were zero clashes (almost as though bits 4 to 19 of the 
pointer were a perfect hash), but sometimes there were millions.

But when I tried to mix up the input values, then all methods seemed to 
converge to similar results.

In that case you might as well use the simplest and fastest method.

However it really needs testing within the actual application.