Article <20240606133534.00005c5d@yahoo.com>

Warning: mysqli::__construct(): (HY000/1203): User howardkn already has more than 'max_user_connections' active connections in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\includes\artfuncs.php on line 21
Failed to connect to MySQL: (1203) User howardkn already has more than 'max_user_connections' active connections
Warning: mysqli::query(): Couldn't fetch mysqli in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\index.php on line 66
Article <20240606133534.00005c5d@yahoo.com>

Deutsch English Français Italiano

<20240606133534.00005c5d@yahoo.com>

View for Bookmarking (what is this?)
Look up another Usenet article

Path: ...!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Michael S <already5chosen@yahoo.com>
Newsgroups: comp.lang.c
Subject: Re: Good hash for pointers
Date: Thu, 6 Jun 2024 13:35:34 +0300
Organization: A noiseless patient Spider
Lines: 49
Message-ID: <20240606133534.00005c5d@yahoo.com>
References: <v2n88p$1nlcc$1@dont-email.me>
	<v2qm8m$2el55$1@raubtier-asyl.eternal-september.org>
	<v2qnue$2evlu$1@dont-email.me>
	<v2r9br$2hva2$1@dont-email.me>
	<86fru6gsqr.fsf@linuxsc.com>
	<v2sudq$2trh1$1@raubtier-asyl.eternal-september.org>
	<8634q5hjsp.fsf@linuxsc.com>
	<v2vmhr$3ffjk$1@raubtier-asyl.eternal-september.org>
	<86le3wfsmd.fsf@linuxsc.com>
	<v2voe7$3fr50$1@raubtier-asyl.eternal-september.org>
	<86ed9ofq14.fsf@linuxsc.com>
	<20240605005916.00001b33@yahoo.com>
	<86a5jzmle1.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 06 Jun 2024 12:35:21 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="e60a8e6a76916644fa1bcd4273b02481";
	logging-data="1497433"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/26Njf03ZJhK/F5pZLKB39K5XtX890HM4="
Cancel-Lock: sha1:8c37C7HEFCVd7Fhig4m9jU8QM8o=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
Bytes: 4147

On Wed, 05 Jun 2024 08:58:46 -0700
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

> 
> If we're trying to write a general hash function, it should work
> well across a wide range of input key sets, and also for different
> kinds of hash table management.  Let's look at the output side
> first.  There are two independent axes for hash tables:  table
> size, and direct mapping versus using rehashing.  For table size,
> the two prominent choices are some prime in the appropriate range,
> or else a power of two.  Clearly any size could be chosen, but
> primes have the nice property that they divide up the space of hash
> values very evenly, whereas powers of two allow a cheaper way of
> reducing a hash value to the range of table indices (anding versus a
> modulo operation).  A good hash function should work well with both.
> The choice of direct mapping versus rehashing depends on expected
> occupancy rates (which I am used to calling "load factor").  For a
> load factor of 75 to 80 percent or less, generally rehashing is
> better.  Direct mapping has the advantage that it can handle load
> factors above 100%, at a cost of dealing with whatever subsidiary
> data structure (usually linked lists) is used when two or more keys
> map to the same index in the hash table.  (I expect you know all
> that, I wrote it mostly just to clarify my own thinking.)
>

I never had interest in implementation of hash methods, typically just
took what language or library provides and used it.
All my knowledge of internals are 50 y.o. news (TAoCP, and no I did not
read it 50 years ago, but I did read it first time slightly less than
40 years ago and not sure if I re-read this particular topic since
then).
So, I am not sure that I understood everything above. 
In particular, w.r.t. rehashing, IIRC, Knuth came to conclusion that
simple circular increment of the index is superior to more elaborate
methods. I don't know if conclusion was changed in the following years,
not even sure that I recollect it correctly.

Despite (or due to?) my relative ignorance of the topic, I'd dare to
disagree with couple of your points:
1. I don't think that hash function should be evaluated in isolation
from the rest of algorithm. IMHO, they should be co-developed. There
is nothing wrong with hash function that works well with one particular
"back end" and poorly with all others as long as limitations are
clearly stated.
2. I don't like "modulo" method of reduction of hash value to range. I
don't like it for any chosen size of the bucket, not just for power of
two, but even for prime. Of course, for power of two size my dislike to
it is deeper.