Article <v3vkvp$268b2$1@raubtier-asyl.eternal-september.org>

Deutsch English Français Italiano
<v3vkvp$268b2$1@raubtier-asyl.eternal-september.org>

View for Bookmarking (what is this?)
Look up another Usenet article
Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!raubtier-asyl.eternal-september.org!.POSTED!not-for-mail
From: Bonita Montero <Bonita.Montero@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: Good hash for pointers
Date: Fri, 7 Jun 2024 20:53:49 +0200
Organization: A noiseless patient Spider
Lines: 132
Message-ID: <v3vkvp$268b2$1@raubtier-asyl.eternal-september.org>
References: <v2n88p$1nlcc$1@dont-email.me>
 <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org>
 <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me>
 <86fru6gsqr.fsf@linuxsc.com>
 <v2sudq$2trh1$1@raubtier-asyl.eternal-september.org>
 <8634q5hjsp.fsf@linuxsc.com>
 <v2vmhr$3ffjk$1@raubtier-asyl.eternal-september.org>
 <86le3wfsmd.fsf@linuxsc.com>
 <v2voe7$3fr50$1@raubtier-asyl.eternal-september.org>
 <86ed9ofq14.fsf@linuxsc.com> <20240605005916.00001b33@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Fri, 07 Jun 2024 20:53:46 +0200 (CEST)
Injection-Info: raubtier-asyl.eternal-september.org; posting-host="faca01816453e2890dcc5a8627425084";
	logging-data="2302306"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19eWHlpkZmCPSoRzeEu1yLtYAAbkmyjEec="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:YcDsDLZaMT9IbITV/ZlWWsEvjQg=
In-Reply-To: <20240605005916.00001b33@yahoo.com>
Content-Language: de-DE
Bytes: 5521

I tested the AES hash function with a C++20 program. The hash-function
is also re-written in C++-style:

#include <iostream>
#include <utility>
#include <vector>
#include <cstdint>
#include <algorithm>
#include <execution>
#include <thread>
#include <span>
#include <intrin.h>

using namespace std;

uint64_t MichaelsHash( uint64_t key );

int main()
{
	static_assert(sizeof(size_t) == 8,"");
	// number of buckets
#if defined(NDEBUG)
	constexpr ptrdiff_t N_BUCKETS = 1ull << 32;
#else
	constexpr ptrdiff_t N_BUCKETS = 1 << 16;
#endif
	// don't zero-initialize update-array
	union ndi_sz { size_t s; ndi_sz() {} };
	vector<ndi_sz> rawUpdates( N_BUCKETS );
	// native size_t span to update-array
	span<size_t> updates( &rawUpdates[0].s, N_BUCKETS );
	int hc = thread::hardware_concurrency();
	// thread partitition end to update-array
	ptrdiff_t end = N_BUCKETS;
	// partitition index to update array
	ptrdiff_t p = hc;
	vector<jthread> threads;
	threads.reserve( hc );
	do
	{
		// begin to update array
		ptrdiff_t begin = (ptrdiff_t)((double)--p / hc * N_BUCKETS);
		threads.emplace_back( [&]( size_t begin, size_t end )
			{
				for( ; begin != end; ++begin )
					updates[begin] = MichaelsHash( begin ) % N_BUCKETS;
			}, begin, end );
		// next end to update array is the begin before
		end = begin;
	} while( p > 0 );
	// wait for all threads to finish
	threads.resize( 0 );
	// sort update array parallel
	sort( execution::parallel_policy(), updates.begin(), updates.end() );
	vector<uint8_t> buckets( N_BUCKETS );
	// update buckets according to the update-list
	for( size_t u : updates )
		if( !++buckets[u] )
			return EXIT_FAILURE;
	// release memory of the update list
	rawUpdates.clear();
	rawUpdates.shrink_to_fit();
	// sum up loads
	vector<size_t> loads;
	for( size_t bucketLoad : buckets )
	{
		if( loads.size() <= bucketLoad )
			loads.resize( bucketLoad + 1 );
		++loads[bucketLoad];
	}
	// print loads percentage
	for( size_t ld = 0; ptrdiff_t nLoad : loads )
		cout << ld++ << ": " << (double)nLoad / (ptrdiff_t)buckets.size() << endl;
}

uint64_t MichaelsHash( uint64_t key )
{
	__m128i xkey = _mm_set_epi64x( key, 42 );
	using bar_t = pair<uint64_t, uint64_t>;
	static bar_t const bars[8] =
	{
		{ 0xBB09BBCC90B24BF2, 0x825C622FF2792A01 },
		{ 0x94F0535CB06D4060, 0x939C756246DBFD1D },
		{ 0x5B835E01A7E14CA1, 0xAC2BDAFC023CDD06 },
		{ 0xE0B6A4735B774AEC, 0x9CAFB43E7DDE494C },
	};
	for( bar_t const &bar : bars )
	{
		__m128i rkey;
		rkey.m128i_u64[0] = bar.first;
		rkey.m128i_u64[1] = bar.second;
		xkey = _mm_aesenc_si128( xkey, rkey );
	}
	return xkey.m128i_u64[0];
}

The hash function is tested with 2 ^ 32 buckets. The problem with that
is that the bucket-counters are updated with total random memory access,
which is extremely slow. So I first prepare an update-list, wich is a
vector of size_t's with the indices of the buckets being updated later.
The memory access for this update-list is updated lineary, which is very
fast. The update-list is written with the available number of threads.
This list is sorted after that and quicksorting also has linear memory
acces. To improve that further the quicksort is done with parallel
execution. The update-list takes 2 ^ 32 size_t's, which is 32GiB. The
sorted update list ressults in linear updates of the buckets.
This sounds rather complex, but I first tried it with random updates
to the buckets on a single core and on my 7950X Zen4-CPU I stopped
after 20min.
The resulting load-depths are according to the coupon collector'
problem (1 / (e * N!)):

C:\Users\Boni\Documents\Source\MichaelAesHash>timep 
x64\Release\MichaelAesHash.exe
0: 36.7875%
1: 36.7884%
2: 18.3945%
3: 6.13053%
4: 1.53304%
5: 0.306619%
6: 0.0510489%
7: 0.00729787%
8: 0.000908948%
9: 0.000102189%
10: 1.07335e-05%
11: 8.14907e-07%
12: 4.65661e-08%
real   53.783s
user   11m:39.984s
sys    6.141s
cylces 3.446.315.603.052