Deutsch English Français Italiano |
<v3vkvp$268b2$1@raubtier-asyl.eternal-september.org> View for Bookmarking (what is this?) Look up another Usenet article |
Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!raubtier-asyl.eternal-september.org!.POSTED!not-for-mail From: Bonita Montero <Bonita.Montero@gmail.com> Newsgroups: comp.lang.c Subject: Re: Good hash for pointers Date: Fri, 7 Jun 2024 20:53:49 +0200 Organization: A noiseless patient Spider Lines: 132 Message-ID: <v3vkvp$268b2$1@raubtier-asyl.eternal-september.org> References: <v2n88p$1nlcc$1@dont-email.me> <v2qm8m$2el55$1@raubtier-asyl.eternal-september.org> <v2qnue$2evlu$1@dont-email.me> <v2r9br$2hva2$1@dont-email.me> <86fru6gsqr.fsf@linuxsc.com> <v2sudq$2trh1$1@raubtier-asyl.eternal-september.org> <8634q5hjsp.fsf@linuxsc.com> <v2vmhr$3ffjk$1@raubtier-asyl.eternal-september.org> <86le3wfsmd.fsf@linuxsc.com> <v2voe7$3fr50$1@raubtier-asyl.eternal-september.org> <86ed9ofq14.fsf@linuxsc.com> <20240605005916.00001b33@yahoo.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Injection-Date: Fri, 07 Jun 2024 20:53:46 +0200 (CEST) Injection-Info: raubtier-asyl.eternal-september.org; posting-host="faca01816453e2890dcc5a8627425084"; logging-data="2302306"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19eWHlpkZmCPSoRzeEu1yLtYAAbkmyjEec=" User-Agent: Mozilla Thunderbird Cancel-Lock: sha1:YcDsDLZaMT9IbITV/ZlWWsEvjQg= In-Reply-To: <20240605005916.00001b33@yahoo.com> Content-Language: de-DE Bytes: 5521 I tested the AES hash function with a C++20 program. The hash-function is also re-written in C++-style: #include <iostream> #include <utility> #include <vector> #include <cstdint> #include <algorithm> #include <execution> #include <thread> #include <span> #include <intrin.h> using namespace std; uint64_t MichaelsHash( uint64_t key ); int main() { static_assert(sizeof(size_t) == 8,""); // number of buckets #if defined(NDEBUG) constexpr ptrdiff_t N_BUCKETS = 1ull << 32; #else constexpr ptrdiff_t N_BUCKETS = 1 << 16; #endif // don't zero-initialize update-array union ndi_sz { size_t s; ndi_sz() {} }; vector<ndi_sz> rawUpdates( N_BUCKETS ); // native size_t span to update-array span<size_t> updates( &rawUpdates[0].s, N_BUCKETS ); int hc = thread::hardware_concurrency(); // thread partitition end to update-array ptrdiff_t end = N_BUCKETS; // partitition index to update array ptrdiff_t p = hc; vector<jthread> threads; threads.reserve( hc ); do { // begin to update array ptrdiff_t begin = (ptrdiff_t)((double)--p / hc * N_BUCKETS); threads.emplace_back( [&]( size_t begin, size_t end ) { for( ; begin != end; ++begin ) updates[begin] = MichaelsHash( begin ) % N_BUCKETS; }, begin, end ); // next end to update array is the begin before end = begin; } while( p > 0 ); // wait for all threads to finish threads.resize( 0 ); // sort update array parallel sort( execution::parallel_policy(), updates.begin(), updates.end() ); vector<uint8_t> buckets( N_BUCKETS ); // update buckets according to the update-list for( size_t u : updates ) if( !++buckets[u] ) return EXIT_FAILURE; // release memory of the update list rawUpdates.clear(); rawUpdates.shrink_to_fit(); // sum up loads vector<size_t> loads; for( size_t bucketLoad : buckets ) { if( loads.size() <= bucketLoad ) loads.resize( bucketLoad + 1 ); ++loads[bucketLoad]; } // print loads percentage for( size_t ld = 0; ptrdiff_t nLoad : loads ) cout << ld++ << ": " << (double)nLoad / (ptrdiff_t)buckets.size() << endl; } uint64_t MichaelsHash( uint64_t key ) { __m128i xkey = _mm_set_epi64x( key, 42 ); using bar_t = pair<uint64_t, uint64_t>; static bar_t const bars[8] = { { 0xBB09BBCC90B24BF2, 0x825C622FF2792A01 }, { 0x94F0535CB06D4060, 0x939C756246DBFD1D }, { 0x5B835E01A7E14CA1, 0xAC2BDAFC023CDD06 }, { 0xE0B6A4735B774AEC, 0x9CAFB43E7DDE494C }, }; for( bar_t const &bar : bars ) { __m128i rkey; rkey.m128i_u64[0] = bar.first; rkey.m128i_u64[1] = bar.second; xkey = _mm_aesenc_si128( xkey, rkey ); } return xkey.m128i_u64[0]; } The hash function is tested with 2 ^ 32 buckets. The problem with that is that the bucket-counters are updated with total random memory access, which is extremely slow. So I first prepare an update-list, wich is a vector of size_t's with the indices of the buckets being updated later. The memory access for this update-list is updated lineary, which is very fast. The update-list is written with the available number of threads. This list is sorted after that and quicksorting also has linear memory acces. To improve that further the quicksort is done with parallel execution. The update-list takes 2 ^ 32 size_t's, which is 32GiB. The sorted update list ressults in linear updates of the buckets. This sounds rather complex, but I first tried it with random updates to the buckets on a single core and on my 7950X Zen4-CPU I stopped after 20min. The resulting load-depths are according to the coupon collector' problem (1 / (e * N!)): C:\Users\Boni\Documents\Source\MichaelAesHash>timep x64\Release\MichaelAesHash.exe 0: 36.7875% 1: 36.7884% 2: 18.3945% 3: 6.13053% 4: 1.53304% 5: 0.306619% 6: 0.0510489% 7: 0.00729787% 8: 0.000908948% 9: 0.000102189% 10: 1.07335e-05% 11: 8.14907e-07% 12: 4.65661e-08% real 53.783s user 11m:39.984s sys 6.141s cylces 3.446.315.603.052