Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Tim Rentsch Newsgroups: comp.lang.c++ Subject: Re: OT: Re: Sieve of Erastosthenes optimized to the max Date: Sat, 10 Aug 2024 07:07:07 -0700 Organization: A noiseless patient Spider Lines: 224 Message-ID: <868qx45v5g.fsf@linuxsc.com> References: <86r0duwqgg.fsf@linuxsc.com> <86o78mpnlf.fsf@linuxsc.com> <86ttibod7n.fsf@linuxsc.com> <86h6eaoi2r.fsf@linuxsc.com> <867celixcw.fsf@linuxsc.com> <86zfr0b9hw.fsf@linuxsc.com> <861q3o5do1.fsf@linuxsc.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Date: Sat, 10 Aug 2024 16:07:10 +0200 (CEST) Injection-Info: dont-email.me; posting-host="7d9cce6c3ad2abe5c41dd742cdd34d80"; logging-data="723870"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18suUC29n7H54MtoDtMaYwGkcqrZ3rTkgU=" User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux) Cancel-Lock: sha1:hIkOlG/YmN9eZECVc/8guGnLRO8= sha1:IGBNWWa0JvMI0PPpRSsgIO1X9JQ= Bytes: 10922 Vir Campestris writes: > On 23/07/2024 15:34, Tim Rentsch wrote: > >> Vir Campestris writes: >> >> [...] >> >> I was hoping for some feedback, even if very brief, >> before I continue. > > Sorry, I was slow. It's really odd, I'm retired, and I ought to > have lots of time for things like this... and yet I seem to be > busy all the time. No worries, I understand. I'm sorry to be so long in responding. Among other things there was some sort of snafu with my news server, resulting in hundreds of lost and/or duplicated messages, and that took a while to sort out. > On 20/07/2024 15:41, Tim Rentsch wrote: > >> Vir Campestris writes: >> >>> On 02/07/2024 07:20, Tim Rentsch wrote: [...] >> Both divide and mod can be simplified if we use a different >> representation for numbers. The idea is to separate the >> number into a mod 30 part and divided by 30 part. An >> arbitrary number N can be split into an ordered pair >> >> N/30 , N%30 >> >> stored in, let's say, an 8 byte quantity with seven bytes >> for N/30 and one byte for N%30. Let me call these two >> parts i and a for one number N, and j and b for a second >> number M. Then two form N*M we need to find > > I suspect the cost of extracting the divided value from the 7 > bytes will be prohibitive. You may find it's best to have one > table for the N/30 parts (big entries, word aligned) and another > for the N%30 part (small entries, byte aligned). BICBW. Two tables, absolutely. What does BICBW stand for? >> (i*30+a) * (j*30+b) >> >> which is >> >> (i*30*j*30) + (i*30*b) + (j*30*a) + a*b >> >> Of course we want to take the product and express it in >> the form (product/30, product%30), which can be done by >> >> 30*i*j + i*b + j*a + (a*b)/30, (a*b)%30 >> >> The two numbers a and b are both under 30, so a*b/30 and >> a*b%30 can be done by lookup in a small array. >> >> 30*i*j + i*b + j*a + divide_30[a][b], modulo_30[a][b] >> >> Now all operations can be done using just multiplies and >> table lookups. Dividing by 30 is just taking the first >> component; remainder 30 is just taking the second component. >> >> One further refinement: for numbers relatively prime to 2, >> 3, and 5, there are only 8 possibles, so the modulo 30 >> component can be reduced to just three bits. That makes the >> tables smaller, at a cost of needing a table lookup to find >> and 'a' or 'b' part. >> >> Does this all make sense? > > Yes, it does. It's not a direction I was thinking of at all, > although I too had table lookups in mind. > > What I had considered is an extrapolation from my original code. I read the comments below in conjunction with getting a better understanding of your original code, which is part of what took me so long before responding. Before I get to the stuff below (which I think I will save for a second reply), here is some code expanding on what I said above. It should compile cleanly as is. However, its purpose is to illustrate what I said above. typedef unsigned long NN, Index, Count, Mask; typedef unsigned char Byte; static void remove_multiples( Index, Count, Index ); static Byte bytes[1000000000]; static NN mods[] = { 1, 7, 11, 13, 17, 19, 23, 29, }; Count sieve( NN limit ){ Count r = 0; Index i; Count shift; Index i_limit = (limit + 29)/30; bytes[0] = 1; for( i = 0; i < i_limit; i++ ){ Byte bits = bytes[i]; for( shift = 0; shift < 8; shift++ ){ NN p = i*30 + mods[shift]; if( bits & 1<= limit ) continue; r++; // count how many primes found // could print the prime 'p' here if desired if( p*p < limit ) remove_multiples( i, shift, i_limit ); } } return r; } static Byte shift_for[30] = { 9,0,9,9,9,9, 9,1,9,9,9,2, 9,3,9,9,9,4, 9,5,9,9,9,6, 9,9,9,9,9,7, }; static Byte ab_div_30[8][8] = { { 1*1/30, 1*7/30, 1*11/30, 1*13/30, 1*17/30, 1*19/30, 1*23/30, 1*29/30 }, { 7*1/30, 7*7/30, 7*11/30, 7*13/30, 7*17/30, 7*19/30, 7*23/30, 7*29/30 }, { 11*1/30,11*7/30,11*11/30,11*13/30,11*17/30,11*19/30,11*23/30,11*29/30 }, { 13*1/30,13*7/30,13*11/30,13*13/30,13*17/30,13*19/30,13*23/30,13*29/30 }, { 17*1/30,17*7/30,17*11/30,17*13/30,17*17/30,17*19/30,17*23/30,17*29/30 }, { 19*1/30,19*7/30,19*11/30,19*13/30,19*17/30,19*19/30,19*23/30,19*29/30 }, { 23*1/30,23*7/30,23*11/30,23*13/30,23*17/30,23*19/30,23*23/30,23*29/30 }, { 29*1/30,29*7/30,29*11/30,29*13/30,29*17/30,29*19/30,29*23/30,29*29/30 }, }; static Byte ab_mod_30[8][8] = { { 1*1%30, 1*7%30, 1*11%30, 1*13%30, 1*17%30, 1*19%30, 1*23%30, 1*29%30 }, { 7*1%30, 7*7%30, 7*11%30, 7*13%30, 7*17%30, 7*19%30, 7*23%30, 7*29%30 }, { 11*1%30,11*7%30,11*11%30,11*13%30,11*17%30,11*19%30,11*23%30,11*29%30 }, { 13*1%30,13*7%30,13*11%30,13*13%30,13*17%30,13*19%30,13*23%30,13*29%30 }, { 17*1%30,17*7%30,17*11%30,17*13%30,17*17%30,17*19%30,17*23%30,17*29%30 }, { 19*1%30,19*7%30,19*11%30,19*13%30,19*17%30,19*19%30,19*23%30,19*29%30 }, { 23*1%30,23*7%30,23*11%30,23*13%30,23*17%30,23*19%30,23*23%30,23*29%30 }, { 29*1%30,29*7%30,29*11%30,29*13%30,29*17%30,29*19%30,29*23%30,29*29%30 }, }; void remove_multiples( Index i, Count shift_a, Index i_limit ){ Index j, k; NN a = mods[shift_a], b, c; Count shift; for( shift = shift_a; shift < 8; shift++ ){ j = i; b = mods[shift]; k = i*j*30 + i*b + j*a + ab_div_30[ shift_a ][ shift ]; c = ab_mod_30[ shift_a ][ shift ]; if( k >= i_limit ) continue; bytes[k] |= 1 << shift_for[c]; } for( j = i+1; (30UL*i+a)*(30UL*j+1) < i_limit * 30; j++ ){ for( shift = 0; shift < 8; shift++ ){ b = mods[shift]; k = i*j*30 + i*b + j*a + ab_div_30[ shift_a ][ shift ]; c = ab_mod_30[ shift_a ][ shift ]; if( k >= i_limit ) continue; bytes[k] |= 1 << shift_for[c]; } } } In both sieve() and remove_multiples(), there is an inner loop that ranges over the possible shift values. (There is also an unnested loop in remove_multiples() that ranges over a subset of shift values, but for now please ignore that.) Since these loops always perform exactly eight iterations, they can be completely unrolled without expanding the code too much. Here is a revised version of the outer loop in sieve() (actually the code below is not yet ready for compilation, because some things have changed): ========== REMAINDER OF ARTICLE TRUNCATED ==========