Article <868qx45v5g.fsf@linuxsc.com>

Deutsch English Français Italiano
<868qx45v5g.fsf@linuxsc.com>

View for Bookmarking (what is this?)
Look up another Usenet article
Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups: comp.lang.c++
Subject: Re: OT: Re: Sieve of Erastosthenes optimized to the max
Date: Sat, 10 Aug 2024 07:07:07 -0700
Organization: A noiseless patient Spider
Lines: 224
Message-ID: <868qx45v5g.fsf@linuxsc.com>
References: <ul41d4$2koct$1@raubtier-asyl.eternal-september.org> <utn1fp$3oeks$2@raubtier-asyl.eternal-september.org> <utng4n$3rn1f$2@dont-email.me> <utoh9d$6lrr$1@raubtier-asyl.eternal-september.org> <utq0ag$hvrl$3@dont-email.me> <utq0os$ibqn$1@raubtier-asyl.eternal-september.org> <utq11p$icmm$1@dont-email.me> <v25c87$1ld9m$1@dont-email.me> <86r0duwqgg.fsf@linuxsc.com> <v39o3l$1lvju$1@dont-email.me> <86o78mpnlf.fsf@linuxsc.com> <v3fv2u$2ursd$1@dont-email.me> <86ttibod7n.fsf@linuxsc.com> <86h6eaoi2r.fsf@linuxsc.com> <v4sool$1grge$1@dont-email.me> <867celixcw.fsf@linuxsc.com> <v5sg9s$mat4$1@dont-email.me> <86zfr0b9hw.fsf@linuxsc.com> <v638ud$25623$1@dont-email.me> <861q3o5do1.fsf@linuxsc.com> <v7tdts$29195$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Sat, 10 Aug 2024 16:07:10 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="7d9cce6c3ad2abe5c41dd742cdd34d80";
	logging-data="723870"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18suUC29n7H54MtoDtMaYwGkcqrZ3rTkgU="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:hIkOlG/YmN9eZECVc/8guGnLRO8=
	sha1:IGBNWWa0JvMI0PPpRSsgIO1X9JQ=
Bytes: 10922

Vir Campestris <vir.campestris@invalid.invalid> writes:

> On 23/07/2024 15:34, Tim Rentsch wrote:
>
>> Vir Campestris <vir.campestris@invalid.invalid> writes:
>>
>> [...]
>>
>> I was hoping for some feedback, even if very brief,
>> before I continue.
>
> Sorry, I was slow.  It's really odd, I'm retired, and I ought to
> have lots of time for things like this... and yet I seem to be
> busy all the time.

No worries, I understand.

I'm sorry to be so long in responding.  Among other things
there was some sort of snafu with my news server, resulting
in hundreds of lost and/or duplicated messages, and that
took a while to sort out.

> On 20/07/2024 15:41, Tim Rentsch wrote:
>
>> Vir Campestris <vir.campestris@invalid.invalid> writes:
>>
>>> On 02/07/2024 07:20, Tim Rentsch wrote:
[...]
>> Both divide and mod can be simplified if we use a different
>> representation for numbers.  The idea is to separate the
>> number into a mod 30 part and divided by 30 part.  An
>> arbitrary number N can be split into an ordered pair
>>
>>     N/30 , N%30
>>
>> stored in, let's say, an 8 byte quantity with seven bytes
>> for N/30 and one byte for N%30.  Let me call these two
>> parts i and a for one number N, and j and b for a second
>> number M.  Then two form N*M we need to find
>
> I suspect the cost of extracting the divided value from the 7
> bytes will be prohibitive.  You may find it's best to have one
> table for the N/30 parts (big entries, word aligned) and another
> for the N%30 part (small entries, byte aligned).  BICBW.

Two tables, absolutely.  What does BICBW stand for?

>>     (i*30+a) * (j*30+b)
>>
>> which is
>>
>>     (i*30*j*30) + (i*30*b) + (j*30*a) + a*b
>>
>> Of course we want to take the product and express it in
>> the form (product/30, product%30), which can be done by
>>
>>     30*i*j + i*b + j*a + (a*b)/30,   (a*b)%30
>>
>> The two numbers a and b are both under 30, so a*b/30 and
>> a*b%30 can be done by lookup in a small array.
>>
>>     30*i*j + i*b + j*a + divide_30[a][b],   modulo_30[a][b]
>>
>> Now all operations can be done using just multiplies and
>> table lookups.  Dividing by 30 is just taking the first
>> component;  remainder 30 is just taking the second component.
>>
>> One further refinement:  for numbers relatively prime to 2,
>> 3, and 5, there are only 8 possibles, so the modulo 30
>> component can be reduced to just three bits.  That makes the
>> tables smaller, at a cost of needing a table lookup to find
>> and 'a' or 'b' part.
>>
>> Does this all make sense?
>
> Yes, it does.  It's not a direction I was thinking of at all,
> although I too had table lookups in mind.
>
> What I had considered is an extrapolation from my original code.

I read the comments below in conjunction with getting a better
understanding of your original code, which is part of what took
me so long before responding.  Before I get to the stuff below
(which I think I will save for a second reply), here is some code
expanding on what I said above.  It should compile cleanly as is.
However, its purpose is to illustrate what I said above.

  typedef unsigned long NN, Index, Count, Mask;
  typedef unsigned char Byte;

  static  void    remove_multiples( Index, Count, Index );

  static  Byte    bytes[1000000000];
  static  NN      mods[] = { 1, 7, 11, 13, 17, 19, 23, 29, };

  Count
  sieve( NN limit ){
      Count r = 0;
      Index i;
      Count shift;
      Index i_limit = (limit + 29)/30;

      bytes[0] = 1;
      for(  i = 0;  i < i_limit;  i++  ){
          Byte bits = bytes[i];
          for(  shift = 0;  shift < 8;  shift++  ){
              NN p = i*30 + mods[shift];
              if(  bits & 1<<shift  )  continue;
              if(  p >= limit  )  continue;
              r++;        // count how many primes found
                          // could print the prime 'p' here if desired
              if(  p*p < limit  )  remove_multiples( i, shift, i_limit );
          }
      }

      return  r;
  }

  static  Byte    shift_for[30] = {
      9,0,9,9,9,9,  9,1,9,9,9,2,  9,3,9,9,9,4,  9,5,9,9,9,6,  9,9,9,9,9,7,
  };

  static  Byte  ab_div_30[8][8] = {
   {  1*1/30, 1*7/30, 1*11/30, 1*13/30, 1*17/30, 1*19/30, 1*23/30, 1*29/30 },
   {  7*1/30, 7*7/30, 7*11/30, 7*13/30, 7*17/30, 7*19/30, 7*23/30, 7*29/30 },
   { 11*1/30,11*7/30,11*11/30,11*13/30,11*17/30,11*19/30,11*23/30,11*29/30 },
   { 13*1/30,13*7/30,13*11/30,13*13/30,13*17/30,13*19/30,13*23/30,13*29/30 },
   { 17*1/30,17*7/30,17*11/30,17*13/30,17*17/30,17*19/30,17*23/30,17*29/30 },
   { 19*1/30,19*7/30,19*11/30,19*13/30,19*17/30,19*19/30,19*23/30,19*29/30 },
   { 23*1/30,23*7/30,23*11/30,23*13/30,23*17/30,23*19/30,23*23/30,23*29/30 },
   { 29*1/30,29*7/30,29*11/30,29*13/30,29*17/30,29*19/30,29*23/30,29*29/30 },
  };

  static  Byte  ab_mod_30[8][8] = {
   {  1*1%30, 1*7%30, 1*11%30, 1*13%30, 1*17%30, 1*19%30, 1*23%30, 1*29%30 },
   {  7*1%30, 7*7%30, 7*11%30, 7*13%30, 7*17%30, 7*19%30, 7*23%30, 7*29%30 },
   { 11*1%30,11*7%30,11*11%30,11*13%30,11*17%30,11*19%30,11*23%30,11*29%30 },
   { 13*1%30,13*7%30,13*11%30,13*13%30,13*17%30,13*19%30,13*23%30,13*29%30 },
   { 17*1%30,17*7%30,17*11%30,17*13%30,17*17%30,17*19%30,17*23%30,17*29%30 },
   { 19*1%30,19*7%30,19*11%30,19*13%30,19*17%30,19*19%30,19*23%30,19*29%30 },
   { 23*1%30,23*7%30,23*11%30,23*13%30,23*17%30,23*19%30,23*23%30,23*29%30 },
   { 29*1%30,29*7%30,29*11%30,29*13%30,29*17%30,29*19%30,29*23%30,29*29%30 },
  };

  void
  remove_multiples( Index i, Count shift_a, Index i_limit ){
      Index j, k;
      NN a = mods[shift_a], b, c;
      Count shift;

      for(  shift = shift_a;  shift < 8;  shift++  ){
          j = i;
          b = mods[shift];
          k = i*j*30 + i*b + j*a + ab_div_30[ shift_a ][ shift ];
          c = ab_mod_30[ shift_a ][ shift ];
          if(  k >= i_limit  )  continue;
          bytes[k] |= 1 << shift_for[c];
      }

      for(  j = i+1;  (30UL*i+a)*(30UL*j+1) < i_limit * 30;  j++  ){
          for(  shift = 0;  shift < 8;  shift++  ){
              b = mods[shift];
              k = i*j*30 + i*b + j*a + ab_div_30[ shift_a ][ shift ];
              c = ab_mod_30[ shift_a ][ shift ];
              if(  k >= i_limit  )  continue;
              bytes[k] |= 1 << shift_for[c];
          }
      }
  }

In both sieve() and remove_multiples(), there is an inner loop
that ranges over the possible shift values.  (There is also an
unnested loop in remove_multiples() that ranges over a subset of
shift values, but for now please ignore that.)

Since these loops always perform exactly eight iterations, they
can be completely unrolled without expanding the code too much.
Here is a revised version of the outer loop in sieve() (actually
the code below is not yet ready for compilation, because some
things have changed):

========== REMAINDER OF ARTICLE TRUNCATED ==========