Article <86v8166bl1.fsf@linuxsc.com>

Warning: mysqli::__construct(): (HY000/1203): User howardkn already has more than 'max_user_connections' active connections in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\includes\artfuncs.php on line 21
Failed to connect to MySQL: (1203) User howardkn already has more than 'max_user_connections' active connections
Warning: mysqli::query(): Couldn't fetch mysqli in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\index.php on line 66
Article <86v8166bl1.fsf@linuxsc.com>
Deutsch English Français Italiano
<86v8166bl1.fsf@linuxsc.com>

View for Bookmarking (what is this?)
Look up another Usenet article
Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups: comp.lang.c++
Subject: Re: Sieve of Erastosthenes optimized to the max
Date: Mon, 15 Jul 2024 06:15:06 -0700
Organization: A noiseless patient Spider
Lines: 112
Message-ID: <86v8166bl1.fsf@linuxsc.com>
References: <ul41d4$2koct$1@raubtier-asyl.eternal-september.org> <umgbci$3qpao$1@raubtier-asyl.eternal-september.org> <utlf3h$393l6$1@dont-email.me> <utn1fp$3oeks$2@raubtier-asyl.eternal-september.org> <utng4n$3rn1f$2@dont-email.me> <utoh9d$6lrr$1@raubtier-asyl.eternal-september.org> <utq0ag$hvrl$3@dont-email.me> <utq0os$ibqn$1@raubtier-asyl.eternal-september.org> <utq11p$icmm$1@dont-email.me> <v25c87$1ld9m$1@dont-email.me> <86r0duwqgg.fsf@linuxsc.com> <v39o3l$1lvju$1@dont-email.me> <86o78mpnlf.fsf@linuxsc.com> <v3fv2u$2ursd$1@dont-email.me> <86ttibod7n.fsf@linuxsc.com> <86h6eaoi2r.fsf@linuxsc.com> <v4sool$1grge$1@dont-email.me> <867celixcw.fsf@linuxsc.com> <v5sg9s$mat4$1@dont-email.me> <86zfr0b9hw.fsf@linuxsc.com> <v638ud$25623$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Injection-Date: Mon, 15 Jul 2024 15:15:13 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="819b551f7a7bb17b7686f7af35a87b55";
	logging-data="758447"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+0UGJEPrBl/zWgq2iSn4YJPjgemAakkDs="
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Cancel-Lock: sha1:j2Xcbi0MpYQ3GKoqoDev2a5g7Ig=
	sha1:CjneB83FTPzS2rSIrrsdnTcmytU=
Bytes: 7230

Vir Campestris <vir.campestris@invalid.invalid> writes:

> On 02/07/2024 07:20, Tim Rentsch wrote:
>
>> I got the code.  There were some line wrappings but I just went
>> through and fixed those by hand.  After fixing the problem line
>> wraps I was able to compile and run the code (I used std=c++17,
>> which apparently works okay).  I did a simple check to make sure
>> the primes being found are right, which they do indeed seem to
>> be.  I haven't really experimented with changing the code;  I
>> only just ran it with various bounds on what primes to find,
>> mostly to get a sense of the performance.
>>
>> I admit I don't understand the scheme the code is using.  It
>> looks to me like the code is doing divisions and remainders a
>> lot, which my code basically doesn't do at all (it does do one
>> division for each prime up to the square root of the limit of
>> primes being sought, but that's all).  I know the big template
>> near the beginning is crucial to understanding what the code is
>> doing, but I haven't been able to figure out what abstraction
>> that template is supposed to represent.  The core of my program
>> is between 50 and 60 lines, and all pretty straightforward.
>>
>> Is there some suggestion you could make or a question you would
>> like to ask?  What can I do that might be helpful?
>
> I checked.  The modulus operations (%) are easy to check for.
>
> There are two for each prime number.  Not for each multiple of it, but
> for the primes themselves.
>
> And there's one in the "get" function that checks to see if a number
> is prime.
>
> The complexity is at least partly due to an optimisation.

It's taken me a while to get into this.  I think I understand
your code fairly well now.  My idea is to respond in pieces,
starting with the simpler aspects first.

> You'll recall perhaps that Bonita's original code preset the entire
> bitmap to aa - which is faster than setting all the numbers divisible
> by 2.  I pointed out that you don't even need to set them, and set off
> to write code that stored odd numbers only.
>
> But then I realised that if you store the mask for 3 only you get a
> pattern.  It has one odd word at the beginning, then a repeating
> pattern of 3 words.  It doesn't matter what the word size is.
>
> This is equally true of 5, 7, 11 and the first few primes.  It's also
> true if you set the mask up for 3 and 5 together - except this time
> the repeat length is 15.
>
> And it's far faster to duplicate those 3, or 15, or 3*5*7, words than
> to do each prime individually.
>
> There comes a point where it isn't worth it though - the repeat length
> for all the primes up to 31 is 1.5E12.
>
> Exactly when it stops being worthwhile isn't obvious.
>
> Then you came along with the mod30 suggestion.  There is still a repeat
> length though, and it's still worth merging masks for small primes
> together rather than doing each one individually.  But the code is
> bigger.  MUCH bigger.

I am assuming a mod30 representation in all of my followup
comments.  Just to be explicit, there is an array of 8-bit
bytes, with bytes[i] indicating primeness or compositeness
for the 8 numbers 30*i + { 1, 7, 11, 13, 17, 19, 23, 29 },
one bit for each of those 8 possibilities.  (There may be
optimizations if the array elements are 64-bit quantities
rather than 8-bit quantities, but let's ignore that for
now.)

There are two plausible strategies for "pre-setting" the
multiples of small primes.  The first strategy uses two stages.
The first stage makes a mask for the four primes 7, 11, 13, and
17, which has 17017 elements, and simply replicates that mask
over and over through the entire array.  The second stage makes a
mask for the three primes 19, 23, and 29, which has 12673
elements, and uses bitwise operations to combine that mask
throughout the primes array.  Incidentally the convention I chose
is that a 1 bit means the number is definitely composite, so the
array could be initialized with all zero bits.

The second strategy makes a mask for all seven of 7, 11, 13, 17,
19, 23, and 29, which has 215656441 elements.  This mask is then
copied over and over throughout the array.  Because 215656441 is
quite a bit larger than your L1 cache number, it might be a win
to divide the mask into strips of size (L1 cache / 2), and copy
each strip separately, to get better cache performance.

Note that for the "setting" step (that is, the second strategy
or the first stage of the first strategy), the mask can be formed
in the initial part of the prime/composite array.  After taking
care of all multiples of the seven primes in the initial byte,
the initial byte should be set to indicate that those numbers
are prime, since otherwise they would be set wrongly.

I haven't experimented to see which of the above ideas has better
performance.  Like you say it's the small primes that matter,
because they have so many multiples, and it isn't obvious where
the cutoff should be.  But it's clear that every prime in the
first byte is worth doing, so you might try playing around with
just that part to see which approach does better.  I suggest
using a primes array that is as large as the memory on your test
system can accommodate without swapping.

I might be able to write followon segments at the rate of one per
day or so.  I'm not expecting to do them any faster than that,
and very likely will miss some days because of other activities.