Article <v39o3l$1lvju$1@dont-email.me>

Warning: mysqli::__construct(): (HY000/1203): User howardkn already has more than 'max_user_connections' active connections in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\includes\artfuncs.php on line 21
Failed to connect to MySQL: (1203) User howardkn already has more than 'max_user_connections' active connections
Warning: mysqli::query(): Couldn't fetch mysqli in D:\Inetpub\vhosts\howardknight.net\al.howardknight.net\index.php on line 66
Article <v39o3l$1lvju$1@dont-email.me>
Deutsch English Français Italiano
<v39o3l$1lvju$1@dont-email.me>

View for Bookmarking (what is this?)
Look up another Usenet article
Path: ...!news.nobody.at!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Vir Campestris <vir.campestris@invalid.invalid>
Newsgroups: comp.lang.c++
Subject: Re: Sieve of Erastosthenes optimized to the max
Date: Thu, 30 May 2024 12:32:04 +0100
Organization: A noiseless patient Spider
Lines: 152
Message-ID: <v39o3l$1lvju$1@dont-email.me>
References: <ul41d4$2koct$1@raubtier-asyl.eternal-september.org>
 <um2dsb$17vgg$2@dont-email.me>
 <um2vpr$1e7pn$1@raubtier-asyl.eternal-september.org>
 <um31o0$1edq3$1@dont-email.me>
 <um4f1l$1l1c2$1@raubtier-asyl.eternal-september.org>
 <um7haa$274oh$3@dont-email.me>
 <um8vlp$2h8oc$1@raubtier-asyl.eternal-september.org>
 <uma7i4$2nagg$1@dont-email.me>
 <umdmkf$3clht$1@raubtier-asyl.eternal-september.org>
 <umdovb$3cmi3$2@dont-email.me>
 <ume6ap$3ecjk$1@raubtier-asyl.eternal-september.org>
 <umfcqi$3jktj$2@dont-email.me>
 <umgbci$3qpao$1@raubtier-asyl.eternal-september.org>
 <utlf3h$393l6$1@dont-email.me>
 <utn1fp$3oeks$2@raubtier-asyl.eternal-september.org>
 <utng4n$3rn1f$2@dont-email.me>
 <utoh9d$6lrr$1@raubtier-asyl.eternal-september.org>
 <utq0ag$hvrl$3@dont-email.me>
 <utq0os$ibqn$1@raubtier-asyl.eternal-september.org>
 <utq11p$icmm$1@dont-email.me> <v25c87$1ld9m$1@dont-email.me>
 <86r0duwqgg.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 30 May 2024 13:32:06 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="7fc8a3f5d0ad4666fe11428dc10997e5";
	logging-data="1769086"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+OHv3SOCTMY76MWXxakco1h1aZu15mFu0="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:XRgytXmh8xwm1vLoOhpBC8dfM7k=
In-Reply-To: <86r0duwqgg.fsf@linuxsc.com>
Content-Language: en-GB
Bytes: 8082

On 22/05/2024 03:06, Tim Rentsch wrote:
> Vir Campestris <vir.campestris@invalid.invalid> writes:
> 
>> I've been playing with this.  Has it really been this long?  I
>> ought to have more time than this...  [...]
>>
>> I missed Tim's Mod 30 trick, and that might well help.  But I
>> think I'm bored with this.  It would save a lot of memory, but
>> the extra computation might make it slower.
> 
> Two comments.  Using the mod 30 encoding can be speed competitive
> with simpler encodings.  The second comment is, it isn't just
> that less memory is used, but that more primes can be found.
> Bonita brags about how fast some code is, but it's an apples
> and oranges comparison - that code doesn't do the job because
> it runs into a memory limit way sooner than using a mod 30
> encoding, and trying to run past that limit would make the
> code R E A L L Y  S L O W.
> 
>> Maybe I'll try one day ;)
> 
> I hope you do.  If you have some difficulty seeing how to
> avoid the seeming speed penalties, feel free to ask, I am
> happy to help.

I think I've come to the end of what my code will do. I'm not going to 
post it up (unless you really want it), but I'll put in some comments.

Firstly the C++ bit:

std::vector<bool> is much too slow to be any use. I rolled my own.

I played with std::vector<uint8_t>, uint32_t and uint64_t. The last was 
fastest.

But... if you allocate a std::vector it insists on initialising all the 
elements. That takes a while. I then played with reserve and 
emplace_back - and that was slow too. So I've rolled my own store.

Allocating a zero filled buffer of enormous size with calloc is 
surprisingly fast. I have a feeling (which I can't prove) that under the 
hood Linux is allocating one zero filled page to me, and setting up the 
page tables for the rest of the buffer as unwriteable. It then copies 
the zero buffer whenever a write fault happens.

Of course I can when needed use malloc, not calloc - sometimes I don't 
care what is in the buffer.

That's the end of the on-topic part.

I don't store even numbers. That's an optimisation, and is of course 
equivalent to Tim's Mod 30 trick, but with mod 2 instead. And it doesn't 
require an divide which may be slow.

When I mask out just 3 I get a pattern like this:

2492492492492490

There is a single nybble at the end of that which is zero, but the rest 
is a 3-nybble repeat. You see something similar with 5,

4210842108421080

except this time the repeat is 5 nybbles.

It doesn't matter what your word size is - nybble, byte, uint64_t - the 
repeat is always the prime. It also incidentally doesn't matter if you 
do store evens, the repeat is still the size of the prime.

Now 3 is the most expensive prime to write into the output mask. I have 
to write every third bit, which is a lot of operations. 5 is nearly as 
bad, and as the prime gets bigger the cost falls.

But... I don't have to write all those bits. I can just copy the 
repeating part of the mask. With uint64_t the mask for 3 is

2492492492492490,[9249249249249249,4924924924924924,2492492492492492]

I don't copy that first word, only the three repeats.

for 5 it's

4210842108421080,[8421084210842108,0842108421084210,1084210842108421,2108421084210842,4210842108421084]

I then realised that running an iterator over just 3 words was fairly 
expensive too - not as expensive as every bit, but it still hurts.

However I can build a mask from the ones for 3 and 5, which has a repeat 
of 15 (3*5)
6692cd259a4b3490,
96692cd259a4b349,496692cd259a4b34,3496692cd259a4b3,b3496692cd259a4b,
4b3496692cd259a4,a4b3496692cd259a,9a4b3496692cd259,59a4b3496692cd25,
259a4b3496692cd2,d259a4b3496692cd,cd259a4b3496692c,2cd259a4b3496692,
92cd259a4b349669,692cd259a4b34966,6692cd259a4b3496

and copy that.

In fact I can do that with any number of single prime masks I want.

It gets big quite quickly - for 3 to 23 the repeat length is 
111,546,435. I found that it was best to:

- Make a mask for the first 5 primes, 3 5 7 11 13.
- Make a mask from all of these. That has a repeat of 15015 words.
- Make masks for the next few primes, up to 31.
- Combine the mask from the 15015 and the others up to 31. That's about 
100E9 in size. (It's better to do that in two stages, otherwise we reset 
the index on the small ones really often.)
- Work out the biggest prime we have stored accurately. That will be 
just under 37 squared, 37 being the next one we haven't masked in
- Collect all the primes between 31 and 37^2
- Go though the buffer in chunks of about 16k words, marking the 
multiples of all these primes.
- We can now work out all the rest of the primes we need. Collect them, 
then go through the buffer again.
- It's not necessary to do this a third time.

Note that although this has been quite a fun thing to play with it's had 
almost no C++ content. In fact I found that to get best performance I 
had to drop down to C type code. I kept a class to manage my buffers 
though, it's much easier to drop an object that manages a malloc-ed 
buffer than to remember exactly when I need to call free. Correctness 
trumps speed.

Now Mod 30 - by storing the odd numbers only I'm halving the storage, 
and I don't need to do an expensive divide. It's only shifts and ANDs.

If I use mod 30 (30 = 2*3*5) conveniently there are 8 candidate numbers 
I need to store a mask for. Sadly my computer likes 64 bit operations 
better. But the required storage is now only 27% of the original mask if 
all numbers are put in it.

I can carry on with this, but the storage doesn't have any more nice 
sizes. The number of bits in each page are:
Prime, MOD value, number of bits per page, saving
2,	2	1,	50%
3,	6,	2,	33%
5,	30,	8,	27%
7,	210,	48,	23%
11,	2310,	480,	21%
13,	30030,	5760,	19%
17,	510510,	92160,	18%
19,	9699690,	1658880,	17%
23,	223092870,	36495360,	16%

That's probably best pasted into a spreadsheet as CSV to make the 
columns line up.

Maybe I'll play with it. But the rain stopped, and I must get on with 
the weeding!

Andy