Deutsch English Français Italiano |
<86o78mpnlf.fsf@linuxsc.com> View for Bookmarking (what is this?) Look up another Usenet article |
Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Tim Rentsch <tr.17687@z991.linuxsc.com> Newsgroups: comp.lang.c++ Subject: Re: Sieve of Erastosthenes optimized to the max Date: Thu, 30 May 2024 22:17:00 -0700 Organization: A noiseless patient Spider Lines: 138 Message-ID: <86o78mpnlf.fsf@linuxsc.com> References: <ul41d4$2koct$1@raubtier-asyl.eternal-september.org> <um31o0$1edq3$1@dont-email.me> <um4f1l$1l1c2$1@raubtier-asyl.eternal-september.org> <um7haa$274oh$3@dont-email.me> <um8vlp$2h8oc$1@raubtier-asyl.eternal-september.org> <uma7i4$2nagg$1@dont-email.me> <umdmkf$3clht$1@raubtier-asyl.eternal-september.org> <umdovb$3cmi3$2@dont-email.me> <ume6ap$3ecjk$1@raubtier-asyl.eternal-september.org> <umfcqi$3jktj$2@dont-email.me> <umgbci$3qpao$1@raubtier-asyl.eternal-september.org> <utlf3h$393l6$1@dont-email.me> <utn1fp$3oeks$2@raubtier-asyl.eternal-september.org> <utng4n$3rn1f$2@dont-email.me> <utoh9d$6lrr$1@raubtier-asyl.eternal-september.org> <utq0ag$hvrl$3@dont-email.me> <utq0os$ibqn$1@raubtier-asyl.eternal-september.org> <utq11p$icmm$1@dont-email.me> <v25c87$1ld9m$1@dont-email.me> <86r0duwqgg.fsf@linuxsc.com> <v39o3l$1lvju$1@dont-email.me> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Date: Fri, 31 May 2024 07:17:02 +0200 (CEST) Injection-Info: dont-email.me; posting-host="75c895ee66ef08392f27be5b3e8de353"; logging-data="2217286"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/Qq+3ezgHNJ4z7nA9fuB+rMr8AGXuuHxk=" User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux) Cancel-Lock: sha1:ZHLf4CCRd48kqR0o5rP3xnG5rKM= sha1:zN8OObR2P+zfW7UhgsGqLt0GVOE= Bytes: 7121 Vir Campestris <vir.campestris@invalid.invalid> writes: > On 22/05/2024 03:06, Tim Rentsch wrote: > >> Vir Campestris <vir.campestris@invalid.invalid> writes: >> >>> I've been playing with this. Has it really been this long? I >>> ought to have more time than this... [...] >>> >>> I missed Tim's Mod 30 trick, and that might well help. But I >>> think I'm bored with this. It would save a lot of memory, but >>> the extra computation might make it slower. >> >> Two comments. Using the mod 30 encoding can be speed competitive >> with simpler encodings. The second comment is, it isn't just >> that less memory is used, but that more primes can be found. >> Bonita brags about how fast some code is, but it's an apples >> and oranges comparison - that code doesn't do the job because >> it runs into a memory limit way sooner than using a mod 30 >> encoding, and trying to run past that limit would make the >> code R E A L L Y S L O W. >> >>> Maybe I'll try one day ;) >> >> I hope you do. If you have some difficulty seeing how to >> avoid the seeming speed penalties, feel free to ask, I am >> happy to help. > > I think I've come to the end of what my code will do. [...] A few miscellaneous responses... > Firstly the C++ bit: > > std::vector<bool> is much too slow to be any use. > [various alternatives] I simply used a static array, more accurately a union of two arrays (code here is approximate only): #define LARGEST_N 2400000000000 // primes less than 2.4e12 union { unsigned char bytes[ LARGEST_N / 30 ]; unsigned long long ulls[ LARGEST_N / 30 / 8 ]; } all; > That's the end of the on-topic part. > > I don't store even numbers. That's an optimisation, and is of course > equivalent to Tim's Mod 30 trick, but with mod 2 instead. And it > doesn't require an divide which may be slow. Like I keep saying, keeping the bits in a mod 30 representation doesn't need any divides (at least not in the main loop, where multiples are being computed). > When I mask out just 3 I get a pattern like this: > > 2492492492492490 > > There is a single nybble at the end of that which is zero, but the > rest is a 3-nybble repeat. [similarly for 3 and 5, etc...] In a mod 30 representation, the multiples of 7 repeat after 7 bytes. If we replicate those 7 bytes 8 times, we get 7 unsigned long longs. Those 7 ULLs can be combined with logic operations into the all.ulls[] array. Similarly for 11, 13, etc, for as high as is reasonable (I think I topped out at 29). > However I can build a mask from the ones for 3 and 5, which has a > repeat of 15 (3*5) > 6692cd259a4b3490, > 96692cd259a4b349,496692cd259a4b34,3496692cd259a4b3,b3496692cd259a4b, > 4b3496692cd259a4,a4b3496692cd259a,9a4b3496692cd259,59a4b3496692cd25, > 259a4b3496692cd2,d259a4b3496692cd,cd259a4b3496692c,2cd259a4b3496692, > 92cd259a4b349669,692cd259a4b34966,6692cd259a4b3496 > > and copy that. > > In fact I can do that with any number of single prime masks I want. > [...] I thought about doing something along these lines but decided it was too much work. So I just did the blocks of 7*8 bytes, 11*8 bytes, 13*8 bytes, etc, up to some fairly small prime, then switched to an alternate method where individual bytes are addressed and modified. > Note that although this has been quite a fun thing to play with it's > had almost no C++ content. In fact I found that to get best > performance I had to drop down to C type code. Right. I used straight C, and I don't see any part of the program that would benefit from having C++. > I kept a class to > manage my buffers though, it's much easier to drop an object that > manages a malloc-ed buffer than to remember exactly when I need to > call free. Correctness trumps speed. I used static allocation for the big bitmap, and small local arrays (less than 50 * 8 bytes) for the small temporaries. No malloc()s or free()s to worry about. Initialized to zero without any code needed. > Now Mod 30 - by storing the odd numbers only I'm halving the storage, > and I don't need to do an expensive divide. It's only shifts and ANDs. > > If I use mod 30 (30 = 2*3*5) conveniently there are 8 candidate > numbers I need to store a mask for. By special casing each of the possibilities (there are 8*8 == 64 of them), only multiplies and adds are needed, and no shifts -- masks are evaluated to compile-time constants. > Sadly my computer likes 64 bit > operations better. But the required storage is now only 27% of the > original mask if all numbers are put in it. Once the primes get above a fairly small number, the density of places to zap is low enough so that working in bytes is okay. In particular once we are looking at primes above 240 there is less than one bit per 64 bit ULL, which makes using bytes rather than ULLs less painful > I can carry on with this, but the storage doesn't have any more nice > sizes. [...] Yeah. It's a happy coincidence that mod 30 needs 8 bits per block-of-30, and unfortunately no other nice size in sight for larger prime-set mods. > Maybe I'll play with it. But the rain stopped, and I must get on > with the weeding! Now you have something to look forward to the next time it rains. :)