Path: ...!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail From: Vir Campestris Newsgroups: alt.comp.lang.c,comp.lang.c Subject: Re: OT: Re: A very slow program Date: Sat, 26 Oct 2024 15:28:24 +0100 Organization: A noiseless patient Spider Lines: 59 Message-ID: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Injection-Date: Sat, 26 Oct 2024 16:28:25 +0200 (CEST) Injection-Info: dont-email.me; posting-host="1a34abd57199070a12913a6a127db11a"; logging-data="3944075"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/9JeyuimavyTkjJ5iDnQFFT8ae7UeRL/w=" User-Agent: Mozilla Thunderbird Cancel-Lock: sha1:C1l21PUL1phX7XvMtkYvuR5PavM= Content-Language: en-GB In-Reply-To: Bytes: 3607 On 21/10/2024 16:36, Bonita Montero wrote: > Am 20.10.2024 um 12:49 schrieb Vir Campestris: >> On 04/10/2024 05:53, Bonita Montero wrote: >>> Am 03.10.2024 um 18:06 schrieb Vir Campestris: >>> >>>> for 4e9 primes my machine take nearly 10 seconds for 4294967295 >>>> primes. The one I posted before over there takes 1.5 seconds. >>> >>> Show me your code. I'm pretty sure you never posted faster code. >>> >>> >> Try this. >> > > Still about half the speed of my code for 2 ^ 32. > Fascinating. But after my latest experience perhaps not surprising. You're on Windows aren't you? I do know you have a far more powerful machine than I do. I'm running Ubuntu with Gnu compiler. (I tried Clang, but it's slower) When I try your code on my system it takes 10.4s eleapsed, and 12.1s of CPU to calculate 4294967296. That code I supplied up there takes only 1.47s elapsed, and a little less CPU - it's single threaded. So on _my_ system my code is about 7 times faster! But my recent experience - I've been playing with using a modulo30 store, not just odds. My code makes bitmasks for 2 sets of small primes (these can be small, there's a repeat at the product of all the primes) then combines the 2 to make the initial mask for the desired number of primes. It then masks out the bigger primes. With a mask of odds-only the mask words can be 64 bit, and combining them is way faster than Mod30 - where the mask is a byte. So I put together some code so that it would put the two small masks together with 64 bit operations. That bit of the code got twice as fast. Then I realised that the next part, for reasons I have been unable to establish, runs three times more slowly. But only when I'm using GCC (not Clang) and only when -Ofast optimisation is turned on. That loop is the same speed in all 0other cases. I then tried putting some debug timing statements in, and I could get the same effect by inserting some in in various places. I looked at the output assembly, and there seem to be random changes (instructions that are not dependent on each other being re-ordered) so it's not obvious why this is happening. I think I'm bored enough with this project to just leave it. I've been more interested in correctness, not speed, in recent years. Andy