Path: ...!eternal-september.org!feeder2.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Vir Campestris <vir.campestris@invalid.invalid>
Newsgroups: alt.comp.lang.c,comp.lang.c
Subject: Re: OT: Re: A very slow program
Date: Sat, 26 Oct 2024 15:28:24 +0100
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <vfiua8$3obkb$2@dont-email.me>
References: <vc7nb0$7rl3$1@paganini.bofh.team>
 <vdjf67$3840f$1@raubtier-asyl.eternal-september.org>
 <vdmfdl$3pknu$1@dont-email.me>
 <vdnsav$3ucm$1@raubtier-asyl.eternal-september.org>
 <vf2n7i$cko5$1@dont-email.me>
 <vf5sdt$10dmr$1@raubtier-asyl.eternal-september.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sat, 26 Oct 2024 16:28:25 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="1a34abd57199070a12913a6a127db11a";
	logging-data="3944075"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/9JeyuimavyTkjJ5iDnQFFT8ae7UeRL/w="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:C1l21PUL1phX7XvMtkYvuR5PavM=
Content-Language: en-GB
In-Reply-To: <vf5sdt$10dmr$1@raubtier-asyl.eternal-september.org>
Bytes: 3607

On 21/10/2024 16:36, Bonita Montero wrote:
> Am 20.10.2024 um 12:49 schrieb Vir Campestris:
>> On 04/10/2024 05:53, Bonita Montero wrote:
>>> Am 03.10.2024 um 18:06 schrieb Vir Campestris:
>>>
>>>> for 4e9 primes my machine take nearly 10 seconds for 4294967295
>>>> primes. The one I posted before over there takes 1.5 seconds.
>>>
>>> Show me your code. I'm pretty sure you never posted faster code.
>>>
>>>
>> Try this.
>>
> 
> Still about half the speed of my code for 2 ^ 32.
> 

Fascinating.

But after my latest experience perhaps not surprising.

You're on Windows aren't you? I do know you have a far more powerful 
machine than I do. I'm running Ubuntu with Gnu compiler. (I tried Clang, 
but it's slower)

When I try your code on my system it takes 10.4s eleapsed, and 12.1s of 
CPU to calculate 4294967296. That code I supplied up there takes only 
1.47s elapsed, and a little less CPU - it's single threaded.

So on _my_ system my code is about 7 times faster!

But my recent experience - I've been playing with using a modulo30 
store, not just odds.

My code makes bitmasks for 2 sets of small primes (these can be small, 
there's a repeat at the product of all the primes) then combines the 2 
to make the initial mask for the desired number of primes. It then masks 
out the bigger primes.

With a mask of odds-only the mask words can be 64 bit, and combining 
them is way faster than Mod30 - where the mask is a byte. So I put 
together some code so that it would put the two small masks together 
with 64 bit operations. That bit of the code got twice as fast. Then I 
realised that the next part, for reasons I have been unable to 
establish, runs three times more slowly.

But only when I'm using GCC (not Clang) and only when -Ofast 
optimisation is turned on. That loop is the same speed in all 0other cases.

I then tried putting some debug timing statements in, and I could get 
the same effect by inserting some in in various places.

I looked at the output assembly, and there seem to be random changes 
(instructions that are not dependent on each other being re-ordered) so 
it's not obvious why this is happening.
  I think I'm bored enough with this project to just leave it. I've been 
more interested in correctness, not speed, in recent years.

Andy