Article <20241219234549.00001902@yahoo.com>

Deutsch English Français Italiano
<20241219234549.00001902@yahoo.com>

View for Bookmarking (what is this?)
Look up another Usenet article
Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Michael S <already5chosen@yahoo.com>
Newsgroups: comp.lang.c++
Subject: Re: constexpr is really very smart!
Date: Thu, 19 Dec 2024 23:45:49 +0200
Organization: A noiseless patient Spider
Lines: 228
Message-ID: <20241219234549.00001902@yahoo.com>
References: <vjndub$2glcu$1@paganini.bofh.team>
	<20241216112808.00003f74@yahoo.com>
	<86jzbyghdw.fsf@linuxsc.com>
	<20241218013342.0000518a@yahoo.com>
	<867c7whol9.fsf@linuxsc.com>
	<20241218220006.00003f8e@yahoo.com>
	<86v7vgf3tb.fsf@linuxsc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 19 Dec 2024 22:45:53 +0100 (CET)
Injection-Info: dont-email.me; posting-host="a37a7afdc6a200c65b6cb4dd23255f26";
	logging-data="3163558"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/mIJ4D+K52mox0lKlIPHEghqvFZlfZhBo="
Cancel-Lock: sha1:XZkGuNERLE0lYSCK/6YPgFvRFZQ=
X-Newsreader: Claws Mail 4.1.1 (GTK 3.24.34; x86_64-w64-mingw32)
Bytes: 7563

On Thu, 19 Dec 2024 00:57:20 -0800
Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:

> Michael S <already5chosen@yahoo.com> writes:
> 
> > On Wed, 18 Dec 2024 09:45:38 -0800
> > Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >  
> >> Michael S <already5chosen@yahoo.com> writes:
> >>  
> >>> On Tue, 17 Dec 2024 12:54:19 -0800
> >>> Tim Rentsch <tr.17687@z991.linuxsc.com> wrote:
> >>>  
> >>>> Michael S <already5chosen@yahoo.com> writes:  
> >>
> >> [...]
> >>  
> >>>>> #include <stdio.h>
> >>>>> #include <stdlib.h>
> >>>>> #include <intrin.h>
> >>>>>
> >>>>> static long long fib(long n)
> >>>>> {
> >>>>>   if (fib <= 0)
> >>>>>     return 0;
> >>>>>   long long f0 = 0, f1 = 1;
> >>>>>   for (long i = 1; i < n; ++i) {
> >>>>>     long long f2 = f0 + f1;
> >>>>>     f0 = f1;
> >>>>>     f1 = f2;
> >>>>>   }
> >>>>>   return f1;
> >>>>> }  
> >>>>
> >>>> Here is my second fastest fibonacci calculation code (for
> >>>> relatively small inputs):
> >>>>
> >>>>     typedef long unsigned long ULL;
> >>>>
> >>>>     ULL
> >>>>     fibonacci( unsigned n ){
> >>>>         ULL  b = n&1,  a = b^1;
> >>>>
> >>>>         if(  n & 2  )  a += b,  b += a;
> >>>>         if(  n & 4  )  a += b,  b += a,  a += b,  b += a;
> >>>>         if(  n & 8  ){
> >>>>             ULL na = 13*a+21*b, nb = 21*a+34*b;
> >>>>             a = na,  b = nb;
> >>>>         }
> >>>>
> >>>>         n >>= 4;
> >>>>         while(  n--  ){
> >>>>             ULL  na = 610*a + 987*b,  nb = 987*a + 1597*b;
> >>>>             a = na,  b = nb;
> >>>>         }
> >>>>
> >>>>         return  b;
> >>>>     }  
> >>>
> >>> From BigO perspective this code looks like it's still O(n) so no
> >>> better than simple loop.  
> >>
> >> It is O(n) but it has a smaller constant.
> >>  
> >>> According to my measurement gear, in range 0 to 92 there are few
> >>> points where it is faster than simple loop, but in majority of
> >>> cases it is slower.  
> >>
> >> I'm at a loss to understand how this could happen.  In my own
> >> measurements, the code shown above runs faster than a simple loop
> >> in all cases above n > 12, more than twice as fast when n > 17,
> >> more than three times as fast when n > 42, and going up from
> >> there.  What might account for these radically different results?  
> >
> > May be, number of repetitions?
> > I run the test only once.  That gives relative advantage to smaller
> > code which is less sensitive to cold ICache and to cold branch
> > predictors.  
> 
> That's an interesting idea.  Can you also run a measurement where
> the code is run inside loops?  I think it would be instructive
> to compare the results under the two approaches.
> 
> (The server I use seems not to have the necessary support to
> measure very fast code segments that are run only once.)
> 
> > Also, because I am running it only once, my test is not so good in
> > seeing differences between various "good" algorithms.
> > But I feel that running test once is more realistic.  
> 
> I think an argument could be made that repeatedly running the code
> in a loop gives a more relevant result.  Any short code segment that
> is run only a handful of times will never have a significant impact
> on overall program performance.  It's better to optimize under the
> assumption that the code will be "hot":  if it is hot then the
> optimization will be worthwhile, and if it isn't hot then the loss
> will be way down in the noise and not worth worrying about.


I feel that running fib(n) with the same n in loop too unrealistic.
So I decided to run, at least, with different values of n.
Ended up spending about a hour just to build a test bench.

The answer is - in a loop of more than dozen iterations your code is
indeed faster. Esp. so for hundred or more iterations.

Here is my test bench.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <intrin.h>

long long fib(long n);
volatile long long dummy;

int main(int argz, char** argv)
{
  if (argz < 2) {
    printf("Go away!\n");
    return 1;
  }

  char* endp;
  long n = strtol(argv[1], &endp, 0);
  if (endp == argv[1]) {
    printf("%s is not a number. Go away!\n", argv[1]);
    return 1;
  }

  if (argz == 2) {
    unsigned long long t0 = _rdtsc();
    long long f = fib(n);
    unsigned long long t1 = _rdtsc();
    printf("fib(%ld)=%lld. %lld clock cycles.\n", n, f, t1 - t0);
    return 0;
  }

  // argz > 2
  if (argv[2][0]=='t') {
    long long f0 = 0, f1 = 1;
    for (long i = 0; i <= n; ++i) {
      long long res = fib(i);
      if (res != f0) {
        printf("Mistake at n = %ld. %lld instead of %lld\n"
         , i, res, f0);
        return 1;
      }
      long long f2 = f0 + f1;
      f0 = f1;
      f1 = f2;
    }
    printf("o.k.\n");
    return 0;
  }

  long m = strtol(argv[2], &endp, 0);
  if (endp == argv[2]) {
    printf("%s is not a number. Go away!\n", argv[2]);
    return 1;
  }

  if (n > m) {
    long tmp = n; n = m; m = tmp;
  }

  unsigned long dn = m - n + 1;
  if (dn > 1000000) {
    printf(
      "Range [%ld..%ld] is wider than 1M. Unsupported. I am sorry.\n"
      , n, m);
    return 1;
  }

  size_t n_iter = 10;
========== REMAINDER OF ARTICLE TRUNCATED ==========