Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Michael S <already5chosen@yahoo.com>
Newsgroups: comp.arch
Subject: Re: Continuations
Date: Thu, 18 Jul 2024 20:28:43 +0300
Organization: A noiseless patient Spider
Lines: 123
Message-ID: <20240718202843.00001dd0@yahoo.com>
References: <v6tbki$3g9rg$1@dont-email.me>
	<47689j5gbdg2runh3t7oq2thodmfkalno6@4ax.com>
	<v71vqu$gomv$9@dont-email.me>
	<116d9j5651mtjmq4bkjaheuf0pgpu6p0m8@4ax.com>
	<f8c6c5b5863ecfc1ad45bb415f0d2b49@www.novabbs.org>
	<7u7e9j5dthm94vb2vdsugngjf1cafhu2i4@4ax.com>
	<0f7b4deb1761f4c485d1dc3b21eb7cb3@www.novabbs.org>
	<v78soj$1tn73$1@dont-email.me>
	<4bbc6af7baab612635eef0de4847ba5b@www.novabbs.org>
	<v792kn$1v70t$1@dont-email.me>
	<ef12aa647464a3ebe3bd208c13a3c40c@www.novabbs.org>
	<v79b56$20oq8$1@dont-email.me>
	<v7ahnf$2an0d$1@dont-email.me>
	<20240718193803.00004176@yahoo.com>
	<v7bi3c$2gk81$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 18 Jul 2024 19:28:16 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="43779574c2539bdff38d08e75c37b6d0";
	logging-data="2622127"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/AdXZKweWZgTmcoVksC31+mZEhatmZuEA="
Cancel-Lock: sha1:Gd2cRSXRgZ0ueFzMfTKdL+JiWj0=
X-Newsreader: Claws Mail 3.19.1 (GTK+ 2.24.33; x86_64-w64-mingw32)
Bytes: 6251

On Thu, 18 Jul 2024 17:06:52 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:

> Michael S <already5chosen@yahoo.com> schrieb:
> > On Thu, 18 Jul 2024 07:54:23 -0000 (UTC)
> > Thomas Koenig <tkoenig@netcologne.de> wrote:
> >  
> >> Stephen Fuld <SFuld@alumni.cmu.edu.invalid> schrieb:
> >> 
> >> [Arrhenius]
> >>   
> >> > Good, I get that.  But Thomas' original discussion of the problem
> >> > indicated that it was very parallel, so the question is, in your
> >> > design, how many of those calculations can go on in parallel?
> >> 
> >> I ran a little Arrhenius benchmark on an i7-11700.  Main program
> >> was
> >> 
> >> program main
> >>   implicit none
> >>   integer, parameter :: n = 1024
> >>   double precision, dimension(n) :: k, a, ea, t
> >>   integer :: i
> >>   call random_number (a)
> >>   call random_number(ea)
> >>   ea = 10000+ea*30000
> >>   call random_number(t)
> >>   t = 400 + 200*t
> >>   do i=1,1024*1024
> >>      call arrhenius(k,a,ea,t,n)
> >>   end do
> >> end program main
> >> 
> >> and the called routine was (in a separate file, so the compiler
> >> could not notice that the results were actually never used)
> >> 
> >> subroutine arrhenius(k, a, ea, t, n)
> >>   implicit none
> >>   integer, intent(in) :: n
> >>   double precision, dimension(n), intent(out) :: k
> >>   double precision, dimension(n), intent(in) :: a, ea, t
> >>   double precision, parameter :: r = 8.314d0  ! d0: keep R in double precision
> >>   k = a * exp(-ea/(r*t))
> >> end subroutine arrhenius
> >> 
> >> Timing results (wall-clock time only):
> >> 
> >> -O0: 5.343s
> >> -O2: 4.560s
> >> -Ofast: 2.237s
> >> -Ofast -march=native -mtune=native: 2.154s
> >> 
> >> Of course, you never know what speed your CPU is actually running
> >> at these days, but if I assume 5GHz, that would give around 10
> >> cycles per Arrhenius evaluation (2.154 s * 5e9 cycles/s spread over
> >> 1024*1024 calls * 1024 elements = 2^30 evaluations), which is quite
> >> fast (IMHO). It uses an AVX2 version of exp, or so I gather from the
> >> function name, _ZGVdN4v_exp_avx2.
> >
> > Does the benchmark represent a real-world use?
> > In particular, 
> > 1. Is there really a large number of different EA values or only
> > around a dozen or a hundred?
> 
> Usually, one for each chemical reaction.  If you have a complex
> reaction network, it can be a few hundred.
> 
> It is possible to pre-calculate Ea/R, but the division by T still
> needs to be done.
> 

The idea I was thinking about was to pre-calculate R/Ea rather than
Ea/R, so that the per-point division becomes a multiplication,
x = T*(R/Ea), and then to find a fast algorithm for approximating
exp(-1/x) on the range of interest.
This sort of trick no longer makes sense on post-Skylake Intel,
post-Zen2 AMD, or the last few generations of Apple cores. But just 5-6
years ago it made good sense on pretty much anything.
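A minimal sketch of that idea, under loudly stated assumptions: the per-reaction constant is stored as R/Ea, and the "fast algorithm" for exp(-1/x) is taken to be piecewise cubic Hermite interpolation on a uniform grid in x. That choice, the 512-segment table, and the names fast_arrhenius, init_table and arrhenius_fast are hypothetical illustrations, not anything proposed in the thread. Per point, the evaluation needs no division and no call to exp, only one multiply, a table lookup and a cubic in Horner form. Neither accuracy nor speed has been benchmarked here against the AVX2 exp mentioned above.

```fortran
! Hypothetical sketch: k = a*exp(-1/x) with x = t*(R/Ea), where exp(-1/x)
! is replaced by a piecewise cubic Hermite table on a uniform grid in x
! (an assumed choice of "fast algorithm", not taken from the thread).
module fast_arrhenius
  implicit none
  private
  public :: init_table, arrhenius_fast

  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nseg = 512      ! assumed table size, not tuned
  real(dp) :: x_lo = 0.0_dp             ! left end of the grid
  real(dp) :: inv_h = 0.0_dp            ! 1/grid spacing
  real(dp) :: c(0:3, 0:nseg-1)          ! cubic coefficients per segment

contains

  ! Build the table for x in [xmin, xmax]; exp is only called here.
  subroutine init_table(xmin, xmax)
    real(dp), intent(in) :: xmin, xmax
    real(dp) :: h, x0, x1, f0, f1, d0, d1
    integer :: j
    h = (xmax - xmin) / nseg
    x_lo = xmin
    inv_h = 1.0_dp / h
    do j = 0, nseg - 1
      x0 = xmin + j * h
      x1 = x0 + h
      f0 = exp(-1.0_dp / x0)
      f1 = exp(-1.0_dp / x1)
      ! d/dx exp(-1/x) = exp(-1/x)/x**2, scaled by h for local u in [0,1]
      d0 = h * f0 / x0**2
      d1 = h * f1 / x1**2
      ! p(u) = c0 + c1*u + c2*u**2 + c3*u**3 matches f and f' at both ends
      c(0, j) = f0
      c(1, j) = d0
      c(2, j) = 3.0_dp * (f1 - f0) - 2.0_dp * d0 - d1
      c(3, j) = 2.0_dp * (f0 - f1) + d0 + d1
    end do
  end subroutine init_table

  ! k(i) = a(i)*exp(-ea(i)/(r*t(i))), given r_over_ea(i) = r/ea(i).
  ! No division and no exp per point: one multiply, a lookup, a cubic.
  ! x outside [xmin, xmax] is extrapolated and will be inaccurate.
  subroutine arrhenius_fast(k, a, r_over_ea, t, n)
    integer, intent(in) :: n
    real(dp), intent(out) :: k(n)
    real(dp), intent(in) :: a(n), r_over_ea(n), t(n)
    real(dp) :: s, u
    integer :: i, j
    do i = 1, n
      s = (t(i) * r_over_ea(i) - x_lo) * inv_h    ! position in grid units
      j = max(0, min(int(s), nseg - 1))            ! segment index
      u = s - j                                    ! local coordinate
      k(i) = a(i) * (c(0, j) + u * (c(1, j) + u * (c(2, j) + u * c(3, j))))
    end do
  end subroutine arrhenius_fast

end module fast_arrhenius
```

Call init_table once with bounds covering every x = t*R/Ea you expect. For the benchmark inputs quoted above (ea from 10000 to 40000, t from 400 to 600, r = 8.314), x stays within about [0.083, 0.499], so init_table(0.08d0, 0.5d0) would cover it. The 512 x 4 coefficient table takes 16 KiB, and the per-point lookup is a gather that may vectorize less well than a library exp, which is one concrete way such a trick can lose on the newer cores mentioned above.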

> > 2. Does the temperature vary all the time, or are there relatively big
> > groups of points calculated at the same temperature?
> 
> It varies all the time because of the exothermic/endothermic
> character of the reactions.  This is not what was calculated, but you
> can think of methane combustion with oxygen.  There are numerous
> radical, high-energy species appearing and disappearing all the time.
> And if you're unlucky, you will also have to calculate radiation :-(
> 
> But for each fluid element, the energy equation is solved again
> each time step.
> 
> And if you try to average the temperatures over groups of cells...
> don't.  You average enough already by selecting the grid, and also by
> applying multigrid solvers.
> 
> > A similar question can be asked about A, but it is of little
> > practical importance.
> > 3. Is double precision really needed? According to my understanding,
> > empirical equations like this one have a precision of something like 2
> > significant digits, or 3 digits at best. So I'd expect that single
> > precision is sufficient with digits to spare.  
> 
> To calculate values, yes, but if you try to differentiate things,
> things can go wrong pretty fast.  I'm actually not sure what he used
> in this particular case.
> 
> There is, however, a tendency among engineers to use double precision
> because quite often, you do not go wrong with that, and you can
> go wrong with single precision.
> 
> A lot of numerical work in general, and CFD work in particular,
> consists of fighting for convergence.  You usually don't want to
> change anything that would endanger that.
> 
> > 4. Does the equation work at all when the temperature is not close
> > to the point of equilibrium? If not, what is a sane range for
> > ea/(r*t)?
> 
> It does work far away from the equilibrium (which is the point).

But how far?
Does it work for Ea/RT outside of, say, [-10:+12]?
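For comparison, the benchmark quoted above already bounds this ratio for its own inputs. Taking them at face value (r = 8.314, ea drawn from [10000, 40000], t from [400, 600]), the extremes of Ea/RT are:

```latex
\left.\frac{E_a}{RT}\right|_{\min} = \frac{10000}{8.314 \times 600} \approx 2.005,
\qquad
\left.\frac{E_a}{RT}\right|_{\max} = \frac{40000}{8.314 \times 400} \approx 12.03
```

so exp(-Ea/RT) spans roughly 6.0e-6 to 0.135 there. Those inputs are random test data, so this bounds the benchmark only, not real reaction networks.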