From: Thomas Koenig
Newsgroups: comp.arch
Subject: Re: Continuations
Date: Thu, 18 Jul 2024 07:54:23 -0000 (UTC)

Stephen Fuld wrote:

[Arrhenius]

> Good, I get that.  But Thomas' original discussion of the problem
> indicated that it was very parallel, so the question is, in your
> design, how many of those calculations can go on in parallel?

I ran a little Arrhenius benchmark on an i7-11700.
Main program was

program main
  implicit none
  integer, parameter :: n = 1024
  double precision, dimension(n) :: k, a, ea, t
  integer :: i
  call random_number (a)
  call random_number(ea)
  ea = 10000+ea*30000
  call random_number(t)
  t = 400 + 200*t
  ! 1024*1024 calls, each evaluating n = 1024 rate constants
  do i=1,1024*1024
     call arrhenius(k,a,ea,t,n)
  end do
end program main

and the called routine was (in a separate file, so the compiler
could not notice that the results were actually never used)

subroutine arrhenius(k, a, ea, t, n)
  implicit none
  integer, intent(in) :: n
  double precision, dimension(n), intent(out) :: k
  double precision, dimension(n), intent(in) :: a, ea, t
  double precision, parameter :: r = 8.314
  ! whole-array expression: one exp per element
  k = a * exp(-ea/(r*t))
end subroutine arrhenius

Timing result (wall-clock time only):

-O0: 5.343s
-O2: 4.560s
-Ofast: 2.237s
-Ofast -march=native -mtune=native: 2.154s

Of course, you never know what speed your CPU is actually running
at these days, but if I assume 5GHz, that would give around 10 cycles
per Arrhenius evaluation, which is quite fast (IMHO).  It uses an
AVX2 version of exp, or so I gather from the function name,
_ZGVdN4v_exp_avx2 .
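
For anyone who wants to check the cycle estimate, here is a minimal
sketch of the arithmetic.  It takes the fastest wall-clock time from
the list above and the evaluation count from the benchmark (1024
elements per call, 1024*1024 calls).  The 5GHz clock is only the
assumption stated above, not a measured frequency, and the program
and variable names are made up for illustration.

```fortran
program cycles
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Fastest run: -Ofast -march=native -mtune=native
  real(dp), parameter :: wall_s = 2.154_dp
  ! ASSUMED clock frequency, not measured
  real(dp), parameter :: clock_hz = 5.0e9_dp
  ! n = 1024 elements per call, times 1024*1024 calls
  real(dp), parameter :: n_eval = 1024.0_dp**3
  print '(a,f6.2)', 'cycles per evaluation: ', wall_s * clock_hz / n_eval
end program cycles
```

This comes out at roughly 10 cycles per evaluation.  The figure
scales linearly with whatever clock the CPU actually sustained, so at
4GHz it would be closer to 8.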