From: Michael S
Newsgroups: comp.arch
Subject: Re: What integer C type to use
Date: Tue, 12 Mar 2024 14:44:28 +0200
Message-ID: <20240312144428.000063f5@yahoo.com>

On Tue, 12 Mar 2024 11:14:47 +0100
David Brown wrote:

> On 11/03/2024 20:56, MitchAlsup1 wrote:
> > David Brown wrote:
> >
> >> On 23/02/2024 20:55, MitchAlsup1 wrote:
> >>> Thomas Koenig wrote:
> >>>
> >>>> Anton Ertl wrote:
> >>>>> I know no implementation of a 64-bit architecture where ALU
> >>>>> operations (except maybe division where present) is slower in
> >>>>> 64 bits than in 32 bits.  I would have chosen ILP64 at the
> >>>>> time, so I can only guess at their reasons:
> >>>
> >>>> A guess: people did not want sizeof(float) != sizeof(float).
> >>>> float is cerainly faster than double.
> >>>
> >>> Now, only in cache footprint. All your std operations take the
> >>> same amount of cycles DP vs. SP. Excepting for cache footprint
> >>> latency:: perf(DP) == perf(SP)
> >>>
> >
> >> That's true - except when it is not.
> >
> >> It is not true when you are using vector instructions, and you can
> >> do twice as many SP instructions as DP instructions in the same
> >> register and instruction.
> >
> > You use the word vector where you mean SIMD.
>
> Yes, I was using the word somewhat interchangeably, as I was talking
> in general terms.  Perhaps I should have been more precise.  I know
> this thread talked about "Cray style vectors", but I thought this
> branch had diverged - I don't know anywhere near enough about the
> details of Cray machines to talk much about them.
>

Even for Cray/NEC-style vectors, equal throughput at different
precisions is not a universal property. Cray's and NEC's vector
processors happen to be designed like that, but one can easily imagine
vector processors of a similar style that have 2 or even 3 times higher
throughput for SP than for DP. I personally never encountered such
machines, but I would be surprised if none was ever built and sold by
one or another of the usual suspects (maybe Fujitsu?) back in the days
when designers liked Cray's style.

Which, of course, leaves the question of what property makes a vector
processor Cray-style. Just having an ALU/FPU several times narrower
than the vector register (VR) is, IMHO, not enough to be considered
Cray-style. In my book, the critical distinction is that at least one
size of partial (chopped) non-load-store vector operation has higher
throughput (and hopefully, but not necessarily, lower latency) than a
full vector operation of the same type.

> > A CRAY-YMP doing single
> > would not be twice as fast because it was designed for 1 FADD + 1
> > FMUL + 2 LD + 1 ST per cycle continuously. CRAY-YMP is the epitome
> > of a Vector machine.
> >
> > The alternative word would be short-vector instead of SIMD.
> >
> >> It is not true when you are using accelerators of various kinds,
> >> such as graphics card processors.
> >
> >> And it is not true on smaller processors, such as in the embedded
> >> world.
> >> On microcontrollers with floating point hardware for
> >> single and double precision, SP can be up to twice as fast as DP.
> >> And for many of the more popular microcontrollers, you can have
> >> hardware SP but DP is done in software - the difference there is
> >> clearly massive.
> >
> >> But for big processors doing non-vector adds and multiplies, DP
> >> and SP are usually equal in clock cycles (other than memory and
> >> cache effects).
>
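
To put rough numbers on the two points above (twice as many SP as DP
elements per SIMD register, and chopped vector operations finishing
sooner than full ones), here is a minimal C sketch. It is a toy model
only: the helper names lanes_per_register and toy_cycles, the register
width and the cycle formula are assumptions made for illustration, not
a description of any real Cray, NEC or SIMD machine.

```c
#include <assert.h>

/* Number of elements of elem_bits bits that fit in one register of
 * reg_bits bits. With 32-bit SP and 64-bit DP elements, the same
 * register holds twice as many SP lanes as DP lanes. */
static unsigned lanes_per_register(unsigned reg_bits, unsigned elem_bits)
{
    assert(elem_bits != 0);
    return reg_bits / elem_bits;
}

/* Toy cycle model (an assumption of this sketch): a vector operation
 * over vl elements, on a pipe that retires `lanes` elements per cycle,
 * takes ceil(vl / lanes) cycles. Startup latency is ignored. */
static unsigned toy_cycles(unsigned vl, unsigned lanes)
{
    assert(lanes != 0);
    return (vl + lanes - 1) / lanes;
}
```

For a hypothetical 256-bit register, lanes_per_register(256, 32) gives
8 and lanes_per_register(256, 64) gives 4. For a hypothetical
64-element vector register fed through a 2-lane pipe, toy_cycles(64, 2)
is 32 while a chopped operation with toy_cycles(16, 2) is 8, which is
the "partial operations have higher throughput" property in its
simplest form.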