From: Michael S <already5chosen@yahoo.com>
Newsgroups: comp.arch
Subject: Re: What integer C type to use
Date: Tue, 12 Mar 2024 14:44:28 +0200
Organization: A noiseless patient Spider
Lines: 79
Message-ID: <20240312144428.000063f5@yahoo.com>
References: <upq0cr$6b5m$1@dont-email.me>
	<uqobhv$3o4m9$2@dont-email.me>
	<1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com>
	<uqpngc$3o4m9$3@dont-email.me>
	<uqpuid$bhg0$1@dont-email.me>
	<2024Feb17.190353@mips.complang.tuwien.ac.at>
	<uqqvkc$i2cu$1@dont-email.me>
	<uqvk2o$1snbf$1@dont-email.me>
	<ur0ka6$23ma8$1@dont-email.me>
	<dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org>
	<2024Feb20.083240@mips.complang.tuwien.ac.at>
	<2024Feb20.130029@mips.complang.tuwien.ac.at>
	<ur2jpf$2j800$1@dont-email.me>
	<2024Feb20.184737@mips.complang.tuwien.ac.at>
	<uraof0$kij0$1@newsreader4.netcologne.de>
	<3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org>
	<urfcgs$1rne2$1@dont-email.me>
	<dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org>
	<usp9un$7pij$1@dont-email.me>

On Tue, 12 Mar 2024 11:14:47 +0100
David Brown <david.brown@hesbynett.no> wrote:

> On 11/03/2024 20:56, MitchAlsup1 wrote:
> > David Brown wrote:
> >
> >> On 23/02/2024 20:55, MitchAlsup1 wrote:
> >>> Thomas Koenig wrote:
> >>>
> >>>> Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> >>>>> I know no implementation of a 64-bit architecture where ALU
> >>>>> operations (except maybe division where present) are slower in
> >>>>> 64 bits than in 32 bits.  I would have chosen ILP64 at the
> >>>>> time, so I can only guess at their reasons:
> >>>
> >>>> A guess: people did not want sizeof(int) != sizeof(float).
> >>>> float is certainly faster than double.
> >>>
> >>> Now, only in cache footprint. All your std operations take the
> >>> same amount of cycles DP vs. SP. Excepting for cache footprint
> >>> latency::
> >>> perf(DP) == perf(SP)
> >>>
> >
> >> That's true - except when it is not.
> >
> >> It is not true when you are using vector instructions, and you can
> >> do twice as many SP instructions as DP instructions in the same
> >> register and instruction.
> >
> > You use the word vector where you mean SIMD.
>
> Yes, I was using the word somewhat interchangeably, as I was talking
> in general terms.  Perhaps I should have been more precise.  I know
> this thread talked about "Cray style vectors", but I thought this
> branch had diverged - I don't know anywhere near enough about the
> details of Cray machines to talk much about them.
>

Even for Cray/NEC-style vectors, equal throughput at different
precisions is not a universal property. Cray's and NEC's vector
processors happen to be designed that way, but one can easily imagine
vector processors of similar style that have 2 or even 3 times higher
throughput for SP than for DP.
I have never personally encountered such machines, but I would be
surprised if none was ever built and sold by one or another of the
usual suspects (Fujitsu, maybe?) back in the days when designers liked
Cray's style.

Which, of course, leaves open the question of what property makes a
vector processor Cray-style. Just having ALUs/FPUs several times
narrower than the vector registers (VRs) is, IMHO, not enough to
qualify as Cray-style.
In my book, the critical distinction is that at least one size of
partial (chopped) non-load-store vector operation has higher throughput
(and hopefully, but not necessarily, lower latency) than full vector
operations of the same type.

> > A CRAY Y-MP doing single precision would not be twice as fast,
> > because it was designed for 1 FADD + 1 FMUL + 2 LD + 1 ST per
> > cycle continuously. The CRAY Y-MP is the epitome of a vector
> > machine.
> >
> > The alternative word would be short-vector instead of SIMD.
> >
> >> It is not true when you are using accelerators of various kinds,
> >> such as graphics card processors.
> >
> >> And it is not true on smaller processors, such as in the embedded
> >> world.  On microcontrollers with floating point hardware for
> >> single and double precision, SP can be up to twice as fast as DP.
> >> And for many of the more popular microcontrollers, you can have
> >> hardware SP but DP is done in software - the difference there is
> >> clearly massive.
> >
> >> But for big processors doing non-vector adds and multiplies, DP
> >> and SP are usually equal in clock cycles (other than memory and
> >> cache effects).
>