From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: What integer C type to use
Date: Mon, 11 Mar 2024 19:56:32 +0000
Organization: Rocksolid Light
Message-ID: <dbaa17babbd2ca8362fad9f9ecd4b79c@www.novabbs.org>
References: <upq0cr$6b5m$1@dont-email.me> <uqobhv$3o4m9$2@dont-email.me> <1067c5b46cebaa18a0fc50fc423aa86a@www.novabbs.com> <uqpngc$3o4m9$3@dont-email.me> <uqpuid$bhg0$1@dont-email.me> <2024Feb17.190353@mips.complang.tuwien.ac.at> <uqqvkc$i2cu$1@dont-email.me> <uqvk2o$1snbf$1@dont-email.me> <ur0ka6$23ma8$1@dont-email.me> <dd9c82c9be34460dc6fea35c8608e51d@www.novabbs.org> <2024Feb20.083240@mips.complang.tuwien.ac.at> <2024Feb20.130029@mips.complang.tuwien.ac.at> <ur2jpf$2j800$1@dont-email.me> <2024Feb20.184737@mips.complang.tuwien.ac.at> <uraof0$kij0$1@newsreader4.netcologne.de> <3a1c5c42222d44ea006bc20d55e0c94c@www.novabbs.org> <urfcgs$1rne2$1@dont-email.me>

David Brown wrote:

> On 23/02/2024 20:55, MitchAlsup1 wrote:
>> Thomas Koenig wrote:
>> 
>>> Anton Ertl <anton@mips.complang.tuwien.ac.at> schrieb:
>>>> I know no implementation of a 64-bit architecture where ALU operations
>>>> (except maybe division where present) are slower in 64 bits than in 32
>>>> bits.  I would have chosen ILP64 at the time, so I can only guess at
>>>> their reasons:
>> 
>>> A guess: people did not want sizeof(int) != sizeof(float). float
>>> is certainly faster than double.
>> 
>> Now, only in cache footprint. All your standard operations take the same
>> number of cycles in DP as in SP. Except for cache-footprint latency:
>> perf(DP) == perf(SP)
>> 
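
The cache-footprint exception can be put in numbers with a small C sketch. It assumes a 64-byte cache line (common on current x86-64 and many ARM cores, but an assumption; C says nothing about caches), and the helper names elems_per_line and lines_for are hypothetical, chosen for illustration.

```c
#include <stddef.h>

/* ASSUMPTION: 64-byte cache lines. Typical today, but not defined by C. */
#define ASSUMED_LINE_BYTES 64u

/* Number of elements of the given size that fit in one cache line. */
size_t elems_per_line(size_t elem_size)
{
    return ASSUMED_LINE_BYTES / elem_size;
}

/* Cache lines touched when streaming n contiguous elements, assuming the
   array starts on a line boundary (byte count rounded up to whole lines). */
size_t lines_for(size_t n, size_t elem_size)
{
    size_t bytes = n * elem_size;
    return (bytes + ASSUMED_LINE_BYTES - 1) / ASSUMED_LINE_BYTES;
}
```

With the usual 4-byte float and 8-byte double, a line holds 16 floats or 8 doubles, and streaming 1000 elements touches 63 lines in SP versus 125 in DP: the arithmetic runs at the same rate, but DP moves twice the data.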

> That's true - except when it is not.

> It is not true when you are using vector instructions, where you can do
> twice as many SP operations as DP operations in the same register
> and instruction.

You use the word vector where you mean SIMD. A CRAY Y-MP running single
precision would not be twice as fast, because it was designed to sustain
1 FADD + 1 FMUL + 2 LD + 1 ST per cycle continuously. The CRAY Y-MP is the
epitome of a vector machine.
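
To make that per-cycle budget concrete, here is a loop whose body needs exactly the mix listed: 2 loads, 1 multiply, 1 add and 1 store per element. It is a minimal sketch of the operation mix only, not of how a Cray compiler scheduled anything; the name axpy_out is hypothetical. On a machine whose pipes sustain that mix every cycle, throughput is one element per cycle whether the element is 32 or 64 bits wide, because narrower data does not add pipes.

```c
#include <stddef.h>

/* z[i] = a * x[i] + y[i]
   Per element: 2 loads (x[i], y[i]), 1 multiply, 1 add, 1 store (z[i]).
   On a machine issuing 1 FADD + 1 FMUL + 2 LD + 1 ST per cycle, each
   element costs one cycle of throughput regardless of element width.
   Note: a compiler targeting an FMA-capable CPU may fuse the multiply
   and add into one instruction; separate add and multiply pipes are
   what the description above assumes. */
void axpy_out(size_t n, double a, const double *x, const double *y, double *z)
{
    for (size_t i = 0; i < n; i++)
        z[i] = a * x[i] + y[i];
}
```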

If you prefer not to say SIMD, the alternative term is short vector.
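
For a short-vector (SIMD) unit, by contrast, the register width is fixed and it is the lane count that changes with precision. A minimal sketch of that arithmetic, assuming register widths such as 128 bits (SSE, NEON) or 256 bits (AVX); the helper simd_lanes is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Lanes available in one SIMD register of reg_bits bits for elements of
   elem_bytes bytes. With the register width fixed, halving the element
   size doubles the number of operations one instruction performs. */
size_t simd_lanes(size_t reg_bits, size_t elem_bytes)
{
    return reg_bits / ((size_t)CHAR_BIT * elem_bytes);
}
```

A 128-bit register thus holds 4 floats or 2 doubles, which is where the factor of two for SIMD comes from.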

> It is not true when you are using accelerators of various kinds, such as 
> graphics card processors.

> And it is not true on smaller processors, such as in the embedded world.
> On microcontrollers with floating-point hardware for single and double
> precision, SP can be up to twice as fast as DP.  And for many of the
> more popular microcontrollers, you can have hardware SP but DP is done
> in software - the difference there is clearly massive.
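
On parts with an SP-only FPU, one C detail decides which side of that gap you land on: an unsuffixed constant such as 0.1 has type double, so with float x the expression x * 0.1 is evaluated in double precision under the usual arithmetic conversions. A minimal sketch, with hypothetical function names, assuming FLT_EVAL_METHOD == 0 (typical for such FPUs); GCC's -Wdouble-promotion option can flag the implicit conversion.

```c
/* 0.1f has type float, so (with FLT_EVAL_METHOD == 0) the multiply is a
   single-precision operation the SP FPU can execute directly. */
float scale_sp(float x)
{
    return x * 0.1f;
}

/* 0.1 has type double: x is converted to double, the multiply is done in
   double precision, and the result is converted back to float. Because
   0.1 and 0.1f differ, the compiler cannot just substitute a float
   multiply (absent relaxed-math options), so on an SP-only FPU this
   typically becomes a call into the software DP library. */
float scale_dp(float x)
{
    return (float)(x * 0.1);
}
```

Both return values near 1.0f for an input of 10.0f; the cost difference lies in the arithmetic the compiler has to emit, not in what the caller sees.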

> But for big processors doing non-vector adds and multiplies, DP and SP 
> are usually equal in clock cycles (other than memory and cache effects).