From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: What integer C type to use
Date: Mon, 11 Mar 2024 19:56:32 +0000
Organization: Rocksolid Light

David Brown wrote:

> On 23/02/2024 20:55, MitchAlsup1 wrote:
>> Thomas Koenig wrote:
>>
>>> Anton Ertl wrote:
>>>> I know no implementation of a 64-bit architecture where ALU operations
>>>> (except maybe division where present) is slower in 64 bits than in 32
>>>> bits.  I would have chosen ILP64 at the time, so I can only guess at
>>>> their reasons:
>>
>>> A guess: people did not want sizeof(float) != sizeof(float). float
>>> is certainly faster than double.
>>
>> Now, only in cache footprint. All your std operations take the same amount
>> of cycles DP vs. SP. Excepting for cache footprint latency:: perf(DP) ==
>> perf(SP)
>>
> That's true - except when it is not.

> It is not true when you are using vector instructions, and you can do
> twice as many SP instructions as DP instructions in the same register
> and instruction.

You use the word vector where you mean SIMD.
A CRAY-YMP doing single precision would not be twice as fast, because it was
designed for 1 FADD + 1 FMUL + 2 LD + 1 ST per cycle continuously. The CRAY-YMP
is the epitome of a vector machine. The alternative term would be short-vector
instead of SIMD.

> It is not true when you are using accelerators of various kinds, such as
> graphics card processors.

> And it is not true on smaller processors, such as in the embedded world.
> On microcontrollers with floating point hardware for single and double
> precision, SP can be up to twice as fast as DP. And for many of the
> more popular microcontrollers, you can have hardware SP but DP is done
> in software - the difference there is clearly massive.

> But for big processors doing non-vector adds and multiplies, DP and SP
> are usually equal in clock cycles (other than memory and cache effects).
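
The arithmetic behind the SIMD (short-vector) and cache-footprint points in this exchange can be sketched in a few lines of C. This is only an illustration, assuming fixed-width registers and 32-bit SP / 64-bit DP elements; the function names simd_lanes and footprint_bytes are hypothetical and do not come from any library or from the posts above.

```c
#include <stddef.h>

/* Hypothetical helper: how many elements of elem_bits each fit in one
 * SIMD register of reg_bits. Returns 0 if elem_bits is 0. */
unsigned simd_lanes(unsigned reg_bits, unsigned elem_bits)
{
    return elem_bits ? reg_bits / elem_bits : 0;
}

/* Hypothetical helper: bytes occupied by n elements of elem_bits each,
 * i.e. the memory/cache footprint of an array of that element type. */
size_t footprint_bytes(size_t n, unsigned elem_bits)
{
    return n * (size_t)(elem_bits / 8);
}
```

With these definitions, simd_lanes(256, 32) is 8 and simd_lanes(256, 64) is 4, which is the "twice as many SP as DP per register" case, while footprint_bytes(1000, 64) is twice footprint_bytes(1000, 32), which is the footprint difference that remains even when scalar SP and DP operations cost the same number of cycles.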