From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: Top 10 most common hard skills listed on resumes...
Date: Fri, 13 Sep 2024 16:24:12 +0200
Organization: A noiseless patient Spider
Message-ID: <vc1huc$ti1v$1@dont-email.me>
References: <vab101$3er$1@reader1.panix.com> <vbl3am$228vv$1@dont-email.me>
 <vblfgb$2dkij$1@paganini.bofh.team> <vblhp7$249ug$1@dont-email.me>
 <vbloje$2e34o$1@paganini.bofh.team> <vbmeae$2bn2v$2@dont-email.me>
 <vbn8pe$2g9i6$2@paganini.bofh.team> <vbnaqt$2g0vc$1@dont-email.me>
 <vbnre4$2h8k3$1@paganini.bofh.team> <vbor2f$2qqt1$1@dont-email.me>
 <vbpj6o$2orhf$1@paganini.bofh.team> <vbpl36$30c4v$2@dont-email.me>
 <vbtatt$33hat$1@paganini.bofh.team> <vbuk84$7chj$1@dont-email.me>
 <vbvmde$3c7ti$2@paganini.bofh.team>

On 12/09/2024 23:28, Waldek Hebisch wrote:
> David Brown <david.brown@hesbynett.no> wrote:
>> On 12/09/2024 01:59, Waldek Hebisch wrote:
>>> David Brown <david.brown@hesbynett.no> wrote:
>>>>
>>>> On many cpus, using sizes smaller than the full register size means
>>>> doing sign extensions or masking operations at various times - thus full
>>>> size register operations can often be more efficient.  On such systems
>>>> you will find that int_fast16_t is 32-bit or 64-bit, according to the
>>>> register width.  On other cpus, some common ALU operations on full-size
>>>> operands can be slower than for smaller operands (such as on the 68000).
>>>> There, int_fast16_t will be 16-bit.
>>>>
>>>> Compiler authors know what will usually be faster on the target.  There
>>>> will always be some exceptions (division is usually faster on smaller
>>>> operands, for example).  But if you don't know the target - as is the
>>>> case of portable code - the compiler will usually make a better choice
>>>> here than you would.
>>>
>>> BTW, I just played with Clang 18 on 64-bit FreeBSD.  It has 32-bit
>>> int_fast16_t.  gcc in Linux makes it 64-bit.  Who is right?
>>>
>>
>> Technically, both are right - implementations can use any integer type
>> of at least 16 bits here, whatever they think is fastest in general.
>> But it surprises me somewhat, given that clang for x86-64 on Linux uses
>> 64-bit for int_fast16_t.
> 
> Well, both satisfy "hard" requirements.  But the question was which
> type is faster.

Yes.  But unless I am using a target processor that I know well, have a 
good idea of the types of instructions and know about the timings for 
those instructions with different data types, then I am inclined to 
believe the compiler implementer here and use the int_fastNN_t types. 
(For most of my C programming, I /do/ know the target well, and can make 
more refined type choices if it is relevant.  But I don't know the 
timing details for the countless x86-64 variants.)

> 
>> But to be clear, the size of the "fast" types depends on the target and
>> the implementation.  They are not normally used for external ABIs, and
>> are purely internal to the generated code.  Obviously you must pick a
>> "fast" size that is at least as big as the range you need.
> 
> I think that Linux (and probably FreeBSD too) considers size of
> fast type as part of ABI (regardless of guidelines those types
> certainly leaked into "public" interfaces).  Such ABI change is
> probably viewed as not worth doing.
> 

I am not sure if the fast types are in the ABI.  It certainly seems not,
if you say clang on x86-64 BSD has 32-bit int_fast16_t, while it is
64-bit on Linux gcc and clang.  BSD and Linux use the same ABI, AFAIK.
(Everybody except MS uses the same x86-64 ABI.)


> And concerning choice on x86_64, AFAIK for operations on numbers of
> the same size 32-bit gives fastest operations.  16-bit had two
> disadvantages, big one due to partial register stalls, small one
> due to larger size (operand size prefix).  64-bit requires bigger
> code (more need to use prefixes) and bigger data.

I don't know if that is all correct or not.  Some operations are 
definitely slower on bigger operands, such as division.  Different 
x86-64 processors may see different costs for things like prefix sizes. 
And if you can get SIMD instructions into the picture, then smaller 
sizes let you do more in the same instruction.

>  When mixing
> types, 32-bit numbers are automatically zero extended, so there
> is no extra cost when mixing unsigned numbers.

When I look at generated code, unsigned types smaller than 64 bits can 
require masking at times, and they do sometimes require zero extend 
instructions (typically expressed as a "move 32-bit register to 64-bit 
register" instruction).

>  So what remains
> is mixing signed 32-bit integers with 64-bit ones.  Addresses
> use 64-bit arithmetic, so that requires sign extension.  OTOH
> in arithmetic "fast" types are likely to be mixed with exact
> 32-bit types and then making "fast" types 32-bit is faster
> overall.  So, there are bets/assumptions which usage is more
> frequent.  OTOH, choice between 32-bit and 64-bit fast types
> is unlikely to make _much_ difference.
> 

It could be interesting to compare speeds for different kinds of code on 
different x86-64 targets.  But the details here are beyond me, and also 
well outside of the targets that I am most interested in.