Article <vbvljl$ea0m$1@dont-email.me>

Path: ...!2.eu.feeder.erje.net!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Thu, 12 Sep 2024 16:14:18 -0500
Organization: A noiseless patient Spider
Lines: 429
Message-ID: <vbvljl$ea0m$1@dont-email.me>
References: <vaqgtl$3526$1@dont-email.me>
 <memo.20240830090549.19028u@jgd.cix.co.uk>
 <2024Aug30.161204@mips.complang.tuwien.ac.at> <86r09ulqyp.fsf@linuxsc.com>
 <2024Sep8.173639@mips.complang.tuwien.ac.at>
 <p1cvdjpqjg65e6e3rtt4ua6hgm79cdfm2n@4ax.com>
 <2024Sep10.101932@mips.complang.tuwien.ac.at> <ygn8qvztf16.fsf@y.z>
 <2024Sep11.123824@mips.complang.tuwien.ac.at> <vbsoro$3ol1a$1@dont-email.me>
 <vbut86$9toi$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 12 Sep 2024 23:14:30 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="1759098feabd3e0537ca2f71e7972685";
	logging-data="469014"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19Rn8BuscVD1bwIia4DihbSl1NUiVlU37g="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:EPYkDATsJwH+Rs85f4doBG6/ojI=
Content-Language: en-US
In-Reply-To: <vbut86$9toi$1@dont-email.me>
Bytes: 17115

On 9/12/2024 9:18 AM, David Brown wrote:
> On 11/09/2024 20:51, BGB wrote:
>> On 9/11/2024 5:38 AM, Anton Ertl wrote:
>>> Josh Vanderhoof <x@y.z> writes:
>>>> anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>>>>
>>>>> George Neuner <gneuner2@comcast.net> writes:
>>>>>> On Sun, 08 Sep 2024 15:36:39 GMT, anton@mips.complang.tuwien.ac.at
>>>>>> (Anton Ertl) wrote:
>>>>>>> 1) At first I thought that yes, one could just check whether 
>>>>>>> there is
>>>>>>> an overlap of the memory areas.  But then I remembered that you 
>>>>>>> cannot
>>>>>>> write such a check in standard C without (in the general case)
>>>>>>> exercising undefined behaviour; and then the compiler could 
>>>>>>> eliminate
>>>>>>> the check or do something else that's unexpected.  Do you have 
>>>>>>> such a
>>>>>>> check in mind that does not exercise undefined behaviour in the
>>>>>>> general case?
>>> ...
>>>> It is legal to test for equality between pointers to different objects
>>>> so you could test for overlap by testing against every element in the
>>>> array.  It seems like it should be possible for the compiler to figure
>>>> out what's happening and optimize those tests away, but unfortunately
>>>> no compiler I tested did it.
>>>
>>> That would be an interesting result of the ATUBDNH lunacy: programmers
>>> would see themselves forced to write workarounds such as the one you
>>> suggest (with terrible performance when not optimized), and then C
>>> compiler maintainers would see themselves forced to optimize this kind
>>> of code.  The end result would be that both parties have to put in
>>> more effort to eventually get the same result as if ordered comparison
>>> between different objects had been defined from the start.
>>>
>>> For now, the ATUBDNH advocates tell programmers that they have to work
>>> around the lack of definition, but there is usually no optimization
>>> for that.
>>>
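
A minimal sketch of the equality-only overlap test described above, for
anyone who wants to try it on godbolt.org. The name ranges_overlap() is
hypothetical and not from the thread. It assumes both arguments point to
valid byte ranges of the given lengths. Only == is applied to pointers,
so no ordered comparison between different objects takes place. Rather
than testing every pair of elements, it relies on the fact that two
overlapping ranges must contain the first byte of one or the other, so
the cost is linear in the combined length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the byte ranges [a, a+alen) and
   [b, b+blen) share at least one byte.  Pointers are only ever
   compared with ==, never with < or >. */
static bool ranges_overlap(const void *a, size_t alen,
                           const void *b, size_t blen)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

    assert(a != NULL && b != NULL);

    /* An empty range cannot overlap anything. */
    if (alen == 0 || blen == 0)
        return false;

    /* Does the first byte of b lie inside a? */
    for (size_t i = 0; i < alen; i++)
        if (pa + i == pb)
            return true;

    /* Does the first byte of a lie inside b? */
    for (size_t i = 0; i < blen; i++)
        if (pb + i == pa)
            return true;

    return false;
}
```

Whether a given compiler reduces the two loops to a pair of range
checks is exactly the open question in the quoted text; this sketch
makes no claim either way.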
>>> One case where things work somewhat along the lines you suggest is
>>> unaligned accesses.  Traditionally, if knowing that the hardware
>>> supports unaligned accesses, for a 16-bit load one would write:
>>>
>>> int16_t foo1(int16_t *p)
>>> {
>>>    return *p;
>>> }
>>>
>>> If one does not know that the hardware supports unaligned accesses,
>>> the traditional way to perform such an access (little-endian) is
>>> something like:
>>>
>>> int16_t foo2(int16_t *p)
>>> {
>>>    unsignedchar *q = p;
>>>    return (int16_t)(q[0] + (q[1]>>8));
>>> }
> 
> Correcting the typos (in case anyone wants to copy-and-paste to 
> godbolt.org for testing):
> 
> 
> int16_t foo2(int16_t *p)
> {
>      unsigned char *q = (unsigned char *) p;
>      return (int16_t)(q[0] + (q[1] << 8));
> }
> 
>>>
>>> Now, several years ago, somebody told me that the proper way is as
>>> follows:
>>>
>>> int16_t foo3(int16_t *p)
>>> {
>>>     int16_t v;
>>>     memcpy(&v,p,2);
>>>     return v;
>>> }
>>>
>>> That way looked horribly inefficient to me, with v having to reside in
>>> memory instead of in a register and then the expensive function call,
>>> and all the decisions that memcpy() has to take depending on the
>>> length argument.  But gcc optimizes this idiom into an unaligned load
>>> rather than taking all the steps that I expected (however, I have seen
>>> cases where the code produced on hardware that supports unaligned
>>> accesses is worse than that for foo1()).  Of course, if you also want
>>> to support less sophisticated compilers, this idiom may be really slow
>>> on those, although not quite as expensive as your containment check.
>>>
>>
> 
> It is an unfortunate truth that code that is correct can be inefficient 
> on some compilers, while code that is efficient on those compilers is 
> not correct (according to the C standards) and can fail on other 
> compilers.  I may be an "ATUBDNH advocate", but I can certainly 
> acknowledge that much.  The C standard is concerned with the behaviour 
> of the code, not its efficiency, and it has always been a fact of life 
> for C programmers that different compilers give better or worse results 
> for different ways of writing source code.  Not all code can be written 
> portably /and/ efficiently, without at least some conditional compilation.
> 
> foo1() is defined behaviour if and only if the pointer is correctly 
> aligned.  For a stand-alone function,
> 
> foo2() above is perfectly correct C and has fully defined behaviour 
> (with the obvious assumptions that CHAR_BIT is 8 and that int16_t 
> exists), but only gives the correct results for little-endian systems.
> 
> foo3() is correct regardless of the endianness (with the same 
> assumptions about the targets), but efficiency can vary.
> 
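
To make the foo2()/foo3() distinction concrete, here is a small sketch
with both styles side by side. The names load16_le_bytes() and
load16_native() are hypothetical; the usual assumptions apply (CHAR_BIT
is 8, int16_t exists). The byte-wise version always reads the value as
little-endian, the memcpy() version reads it in the host's byte order,
and both accept a pointer of any alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise load in the style of the corrected foo2(): interprets the
   two bytes at p as little-endian, whatever the host byte order is. */
static int16_t load16_le_bytes(const void *p)
{
    const unsigned char *q = p;

    assert(p != NULL);
    /* Going through uint16_t keeps the arithmetic unsigned; the final
       conversion to int16_t is implementation-defined for values above
       INT16_MAX, and wraps on the usual two's complement targets. */
    return (int16_t)(uint16_t)(q[0] | (q[1] << 8));
}

/* memcpy()-based load in the style of foo3(): reads the two bytes at p
   in the host's native byte order. */
static int16_t load16_native(const void *p)
{
    int16_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

On a little-endian host the two return the same value for the same
bytes; on a big-endian host they differ, which is the portability
difference noted above.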
> Testing these on godbolt.org with gcc and MSVC shows these both optimise 
> the memcpy() into a single 16-bit load.  MSVC does not recognize the 
> pattern in foo2() and generates poor code for it (it even uses an "imul" 
> instruction!).
> 
> 
> Another alternative is:
> 
> int16_t foo1v(int16_t *p)
> {
>      volatile int16_t * q = p;
>      return *q;
> }
> 
> The C standard does not say exactly what this will do, but you can 
> expect the compiler to generate the load, even if it knows "p" is 
> misaligned, and even if it knows the target does not support misaligned 
> accesses.  Of course, this has implications for optimisations as the 
> compiler can't re-order such loads.
> 
> 
>> Would be nice, say, if there were semi-standard compiler macros for 
>> various things:
> 
> Ask, and you shall receive!  (Well, sometimes you might receive.)
> 
>>    Endianness (macros exist, typically compiler specific);
>>      And, apparently GCC and Clang can't agree on which strategy to use.
> 
> #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> ...
> #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> ...
> #else
> ...
> #endif
> 
> Works in gcc, clang and MSVC.
> 

Technically now also in BGBCC, since I have just recently added it.

> 
> And C23 has the <stdbit.h> header with many convenient little "bit and 
> byte" utilities, including endian detection:
> 
> #include <stdbit.h>
> #if __STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_LITTLE__
> ...
> #elif __STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_BIG__
> ...
> #else
> ...
> #endif
> 

This is good at least.

Though, it generally takes a few years before new features become usable.
Like, it is only in recent years that it has become "safe" to use most 
parts of C99.
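
In the meantime, one way to cope is a fallback chain. The following is
only a sketch under stated assumptions: HOST_LITTLE_ENDIAN and
host_is_little_endian() are hypothetical names, and the chain tries the
C23 <stdbit.h> macros first, then the __BYTE_ORDER__ family, then gives
up at compile time and leaves it to a run-time probe. Each macro is
tested with defined() before being compared, since an undefined
identifier in #if silently evaluates to 0 and would make a comparison
look like a match.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Only pull in <stdbit.h> when compiling as C23 and the header exists. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L && \
    defined(__has_include)
#  if __has_include(<stdbit.h>)
#    include <stdbit.h>
#  endif
#endif

/* HOST_LITTLE_ENDIAN (hypothetical): 1 = little, 0 = big,
   -1 = not known at compile time. */
#if defined(__STDC_ENDIAN_NATIVE__) && defined(__STDC_ENDIAN_LITTLE__) && \
    defined(__STDC_ENDIAN_BIG__)
#  if __STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_LITTLE__
#    define HOST_LITTLE_ENDIAN 1
#  elif __STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_BIG__
#    define HOST_LITTLE_ENDIAN 0
#  else
#    define HOST_LITTLE_ENDIAN (-1)
#  endif
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
      defined(__ORDER_BIG_ENDIAN__)
#  if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#    define HOST_LITTLE_ENDIAN 1
#  elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#    define HOST_LITTLE_ENDIAN 0
#  else
#    define HOST_LITTLE_ENDIAN (-1)
#  endif
#else
#  define HOST_LITTLE_ENDIAN (-1)
#endif

/* Run-time probe: looks at the first byte of a 16-bit 1.  When the
   compile-time answer is known, it is cross-checked here. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    if (HOST_LITTLE_ENDIAN >= 0)
        assert(first == (unsigned char)HOST_LITTLE_ENDIAN);
    return first == 1;
}
```

Code that needs the answer in the preprocessor can test
HOST_LITTLE_ENDIAN and treat -1 as "use the portable byte-wise path".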
========== REMAINDER OF ARTICLE TRUNCATED ==========