From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Computer architects leaving Intel...
Date: Thu, 12 Sep 2024 16:14:18 -0500
Organization: A noiseless patient Spider
Message-ID: <vbvljl$ea0m$1@dont-email.me>
References: <vaqgtl$3526$1@dont-email.me>
 <memo.20240830090549.19028u@jgd.cix.co.uk>
 <2024Aug30.161204@mips.complang.tuwien.ac.at>
 <86r09ulqyp.fsf@linuxsc.com>
 <2024Sep8.173639@mips.complang.tuwien.ac.at>
 <p1cvdjpqjg65e6e3rtt4ua6hgm79cdfm2n@4ax.com>
 <2024Sep10.101932@mips.complang.tuwien.ac.at>
 <ygn8qvztf16.fsf@y.z>
 <2024Sep11.123824@mips.complang.tuwien.ac.at>
 <vbsoro$3ol1a$1@dont-email.me>
 <vbut86$9toi$1@dont-email.me>
In-Reply-To: <vbut86$9toi$1@dont-email.me>

On 9/12/2024 9:18 AM, David Brown wrote:
> On 11/09/2024 20:51, BGB wrote:
>> On 9/11/2024 5:38 AM, Anton Ertl wrote:
>>> Josh Vanderhoof <x@y.z> writes:
>>>> anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>>>>
>>>>> George Neuner <gneuner2@comcast.net> writes:
>>>>>> On Sun, 08 Sep 2024 15:36:39 GMT, anton@mips.complang.tuwien.ac.at
>>>>>> (Anton Ertl) wrote:
>>>>>>> 1) At first I thought that yes, one could just check whether
>>>>>>> there is an overlap of the memory areas.
>>>>>>> But then I remembered that you cannot write such a check in
>>>>>>> standard C without (in the general case) exercising undefined
>>>>>>> behaviour; and then the compiler could eliminate the check or
>>>>>>> do something else that's unexpected.  Do you have such a check
>>>>>>> in mind that does not exercise undefined behaviour in the
>>>>>>> general case?
>>> ...
>>>> It is legal to test for equality between pointers to different objects
>>>> so you could test for overlap by testing against every element in the
>>>> array.  It seems like it should be possible for the compiler to figure
>>>> out what's happening and optimize those tests away, but unfortunately
>>>> no compiler I tested did it.
>>>
>>> That would be an interesting result of the ATUBDNH lunacy: programmers
>>> would see themselves forced to write workarounds such as the one you
>>> suggest (with terrible performance when not optimized), and then C
>>> compiler maintainers would see themselves forced to optimize this kind
>>> of code.  The end result would be that both parties have to put in
>>> more effort to eventually get the same result as if ordered comparison
>>> between different objects had been defined from the start.
>>>
>>> For now, the ATUBDNH advocates tell programmers that they have to work
>>> around the lack of definition, but there is usually no optimization
>>> for that.
>>>
>>> One case where things work somewhat along the lines you suggest is
>>> unaligned accesses.
>>> Traditionally, if knowing that the hardware
>>> supports unaligned accesses, for a 16-bit load one would write:
>>>
>>> int16_t foo1(int16_t *p)
>>> {
>>>    return *p;
>>> }
>>>
>>> If one does not know that the hardware supports unaligned accesses,
>>> the traditional way to perform such an access (little-endian) is
>>> something like:
>>>
>>> int16_t foo2(int16_t *p)
>>> {
>>>    unsignedchar *q = p;
>>>    return (int16_t)(q[0] + (q[1]>>8));
>>> }
>
> Correcting the typos (in case anyone wants to copy-and-paste to
> godbolt.org for testing):
>
>
> int16_t foo2(int16_t *p)
> {
>     unsigned char *q = (unsigned char *) p;
>     return (int16_t)(q[0] + (q[1] << 8));
> }
>
>>>
>>> Now, several years ago, somebody told me that the proper way is as
>>> follows:
>>>
>>> int16_t foo3(int16_t *p)
>>> {
>>>    int16_t v;
>>>    memcpy(&v,p,2);
>>>    return v;
>>> }
>>>
>>> That way looked horribly inefficient to me, with v having to reside in
>>> memory instead of in a register and then the expensive function call,
>>> and all the decisions that memcpy() has to take depending on the
>>> length argument.  But gcc optimizes this idiom into an unaligned load
>>> rather than taking all the steps that I expected (however, I have seen
>>> cases where the code produced on hardware that supports unaligned
>>> accesses is worse than that for foo1()).  Of course, if you also want
>>> to support less sophisticated compilers, this idiom may be really slow
>>> on those, although not quite as expensive as your containment check.
>>>
>>
>
> It is an unfortunate truth that code that is correct can be inefficient
> on some compilers, while code that is efficient on those compilers is
> not correct (according to the C standards) and can fail on other
> compilers.  I may be a "ATUBDNH advocate", but I can certainly
> acknowledge that much.
> The C standard is concerned with the behaviour
> of the code, not its efficiency, and it has always been a fact of life
> for C programmers that different compilers give better or worse results
> for different ways of writing source code.  Not all code can be written
> portably /and/ efficiently, without at least some conditional compilation.
>
> foo1() is defined behaviour if and only if the pointer is correctly
> aligned.  For a stand-alone function,
>
> foo2() above is perfectly correct C and has fully defined behaviour
> (with the obvious assumptions that CHAR_BIT is 8 and that int16_t
> exists), but only gives the correct results for little-endian systems.
>
> foo3() is correct regardless of the endianness (with the same
> assumptions about the targets), but efficiency can vary.
>
> Testing these on godbolt.org with gcc and MSVC shows these both optimise
> the memcpy() into a single 16-bit load.  MSVC does not recognize the
> pattern in foo2() and generates poor code for it (it even uses an "imul"
> instruction!).
>
>
> Another alternative is:
>
> int16_t foo1v(int16_t *p)
> {
>     volatile int16_t * q = p;
>     return *q;
> }
>
> The C standard does not say exactly what this will do, but you can
> expect the compiler to generate the load, even if it knows "p" is
> misaligned, and even if it knows the target does not support misaligned
> accesses.  Of course, this has implications for optimisations as the
> compiler can't re-order such loads.
>
>
>> Would be nice, say, if there were semi-standard compiler macros for
>> various things:
>
> Ask, and you shall receive!  (Well, sometimes you might receive.)
>
>> Endianness (macros exist, typically compiler specific);
>> And, apparently GCC and Clang can't agree on which strategy to use.
>
> #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> ...
> #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
> ...
> #else
> ...
> #endif
>
> Works in gcc, clang and MSVC.
>

Technically now also in BGBCC, since I have just recently added it.
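
As a rough sketch of how such a check might be put to use (not something posted in the thread: the helper name load_le16 is made up for illustration, and the code assumes that a compiler which lacks __BYTE_ORDER__ should simply take the byte-wise path), the memcpy() idiom from foo3() could be selected when the native order is known to be little-endian, with the foo2()-style byte assembly as the fallback:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (name not from the thread): read a 16-bit
   little-endian value from a possibly unaligned address. */
static inline uint16_t load_le16(const void *p)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    /* Native order is little-endian: use memcpy() as in foo3().
       Compilers commonly reduce this to a single load. */
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
#else
    /* Big-endian or unknown target: assemble from bytes as in the
       corrected foo2(), which does not depend on native order. */
    const unsigned char *q = (const unsigned char *)p;
    return (uint16_t)(q[0] | (q[1] << 8));
#endif
}
```

With the bytes 0x34, 0x12 stored at an odd address, either path should return 0x1234. Whether the fallback path gets turned into a single load depends on the compiler, as noted above for MSVC and foo2().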
>
> And C23 has the <stdbit.h> header with many convenient little "bit and
> byte" utilities, including endian detection:
>
> #include <stdbit.h>
> #if __STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_LITTLE__
> ...
> #elif __STDC_ENDIAN_NATIVE__ == __STDC_ENDIAN_BIG__
> ...
> #else
> ...
> #endif
>

This is good at least.

Though, it generally takes a few years before new features become usable.
Like, it is only in recent years that it has become "safe" to use most
parts of C99.

========== REMAINDER OF ARTICLE TRUNCATED ==========