Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.lang.c
Subject: Re: C23 thoughts and opinions
Date: Sat, 15 Jun 2024 17:58:22 +0200
Organization: A noiseless patient Spider
Lines: 266
Message-ID: <v4kdmu$3hi90$1@dont-email.me>
References: <v2l828$18v7f$1@dont-email.me>
 <00297443-2fee-48d4-81a0-9ff6ae6481e4@gmail.com>
 <v2lji1$1bbcp$1@dont-email.me>
 <87msoh5uh6.fsf@nosuchdomain.example.com>
 <f08d2c9f-5c2e-495d-b0bd-3f71bd301432@gmail.com>
 <v2nbp4$1o9h6$1@dont-email.me>
 <v2ng4n$1p3o2$1@dont-email.me>
 <87y18047jk.fsf@nosuchdomain.example.com>
 <87msoe1xxo.fsf@nosuchdomain.example.com>
 <v2sh19$2rle2$2@dont-email.me>
 <87ikz11osy.fsf@nosuchdomain.example.com>
 <v2v59g$3cr0f$1@dont-email.me>
 <87plt8yxgn.fsf@nosuchdomain.example.com>
 <v31rj5$o20$1@dont-email.me>
 <87cyp6zsen.fsf@nosuchdomain.example.com>
 <v34gi3$j385$1@dont-email.me>
 <874jahznzt.fsf@nosuchdomain.example.com>
 <v36nf9$12bei$1@dont-email.me>
 <87v82b43h6.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 15 Jun 2024 17:58:23 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f678a482ffafce70c2ceef8ecfac3e10";
 logging-data="3721504"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX18vxD4k44lXyJzm2KwhDGinv1dkwtQB1Iw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:6kMTKpL45xX88ypb0HN+ZpZI4NM=
In-Reply-To: <87v82b43h6.fsf@nosuchdomain.example.com>
Content-Language: en-GB
Bytes: 15561

On 14/06/2024 23:30, Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 28/05/2024 22:21, Keith Thompson wrote:
>>> David Brown <david.brown@hesbynett.no> writes:
>>>> On 28/05/2024 02:33, Keith Thompson wrote:
>>> [...]
>>>>> Without some kind of programmer control, I'm concerned that the rules
>>>>> for defining an array so #embed will be correctly optimized will be
>>>>> spread as lore rather than being specified anywhere.
>>>>
>>>> They might, but I really do not think that is so important, since they
>>>> will not affect the generated results.
>>> Right, it won't affect the generated results (assuming I use it
>>> correctly). Unless I use `#embed optimize(true)` to initialize
>>> a struct with varying member sizes, but that's my fault because I
>>> asked for it.
>>
>> I am still not understanding your point. (I am confident that you
>> have a point, even if I don't get it.)
>>
>> I cannot see why there would be any need or use of manually adding
>> optimisation hints or controls in the source code. I cannot see why
>> there is any possibility of getting incorrect results in any way.
>>
>>> The point is compile-time performance, and perhaps even the ability
>>> to compile at all.
>>> I'm thinking about hypothetical cases where I want to embed a
>>> *very* large file and parsing the comma-delimited sequence could
>>> have unacceptable compile-time performance, perhaps even causing
>>> a compile-time stack overflow depending on how the parser works.
>>> Every time the compiler sees #embed, it has to decide whether to
>>> optimize it or not, and the decision criteria are not specified
>>> anywhere (not at all in the standard, perhaps not clearly in the
>>> compiler's documentation).
>>>
>>
>> Yes, I agree with that. And this is how it should be - this is not
>> something that should be specified. The C standards give minimum
>> requirements for things like the number of identifiers or the length
>> of lines. But pretty much all compilers, for most of the "translation
>> limits", say they are "limited by the memory of the host computer".
>> The same will apply to #embed.
>> And some compilers will cope better
>> than others with huge #embed's, some will be faster, some more memory
>> efficient. Some will change from version to version. This is not
>> something that can sensibly be specified or formalized - like pretty
>> much everything in regard to compilation time, each compiler does the
>> best it can without any specifications. I'd expect compiler reference
>> manuals might have hints, such as saying #embed is fastest with
>> unsigned char arrays (or whatever), but no more than that.
>>
>> But again - I see no reason for manual optimisation hints, and no
>> reason for any possible errors.
>>
>> Let me outline a possible strategy for a compiler like gcc. (I have
>> not looked at the prototype implementations from thephd, nor any gcc
>> developer discussions.)
>>
>> gcc splits the C pre-processor and the compiler itself, and
>> (currently) communicates dataflow in only one direction, via a
>> temporary file or a pipe. But the "gcc" (or "g++", according to
>> preference) driver program calls and coordinates the two programs.
>>
>> If the pre-processor is called stand-alone, then it will generate a
>> comma-separated list of integers, helpfully split over multiple lines
>> of reasonable size. This will clearly always be correct, and always
>> work, within limits of a compiler's translation limits.
>>
>> But when the gcc driver calls it, it will have a flag indicating that
>> the target compiler is gcc and supports an extended pre-processed
>> syntax (and also that the source is C23 - after all, the C
>> pre-processor can be used as a macro processor for other files with no
>> relation to C). Now the pre-processor has a lot more freedom.
>> Whenever it meets an #embed directive, it can generate a line:
>>
>>     #embed_data 123456
>>
>> followed in the file by 123456 (or whatever) bytes of binary data.
>> The C compiler, when parsing this file, will pull that in as a single
>> blob.
>> Then it is up to the C compiler - which knows how the #embed
>> data will be used - to tell if these bytes should be used as
>> parameters to a macro, initialisation for a char array, or whatever.
>> And it can use them as efficiently as practically possible. (It is
>> probably only worth using this for #embed data over a certain size -
>> smaller #embed's could just generate the integer sequences.)
>>
>> Nowhere in this is there any call for manual optimisation hints, nor
>> any risk of incorrect results.
>
> I've kept this on the back burner for a couple of weeks. I'm finally
> getting around to posting a followup.
>
> I'm not particularly concerned about compilers processing #embed
> incorrectly. It's conceivable that a compiler could incorrectly decide
> that it can optimize a particular #embed directive, but I expect
> compilers to be conservative, falling back to the specified behavior if
> they can't *prove* that an optimization is safe.
>

I'd expect that too. (Of course there's always the risk of bugs with
weird use cases.)

> I see two conceptual problems with #embed as it's currently defined in
> N3220.
>
> First, there's a possible compile-time performance issue for very large
> embedded files. The (draft) standard calls for #embed to expand to a
> comma-separated list of integer constant expressions. (I'm not sure why
> it didn't specify integer constants.)
>
> My objection is based on the possibility that #embed for a *very* large
> file might result in unacceptable time and memory usage during compile
> time. I haven't looked into how existing compilers handle large
> initializers, but I can imagine that parsing such a list might consume
> more than O(N) time and/or memory, or at least O(N) with a large
> constant. (If parsing long lists of integer constants is expensive for
> some compiler, this could be a motivation to optimize that particular
> case.)
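
For reference, the baseline expansion discussed above can be sketched in
a few lines of C. This is only an illustration of the "comma-separated
list" form, roughly what a stand-alone pre-processor or a generator
script would have to produce for every byte. The function name
format_embed_list is hypothetical; it is not taken from the thread, the
draft standard or any compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the bytes of `data` as a comma-separated
   list of decimal integers, the textual form that #embed is specified
   to expand to.  Returns the length of the complete list (excluding the
   terminating NUL), in the manner of snprintf.  If `out_size` is too
   small, the output stops at the last item that fits. */
size_t format_embed_list(const unsigned char *data, size_t n,
                         char *out, size_t out_size)
{
    size_t total = 0;

    assert(out != NULL && out_size > 0);
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        char item[16];  /* ", " plus up to 10 digits plus NUL */
        int len = snprintf(item, sizeof item, "%s%u",
                           i == 0 ? "" : ", ", (unsigned)data[i]);

        /* Copy the item (with its NUL) only if it fits completely. */
        if (total + (size_t)len < out_size) {
            memcpy(out + total, item, (size_t)len + 1);
        }
        total += (size_t)len;
    }
    return total;
}
```

Given the three bytes 'H', 'i', '\n' (in ASCII), this yields the text
"72, 105, 10", which is what would sit between the braces of an
initialiser. Each byte becomes a separate token that the compiler
proper then has to lex and parse again, and that per-byte work is what
an optimised #embed path would avoid.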

The point of #embed is to get O(N) scaling - or at least, much closer to
that than compilers do today with an #include of a list of numbers (or
even a string literal).

There is little doubt that a big enough #embed file will consume time
and memory that is unacceptable, at least for some people - all you need
is to pick a file bigger than your computer's memory, and you can be
reasonably confident that it will be problematic. But it also seems
reasonable to expect that if a file is big enough to cause trouble for
#embed, then any other method of including it in a C file will be at
least as bad and probably /much/ worse.

At worst, #embed is going to be no less efficient than today's solution,
and at best it will be significantly more efficient. I don't think it
is fair to object to it because a given implementation might not reach
theoretical optimum efficiencies.

>
> The intent of #embed is to copy the contents of a file at compile time
> into an array of unsigned char -- but it's specified in a roundabout way
> that requires bizarre usages to work "correctly".

That is one expected use, and will probably be the biggest use by a fair
way, but it is not the only possible use. The specification lets you
have more flexibility. For example, I have a project where I include a
number of files in a structure with a number of unsigned char arrays,
amongst other data - a simpler #embed solution that forced you to have
an unsigned char array might not work with that. (The project predates
#embed and uses a Python script to generate the data.)

> I expect at least
> some compilers to optimize #embed for better compile-time performance,
> but that requires them to determine when optimization is permitted with
> no advice from the standard about how to do that. That's going to be
> moderately difficult for compiler implementers; I'm not too concerned
> about that.
> But it also imposes a burden on programmers, who will have
> to use trial and error to determine how to ensure a #embed is optimized.
>

I am entirely confident that major compiler vendors will optimise the
case of initialising char arrays. For anything else, who cares? It is
unlikely that you'd use #embed for other purposes with files that are
big enough for unoptimised implementations to be unreasonably slow. And
if that does turn out to be a problem in practice, then you /know/ you
have huge files and are doing something weird, and you can use something
other than #embed for the purpose in the same way you do today.

Of prime importance is /correctness/ - #embed should give the results
you expect, and I can't see that being a problem. Outside that, #embed
is always going to be at least as efficient as existing solutions, and
usually much faster for cases that matter.

========== REMAINDER OF ARTICLE TRUNCATED ==========