Path: ...!weretis.net!feeder9.news.weretis.net!news.quux.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Thiago Adams <thiago.adams@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: transpiling to low level C
Date: Tue, 17 Dec 2024 16:33:09 -0300
Organization: A noiseless patient Spider
Lines: 187
Message-ID: <vjsjll$1rlkq$3@dont-email.me>
References: <vjlh19$8j4k$1@dont-email.me> <vjn9g5$n0vl$1@raubtier-asyl.eternal-september.org> <vjnhsq$oh1f$1@dont-email.me> <vjnq5s$pubt$1@dont-email.me> <vjp2f3$13k4m$2@dont-email.me> <vjr7np$1j57r$2@dont-email.me> <vjsdum$1rfp2$1@dont-email.me> <vjsi62$1s5j5$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Tue, 17 Dec 2024 20:33:10 +0100 (CET)
Injection-Info: dont-email.me; posting-host="1f012199d928ca914dffdfea9ee32a88"; logging-data="1955482"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/7Yvy5hDQUC6gllMHMlbV9S4Jr4BlsEww="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:37lSvedm3jk55EcTgVyp72/Ebwk=
Content-Language: en-GB
In-Reply-To: <vjsi62$1s5j5$2@dont-email.me>
Bytes: 7192

On 12/17/2024 4:07 PM, BGB wrote:
> On 12/17/2024 11:55 AM, Thiago Adams wrote:
>> On 12/17/2024 4:03 AM, BGB wrote:
>>> On 12/16/2024 5:21 AM, Thiago Adams wrote:
>>>> On 15/12/2024 20:53, BGB wrote:
>>>>> On 12/15/2024 3:32 PM, bart wrote:
>>>>>> On 15/12/2024 19:08, Bonita Montero wrote:
>>>>>>> C++ is more readable because it is magnitudes more expressive
>>>>>>> than C.
>>>>>>> You can easily write a C++ statement that would take hundreds of
>>>>>>> lines in C (imagine specializing an unordered_map by hand). Making
>>>>>>> a language less expressive makes it even less readable, and that's
>>>>>>> also true for your reduced C.
>>>>>>>
>>>>>>
>>>>>> That's not really the point of it.
>>>>>> This reduced C is used as an
>>>>>> intermediate language for a compiler target. It will not usually
>>>>>> be read or maintained.
>>>>>>
>>>>>> An intermediate language needs to be at a lower level than the
>>>>>> source language.
>>>>>>
>>>>>> And for this project, it needs to be compilable by any C89 compiler.
>>>>>>
>>>>>> Generating C++ would be quite useless.
>>>>>>
>>>>>
>>>>> As an IL, even C is a little overkill, unless turned into a
>>>>> restricted subset (say, along similar lines to GCC's GIMPLE).
>>>>>
>>>>> Say:
>>>>>   Only function-scope variables allowed;
>>>>>   No high-level control structures;
>>>>>   ...
>>>>>
>>>>> Say:
>>>>> int foo(int x)
>>>>> {
>>>>>   int i, v;
>>>>>   for(i=x, v=1; i>0; i--)
>>>>>     v=v*i;
>>>>>   return(v);
>>>>> }
>>>>>
>>>>> Becoming, say:
>>>>> int foo(int x)
>>>>> {
>>>>>   int i;
>>>>>   int v;
>>>>>   i=x;
>>>>>   v=1;
>>>>>   if(i<=0)goto L1;
>>>>> L0:
>>>>>   v=v*i;
>>>>>   i=i-1;
>>>>>   if(i>0)goto L0;
>>>>> L1:
>>>>>   return v;
>>>>> }
>>>>>
>>>>> ...
>>>>>
>>>>
>>>> I have considered removing loops and keeping only goto.
>>>> But I don't think this brings much simplification.
>>>>
>>>
>>> It depends.
>>>
>>> If the compiler works like an actual C compiler, with a full parser
>>> and AST stage, yeah, it may not save much.
>>>
>>> If the parser is a thin wrapper over 3AC operations (only allowing
>>> statements that map 1:1 to a 3AC IR operation), it may save a bit
>>> more...
>>>
>>> As for whether or not it makes sense to use a C-like syntax here,
>>> this is more open to debate (for practical use within a compiler, I
>>> would assume a binary serialization rather than an ASCII syntax,
>>> though ASCII may be better in terms of interoperation or human
>>> readability).
>>>
>>> But, as can be noted, I would assume a binary serialization that is
>>> oriented around operators, and *not* around serializing the structures
>>> used to implement those operators. Also I would assume that the IR
Also I would assume that the IR >>> need not be in SSA form (conversion to full SSA could be done when >>> reading in the IR operations). >>> >>> >>> Ny argument is that not using SSA form means fewer issues for both >>> the serialization format and compiler front-end to need to deal with >>> (and is comparably easy to regenerate for the backend, with the >>> backend operating with its internal IR in SSA form). >>> >>> Well, contrast to LLVM assuming everything is always in SSA form. >>> >>> ... >>> >>> >> >> I also have considered split expressions. >> >> For instance >> >> if (a*b+c) {} >> >> into >> >> register int r1 = a * b; >> register int r2 = r1 + c; >> if (r2) {} >> >> This would make easier to add overflow checks in runtime (if desired) >> and implement things like _complex >> >> Is this what you mean by 3AC or SSA? >> > > 3AC means that IR expressed 3 (or sometimes more) operands per IR op. > > So: > MUL r1, a, b > Rather than, say, stack: > LOAD a > LOAD b > MUL > STORE r1 > > > SSA: > Static Single Assignment > Oh sorry .. I knew what SSA is. > Generally: > Every variable may only be assigned once (more like in a functional > programming language); > Generally, variables are "merged" in the control-flow via PHI operators > (which variable merges in depending on which path control came from). > I do similar merge in my flow analysis but without the concept of SSA. > IMHO, while SSA is preferable for backend analysis, optimization, and > code generation; it is undesirable pretty much everywhere else as it > adds too much complexity. > > Better IMO for the frontend compiler and main IL stage to assume that > local variables are freely mutable. > > Typically, global variables are excluded in most variants, and remain > fully mutable; but may be handled as designated LOAD/STORE operations. > > > In BGBCC though, full SSA only applies to temporaries. 
> Normal local variables are merely flagged by "version", and all versions
> of the same local variable implicitly merge back together at each
> branch/label.

Sorry, what is BGBCC? (A C compiler?)

> This allows some similar advantages (for analysis and optimization)
> while limiting some of the complexities. Though, this differs from
> temporaries, which are assumed to essentially fully disappear once they
> go outside of the span in which they exist (albeit with an awkward case
> to deal with temporaries that cross basic-block boundaries, which need
> to actually "exist" in some semi-concrete form, more like local variables).
>
> Note that unless the address of a local variable is taken, it need not

========== REMAINDER OF ARTICLE TRUNCATED ==========