From: Thiago Adams
Newsgroups: comp.lang.c
Subject: Re: transpiling to low level C
Date: Tue, 17 Dec 2024 15:16:48 -0300

On 12/17/2024 2:59 PM, Thiago Adams wrote:
> On 12/17/2024 2:55 PM, Thiago Adams wrote:
>> On 12/17/2024 4:03 AM, BGB wrote:
>>> On 12/16/2024 5:21 AM, Thiago Adams wrote:
>>>> On 15/12/2024 20:53, BGB wrote:
>>>>> On 12/15/2024 3:32 PM, bart wrote:
>>>>>> On 15/12/2024 19:08, Bonita Montero wrote:
>>>>>>> C++ is more readable because it is magnitudes more expressive
>>>>>>> than C.
>>>>>>> You can easily write a C++ statement that would take hundreds of
>>>>>>> lines in C (imagine specializing an unordered_map by hand). Making
>>>>>>> a language less expressive makes it even less readable, and that's
>>>>>>> also true for your reduced C.
>>>>>>>
>>>>>>
>>>>>> That's not really the point of it. This reduced C is used as an
>>>>>> intermediate language for a compiler target. It will not usually
>>>>>> be read, or maintained.
>>>>>>
>>>>>> An intermediate language needs to be at a lower level than the
>>>>>> source language.
>>>>>>
>>>>>> And for this project, it needs to be compilable by any C89 compiler.
>>>>>>
>>>>>> Generating C++ would be quite useless.
>>>>>>
>>>>>
>>>>> As an IL, even C is a little overkill, unless turned into a
>>>>> restricted subset (say, along similar lines to GCC's GIMPLE).
>>>>>
>>>>> Say:
>>>>>    Only function-scope variables allowed;
>>>>>    No high-level control structures;
>>>>>    ...
>>>>>
>>>>> Say:
>>>>>    int foo(int x)
>>>>>    {
>>>>>      int i, v;
>>>>>      for(i=x, v=1; i>0; i--)
>>>>>        v=v*i;
>>>>>      return(v);
>>>>>    }
>>>>>
>>>>> Becoming, say:
>>>>>    int foo(int x)
>>>>>    {
>>>>>      int i;
>>>>>      int v;
>>>>>      i=x;
>>>>>      v=1;
>>>>>      if(i<=0)goto L1;
>>>>>      L0:
>>>>>      v=v*i;
>>>>>      i=i-1;
>>>>>      if(i>0)goto L0;
>>>>>      L1:
>>>>>      return v;
>>>>>    }
>>>>>
>>>>> ...
>>>>>
>>>>
>>>> I have considered removing loops and keeping only goto.
>>>> But I don't think this brings much simplification.
>>>>
>>>
>>> It depends.
>>>
>>> If the compiler works like an actual C compiler, with a full parser
>>> and AST stage, yeah, it may not save much.
>>>
>>> If the parser is a thin wrapper over 3AC operations (only allowing
>>> statements that map 1:1 with a 3AC IR operation), it may save a bit
>>> more...
>>>
>>> As for whether or not it makes sense to use a C-like syntax here,
>>> this is more up for debate (for practical use within a compiler, I
>>> would assume a binary serialization rather than an ASCII syntax,
>>> though ASCII may be better in terms of inter-operation or human
>>> readability).
>>>
>>> But, as can be noted, I would assume a binary serialization that is
>>> oriented around operators, and *not* about serializing the structures
>>> used to implement those operators. Also I would assume that the IR
>>> need not be in SSA form (conversion to full SSA could be done when
>>> reading in the IR operations).
>>>
>>> My argument is that not using SSA form means fewer issues for both
>>> the serialization format and the compiler front-end to deal with
>>> (and SSA is comparably easy to regenerate for the backend, with the
>>> backend operating with its internal IR in SSA form).
>>>
>>> Well, in contrast to LLVM, which assumes everything is always in SSA form.
>>>
>>> ...
>>>
>>
>> I have also considered splitting expressions.
>>
>> For instance
>>
>> if (a*b+c) {}
>>
>> into
>>
>> register int r1 = a * b;
>> register int r2 = r1 + c;
>> if (r2) {}
>>
>> This would make it easier to add overflow checks at runtime (if
>> desired) and to implement things like _Complex.
>>
>> Is this what you mean by 3AC or SSA?
>>
>> This would definitely simplify the expression grammar.
>>
>
> I have also considered removing local scopes. But I think local scopes
> may be useful for making better use of the stack, reusing the same
> addresses when variables go out of scope.
> For instance
>
> {
>  int i = 1;
>  {
>   int a = 2;
>  }
>  {
>   int b = 3;
>  }
> }
>
> I think scopes make it easier to use the same stack position for a and
> b, because it is easy to see that a no longer exists.
>

I have also considered removing structs, replacing them with
unsigned char [] and casting parts of the array to access members.
I think this is the lowest level possible in C.
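The last two ideas, splitting expressions into one operation per statement and replacing structs with unsigned char arrays, can be combined in a small sketch of what such lowered output might look like. Everything in it is hypothetical and invented for illustration. That includes the source function `eval`, `struct point`, the offset macros `POINT_X_OFF`/`POINT_Y_OFF`/`POINT_SIZE`, and the helpers `load_int`, `store_int`, `checked_add`, `checked_mul` and `eval_lowered`.

The sketch assumes the transpiler computes member offsets itself (`x` at 0, `y` at `sizeof(int)`). It reads members with `memcpy` rather than casting part of the array to `int *`, for two reasons:

- An `unsigned char` array is not guaranteed to be aligned for `int`.
- Reading a declared `unsigned char` array through an `int` lvalue breaks C's effective-type (strict aliasing) rules.

`memcpy` avoids both problems. The code sticks to C89 syntax. However, the multiplication check (the usual pattern from SEI CERT C rule INT32-C) relies on C99's guarantee that integer division truncates toward zero.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte layout a transpiler might pick for
   struct point { int x; int y; };  (offsets chosen by the tool, no padding) */
#define POINT_X_OFF 0
#define POINT_Y_OFF (sizeof(int))
#define POINT_SIZE  (2 * sizeof(int))

/* Member access through memcpy: well-defined for any alignment, unlike
   dereferencing an (int *) cast into an unsigned char array. */
static int load_int(const unsigned char *base, size_t off)
{
    int v;
    memcpy(&v, base + off, sizeof v);
    return v;
}

static void store_int(unsigned char *base, size_t off, int v)
{
    memcpy(base + off, &v, sizeof v);
}

/* Hypothetical runtime checks: each returns 0 on overflow, otherwise
   stores the result in *out and returns 1. */
static int checked_add(int x, int y, int *out)
{
    if ((y > 0 && x > INT_MAX - y) || (y < 0 && x < INT_MIN - y))
        return 0;
    *out = x + y;
    return 1;
}

/* CERT INT32-C style check; assumes C99 truncating division
   (C89 left the rounding of negative quotients implementation-defined). */
static int checked_mul(int x, int y, int *out)
{
    if (x > 0) {
        if (y > 0) {
            if (x > INT_MAX / y) return 0;
        } else {
            if (y < INT_MIN / x) return 0;
        }
    } else {
        if (y > 0) {
            if (x < INT_MIN / y) return 0;
        } else {
            if (x != 0 && y < INT_MAX / x) return 0;
        }
    }
    *out = x * y;
    return 1;
}

/* Source form:  int eval(struct point *p, int c) { return p->x * p->y + c; }
   Lowered form: struct -> raw bytes, expression -> one operation per statement.
   Returns 0 on overflow, otherwise stores the value in *result and returns 1. */
static int eval_lowered(const unsigned char *p, int c, int *result)
{
    int t0;
    int t1;
    int r1;
    int r2;
    t0 = load_int(p, POINT_X_OFF);
    t1 = load_int(p, POINT_Y_OFF);
    if (!checked_mul(t0, t1, &r1)) return 0;
    if (!checked_add(r1, c, &r2)) return 0;
    *result = r2;
    return 1;
}
```

Under this scheme, a generator would emit `unsigned char pt[POINT_SIZE];` where the source declared `struct point pt;`. A caller would then check the return value of `eval_lowered(pt, c, &out)` to detect overflow.

The temporaries `r1` and `r2` are deliberately not declared `register`, unlike in the split-expression example. Their addresses are passed to the checking helpers, and C does not allow taking the address of a `register` object.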