Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: transpiling to low level C
Date: Wed, 18 Dec 2024 12:51:23 -0600
Organization: A noiseless patient Spider
Lines: 227
Message-ID: <vjv5jd$2ds8r$3@dont-email.me>
References: <vjlh19$8j4k$1@dont-email.me>
<vjn9g5$n0vl$1@raubtier-asyl.eternal-september.org>
<vjnhsq$oh1f$1@dont-email.me> <vjnq5s$pubt$1@dont-email.me>
<vjp2f3$13k4m$2@dont-email.me> <vjr7np$1j57r$2@dont-email.me>
<vjsdum$1rfp2$1@dont-email.me> <vjsi62$1s5j5$2@dont-email.me>
<vjsjll$1rlkq$3@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 18 Dec 2024 19:51:26 +0100 (CET)
Injection-Info: dont-email.me; posting-host="d6170756eb4c94ad5ca5e98ba35d9045";
logging-data="2552091"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/FsF/FKxy9I+4A/qdSYrV1Fzo1hzOnYuU="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:oLqLpAfe9CR9p2pWxJFWWD0e2R0=
In-Reply-To: <vjsjll$1rlkq$3@dont-email.me>
Content-Language: en-US
Bytes: 8568
On 12/17/2024 1:33 PM, Thiago Adams wrote:
> On 12/17/2024 4:07 PM, BGB wrote:
>> On 12/17/2024 11:55 AM, Thiago Adams wrote:
>>> On 12/17/2024 4:03 AM, BGB wrote:
>>>> On 12/16/2024 5:21 AM, Thiago Adams wrote:
>>>>> On 15/12/2024 20:53, BGB wrote:
>>>>>> On 12/15/2024 3:32 PM, bart wrote:
>>>>>>> On 15/12/2024 19:08, Bonita Montero wrote:
>>>>>>>> C++ is more readable because it is orders of magnitude more
>>>>>>>> expressive than C. You can easily write a C++ statement that would
>>>>>>>> take hundreds of lines in C (imagine specializing an unordered_map
>>>>>>>> by hand). Making a language less expressive makes it even less
>>>>>>>> readable, and that's also true for your reduced C.
>>>>>>>>
>>>>>>>
>>>>>>> That's not really the point of it. This reduced C is used as an
>>>>>>> intermediate language for a compiler target. It will not usually
>>>>>>> be read, or maintained.
>>>>>>>
>>>>>>> An intermediate language needs to be at a lower level than the
>>>>>>> source language.
>>>>>>>
>>>>>>> And for this project, it needs to be compilable by any C89 compiler.
>>>>>>>
>>>>>>> Generating C++ would be quite useless.
>>>>>>>
>>>>>>
>>>>>> As an IL, even C is a little overkill, unless turned into a
>>>>>> restricted subset (say, along similar lines to GCC's GIMPLE).
>>>>>>
>>>>>> Say:
>>>>>> Only function-scope variables allowed;
>>>>>> No high-level control structures;
>>>>>> ...
>>>>>>
>>>>>> Say:
>>>>>> int foo(int x)
>>>>>> {
>>>>>>     int i, v;
>>>>>>     for(i=x, v=1; i>0; i--)
>>>>>>         v=v*i;
>>>>>>     return(v);
>>>>>> }
>>>>>>
>>>>>> Becoming, say:
>>>>>> int foo(int x)
>>>>>> {
>>>>>>     int i;
>>>>>>     int v;
>>>>>>     i=x;
>>>>>>     v=1;
>>>>>>     if(i<=0)goto L1;
>>>>>> L0:
>>>>>>     v=v*i;
>>>>>>     i=i-1;
>>>>>>     if(i>0)goto L0;
>>>>>> L1:
>>>>>>     return v;
>>>>>> }
>>>>>>
>>>>>> ...
>>>>>>
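One way to sanity-check a lowering like this is to compile the structured and goto-only forms side by side and compare them over a range of inputs. A minimal sketch in C89 style; the function names are hypothetical, and v starts at 1 so that the loop computes x! instead of always returning 0.

```c
/* Structured form, as a front end would receive it. */
int foo_structured(int x)
{
    int i, v;
    for (i = x, v = 1; i > 0; i--)
        v = v * i;
    return v;
}

/* Lowered form: function-scope variables, simple assignments,
   conditional gotos, and labels only. */
int foo_lowered(int x)
{
    int i;
    int v;
    i = x;
    v = 1;
    if (i <= 0) goto L1;
L0:
    v = v * i;
    i = i - 1;
    if (i > 0) goto L0;
L1:
    return v;
}

/* Returns 1 if both forms agree for every x in [lo, hi].
   Keep hi <= 12 so that x! still fits in a 32-bit int. */
int lowering_agrees(int lo, int hi)
{
    int x;
    for (x = lo; x <= hi; x++)
        if (foo_structured(x) != foo_lowered(x))
            return 0;
    return 1;
}
```

Differential testing like this does not prove a lowering correct, but it cheaply catches the usual mistakes in loop entry and exit conditions.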
>>>>>
>>>>> I have considered removing loops and keeping only goto.
>>>>> But I don't think this brings much simplification.
>>>>>
>>>>
>>>> It depends.
>>>>
>>>> If the compiler works like an actual C compiler, with a full parser
>>>> and AST stage, yeah, it may not save much.
>>>>
>>>>
>>>> If the parser is a thin wrapper over 3AC operations (only allowing
>>>> statements that map 1:1 with a 3AC IR operation), it may save a bit
>>>> more...
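To make the "thin wrapper" idea concrete, here is a minimal sketch of a parser that accepts only statements mapping to exactly one 3AC op, of the form dst = src1 OP src2, and rejects anything that would need more than one op. The Tac struct and parse_tac() are hypothetical names; note that character ranges inside a sscanf %[...] scanlist are implementation-defined in ISO C, although common C libraries such as glibc and MSVC accept them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical 3AC op produced directly by the parser. */
typedef struct {
    char op;          /* one of + - * / */
    char dst[16];
    char src1[16];
    char src2[16];
} Tac;

/* Parse "dst = src1 OP src2;" into one 3AC op.
   Returns 1 on success, 0 if the statement is not exactly one op
   (e.g. "r1 = a * b + c" is rejected). */
int parse_tac(const char *line, Tac *out)
{
    int used = 0;
    if (sscanf(line, " %15[A-Za-z0-9_] = %15[A-Za-z0-9_] %c %15[A-Za-z0-9_]%n",
               out->dst, out->src1, &out->op, out->src2, &used) != 4)
        return 0;
    if (strchr("+-*/", out->op) == NULL)
        return 0;
    /* Only whitespace and an optional ';' may follow the last operand. */
    line += used;
    line += strspn(line, " \t");
    if (*line == ';')
        line++;
    line += strspn(line, " \t\r\n");
    return *line == '\0';
}
```

Anything this rejects has to be broken into temporaries before it reaches the IR, which keeps the parser itself trivial.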
>>>>
>>>>
>>>>
>>>> As for whether or not it makes sense to use a C-like syntax here,
>>>> this is more up for debate (for practical use within a compiler, I
>>>> would assume a binary serialization rather than an ASCII syntax,
>>>> though ASCII may be better in terms of inter-operation or human
>>>> readability).
>>>>
>>>>
>>>> But, as can be noted, I would assume a binary serialization that is
>>>> oriented around operators; and *not* about serializing the
>>>> structures used to implement those operators. Also I would assume
>>>> that the IR need not be in SSA form (conversion to full SSA could be
>>>> done when reading in the IR operations).
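A minimal sketch of what an operator-oriented binary serialization might look like: each op is written as an opcode byte followed by its operands as variable-slot numbers in a base-128 varint encoding. The opcodes, slot numbering, and function names are assumptions for illustration. Because the IR is not in SSA form, a slot is just a mutable local and may be the destination of many ops; SSA form can be rebuilt after decoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes, for illustration only. */
enum { OP_MOV = 1, OP_MUL = 2, OP_SUB = 3 };

/* Write v as a little-endian base-128 varint
   (7 bits per byte, high bit set on all but the last byte). */
static size_t put_varint(uint8_t *p, uint32_t v)
{
    size_t n = 0;
    while (v >= 0x80) {
        p[n++] = (uint8_t)(v | 0x80);
        v >>= 7;
    }
    p[n++] = (uint8_t)v;
    return n;
}

/* Read a varint written by put_varint (assumes well-formed input). */
static size_t get_varint(const uint8_t *p, uint32_t *v)
{
    size_t n = 0;
    unsigned shift = 0;
    *v = 0;
    do {
        *v |= (uint32_t)(p[n] & 0x7F) << shift;
        shift += 7;
    } while (p[n++] & 0x80);
    return n;
}

/* Emit one 3-operand op: opcode byte, then dst, src1, src2 slots. */
size_t emit_op3(uint8_t *p, uint8_t op, uint32_t dst, uint32_t a, uint32_t b)
{
    size_t n = 0;
    p[n++] = op;
    n += put_varint(p + n, dst);
    n += put_varint(p + n, a);
    n += put_varint(p + n, b);
    return n;
}

/* Decode one op written by emit_op3; returns bytes consumed. */
size_t read_op3(const uint8_t *p, uint8_t *op, uint32_t *dst, uint32_t *a, uint32_t *b)
{
    size_t n = 0;
    *op = p[n++];
    n += get_varint(p + n, dst);
    n += get_varint(p + n, a);
    n += get_varint(p + n, b);
    return n;
}
```

Small slot numbers cost one byte each, so typical ops stay at four bytes without a fixed-width operand field.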
>>>>
>>>>
>>>> My argument is that not using SSA form means fewer issues for both
>>>> the serialization format and compiler front-end to need to deal with
>>>> (and is comparably easy to regenerate for the backend, with the
>>>> backend operating with its internal IR in SSA form).
>>>>
>>>> Well, in contrast to LLVM, which assumes everything is always in SSA form.
>>>>
>>>> ...
>>>>
>>>>
>>>
>>> I have also considered splitting expressions.
>>>
>>> For instance
>>>
>>> if (a*b+c) {}
>>>
>>> into
>>>
>>> register int r1 = a * b;
>>> register int r2 = r1 + c;
>>> if (r2) {}
>>>
>>> This would make it easier to add runtime overflow checks (if desired)
>>> and to implement things like _Complex.
>>>
>>> Is this what you mean by 3AC or SSA?
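A minimal sketch of the split form of if (a*b+c) with a runtime check on each step. The helper names are hypothetical; the overflow tests follow the usual precondition pattern (check before the operation, so no signed overflow ever occurs), and reporting through a flag is just one possible policy (a translator might trap or abort instead).

```c
#include <limits.h>

/* Checked signed multiply: stores a*b in *r and returns 1,
   or returns 0 if the product would overflow an int. */
static int mul_ok(int a, int b, int *r)
{
    if (a > 0) {
        if (b > 0) { if (a > INT_MAX / b) return 0; }
        else       { if (b < INT_MIN / a) return 0; }
    } else {
        if (b > 0) { if (a < INT_MIN / b) return 0; }
        else       { if (a != 0 && b < INT_MAX / a) return 0; }
    }
    *r = a * b;
    return 1;
}

/* Checked signed add, same convention as mul_ok. */
static int add_ok(int a, int b, int *r)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *r = a + b;
    return 1;
}

/* "if (a*b+c)" split into one operation per temporary, each checked.
   Returns the truth value of the condition; sets *overflow on failure. */
int eval_cond(int a, int b, int c, int *overflow)
{
    int r1, r2;
    *overflow = 0;
    if (!mul_ok(a, b, &r1)) { *overflow = 1; return 0; }
    if (!add_ok(r1, c, &r2)) { *overflow = 1; return 0; }
    return r2 != 0;
}
```

Once every temporary holds exactly one operation, each check sits next to the single operation it guards.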
>>>
>>
>> 3AC (three-address code) means that the IR expresses 3 (or sometimes more) operands per IR op.
>>
>> So:
>> MUL r1, a, b
>> Rather than, say, stack:
>> LOAD a
>> LOAD b
>> MUL
>> STORE r1
>>
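To make the 3AC-versus-stack contrast concrete, here is a minimal sketch of two toy interpreters running r1 = a * b: one op with explicit operands in the 3AC form, four ops with implicit operands in the stack form. The opcodes, structs, and slot numbering are hypothetical, and the stack interpreter does no bounds checking.

```c
enum { T_LOAD, T_MUL, T_STORE };

typedef struct { int op, dst, src1, src2; } Op3;  /* 3AC: explicit operands */
typedef struct { int op, slot; } OpStk;           /* stack: implicit operands */

/* 3AC: each op names its destination and both sources.
   Only T_MUL is handled in this toy. */
void run_3ac(const Op3 *code, int n, int *vars)
{
    int k;
    for (k = 0; k < n; k++)
        if (code[k].op == T_MUL)
            vars[code[k].dst] = vars[code[k].src1] * vars[code[k].src2];
}

/* Stack: values flow through an operand stack between ops. */
void run_stack(const OpStk *code, int n, int *vars)
{
    int stack[16];
    int sp = 0, k;
    for (k = 0; k < n; k++) {
        switch (code[k].op) {
        case T_LOAD:  stack[sp++] = vars[code[k].slot]; break;
        case T_MUL:   sp--; stack[sp - 1] *= stack[sp]; break;
        case T_STORE: vars[code[k].slot] = stack[--sp]; break;
        }
    }
}
```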
>>
>> SSA:
>> Static Single Assignment
>>
>
> Oh, sorry... I know what SSA is.
>
>> Generally:
>> Every variable may only be assigned once (more like in a functional
>> programming language);
>> Generally, variables are "merged" in the control flow via PHI
>> operators (which select the incoming value depending on which path
>> control came from).
>>
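As a sketch, here is the factorial loop from earlier in this thread written in SSA style: every value (i0, v2, ...) has a single defining operation, and the PHIs at the loop head and at the exit are emulated by copies placed on each incoming edge, which is roughly what a backend does when it translates out of SSA. The variable numbering is hypothetical, and v starts at 1 so that the function computes x!.

```c
/* SSA form, with PHIs shown in comments:
     L0: i1 = PHI(i0 from entry, i2 from L0)
         v1 = PHI(v0 from entry, v2 from L0)
     L1: v3 = PHI(v0 from entry, v2 from L0)
   C has no PHI, so each one becomes a copy on its incoming edges. */
int foo_ssa(int x)
{
    int i0, v0, i1, v1, i2, v2, v3;

    i0 = x;
    v0 = 1;
    if (i0 <= 0) {
        v3 = v0;        /* edge entry -> L1 */
        goto L1;
    }
    i1 = i0;            /* edge entry -> L0 */
    v1 = v0;
L0:
    v2 = v1 * i1;
    i2 = i1 - 1;
    if (i2 > 0) {
        i1 = i2;        /* back edge L0 -> L0 */
        v1 = v2;
        goto L0;
    }
    v3 = v2;            /* edge L0 -> L1 */
L1:
    return v3;
}
```

The edge copies are also why SSA is convenient in the backend but awkward elsewhere: every merge point needs explicit bookkeeping that mutable locals avoid.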
>
> I do a similar merge in my flow analysis, but without the concept of SSA.
>
>> IMHO, while SSA is preferable for backend analysis, optimization, and
>> code generation; it is undesirable pretty much everywhere else as it
>> adds too much complexity.
>>
>> Better IMO for the frontend compiler and main IL stage to assume that
>> local variables are freely mutable.
>>
>> Global variables are excluded from SSA in most variants and remain
>> fully mutable, but may be handled as designated LOAD/STORE operations.
>>
>>
>> In BGBCC though, full SSA only applies to temporaries. Normal local
>> variables are merely flagged by "version", and all versions of the
>> same local variable implicitly merge back together at each branch/label.
>>
>
> Sorry, what is BGBCC? (A C compiler?)
>
It is my C compiler.
========== REMAINDER OF ARTICLE TRUNCATED ==========