Article <v6duhl$amlm$1@dont-email.me>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: bart <bc@freeuk.com>
Newsgroups: comp.lang.c
Subject: Re: technology discussion → does the world need a "new" C ?
Date: Sun, 7 Jul 2024 12:35:17 +0100
Organization: A noiseless patient Spider
Lines: 162
Message-ID: <v6duhl$amlm$1@dont-email.me>
References: <v66eci$2qeee$1@dont-email.me> <v67gt1$2vq6a$2@dont-email.me>
 <v687h2$36i6p$1@dont-email.me> <871q48w98e.fsf@nosuchdomain.example.com>
 <v68dsm$37sg2$1@dont-email.me> <87plrsultu.fsf@bsb.me.uk>
 <v68sft$3a6lh$1@dont-email.me> <87ed87v4wi.fsf@bsb.me.uk>
 <v6adrm$3ljg6$1@dont-email.me> <87v81ita77.fsf@bsb.me.uk>
 <v6d5k0$6rk5$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 07 Jul 2024 13:35:17 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f75d1ea82b211cd880bb79514b946f37";
	logging-data="350902"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/Nm6+zD7KY84GatoAHwukZ"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:PgEDLrjk7qQeto8PBKQDgjzszSk=
Content-Language: en-GB
In-Reply-To: <v6d5k0$6rk5$1@dont-email.me>
Bytes: 7040

On 07/07/2024 05:28, BGB wrote:
> On 7/6/2024 5:41 PM, Ben Bacarisse wrote:
>> BGB <cr88192@gmail.com> writes:
>>
>>> On 7/5/2024 5:40 PM, Ben Bacarisse wrote:
>>>> BGB <cr88192@gmail.com> writes:
>>>>
>>>>> On 7/5/2024 6:20 AM, Ben Bacarisse wrote:
>>>>>> BGB <cr88192@gmail.com> writes:
>>
>>>>>>> While eliminating structs could also simplify things; structs also
>>>>>>> tend to be a lot more useful.
>>>>>> Indeed.  And I'd have to use them for this!
>>>>>>
>>>>>
>>>>> Errm, the strategy I would assume is, as noted:
>>>>>     int a[4][4];
>>>>>     ...
>>>>>     l=a[j][k];
>>>>> Becomes:
>>>>>     int a[16];
>>>>>     ...
>>>>>     l=a[j*4+k];
>>>> That's what you want to force me to write, but I can use an array of
>>>> arrays despite your arbitrary ban on them by simply putting the array
>>>> in a struct.
>> ...
>>> In most contexts, I don't really see how a struct is preferable to a
>>> multiply, but either way...
>>
>> And I can't see how an array of arrays is harder for your compiler than
>> an array of structs.  C's indexing requires the compiler to know the
>> size of the items pointed to.
>>
>> I suspect that there is something amiss with your design if you are
>> considering this limiting in order to simplify the compiler.  A simple
>> compiler should not care what kind of thing p points to in
>>
>>    p[i]
>>
>> only what size of object p points to.
>>
> 
> 
> When I designed the compiler code, the initial approach for internal 
> type layout was to bit-pack it into 32 bits, say (a lot of this is from 
> memory, so maybe wrong):
> Basic1
>    (31:28): Layout of Type (0=Basic)
>    (27:16): Array Size
>    (15:12): Pointer Level Count
>    (11: 0): Base Type
> Basic2
>    (31:28): Layout of Type (1=Basic2)
>    (27: 8): Array Size
>    ( 7: 6): Pointer Level Count
>    ( 5: 0): Base Type
> Basic3
>    (31:28): Layout of Type (2=Basic3)
>    (27:24): Array Size
>    (23:20): Pointer Level Count
>    (19: 0): Base Type
> Overflow
>    (31:28): Layout of Type (3=Overflow)
>    (27:24): MBZ
>    (23: 0): Index into Type-Overflow Table
> And, a few other cases...
> 
> 
> Basic1 was the default, able to express arrays from 0..4095 elements, 
> with 0..7 levels of pointer indirection, and 0..4095 for the base type.
>   Where, 0=T, 1=T*, 2=T**, ..., 7=T*******
>     8=T[], 9=T[][], A=T*[], B=T*[*], C=&T, ...

That's quite a level of detail. This looks more like an encoding you 
might devise for a CPU instruction set. And yet you say elsewhere that 
the whole compiler is 250K lines? So you're not trying to squeeze it 
into a 16KB ROM or something.

> It could also be used to encode another type, which was needed for 
> things like multidimensional arrays and some other complex types. But, 
> this seemed like an ugly hack... (And was at odds with how I imagined 
> types working, but seemed like a technical necessity).

I can see how multi-dim arrays would be troublesome with such a scheme.

My own approach is to use a bunch of parallel arrays (a table of structs 
didn't appeal). They are populated with the standard types at the lower 
end, then new types can be added.

A C type is represented by an index into those arrays. One of those 
arrays is this (not C):

  [0:maxtype]int	tttarget              # All names start with 'tt'

For a pointer, it contains the index of the target type. For arrays, 
it is the index of the element type. There is no restriction on nesting, 
other than 'maxtype' (set to some large number; one day I'll make the 
limit flexible).


> One downside as-is, is that if a given variable is assigned more than 
> 4096 times in a given function, it can no longer be given a unique ID. 
> Though uncommon, this is not entirely implausible (with sufficiently 
> large functions), and there isn't currently any good way to deal with 
> this (apart from raising a compiler error).


One of my compiler benchmarks is this program:

    #include <stdio.h>

    int main(void) {
        int a, b=2, c=3, d=4;

        a=b+c*d;

        printf("%d\n", a);
    }

Except that the 'a=b+c*d;' line is repeated 1M or 2M times, usually 
within an included file.

Some compilers (including for other languages) find this challenging.

> 
> 
> But, as noted, the 3AC IR only exists in memory.
> 
> In the IR, the operations are expressed as a sort of linear bytecode 
> operating on a virtual stack machine; with types expressed as ASCII 
> strings.
> 
> Logically, the stack holds "ccxl_register" values, and the number of 3AC 
> ops is typically less than the number of stack-machine operations (those 
> which exist simply to shuffle registers around essentially disappear in 
> the translation process).
> 
> Say, for example:
>    LOAD x
>    LOAD y
>    ADD
>    STORE z
> Might turn into a single 3AC operation.

I've used 3AC as an IL (several attempts), and also a stack VM. The 3AC 
always looked so simple, but it was a lot of hard work to turn it into 
decent register-based code.

Now I use a stack-based IL, which is far easier to work with.

I wouldn't start off with turning stack code into 3AC! (SSA is supposed 
to be /the/ way to generate code in compilers; they can keep it.)

Those 4 stack instructions might turn into 3 or 4 machine instructions on 
x64 (perhaps fewer if the operands are register-resident). But if you 
convert them to one 3AC instruction, you then have to expand them again.

Perhaps 3AC suits another architecture better (I think ARM uses 
3-register ops; x64 is 2-register ops).