Path: ...!3.eu.feeder.erje.net!feeder.erje.net!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: technology discussion → does the world need a "new" C ?
Date: Sat, 6 Jul 2024 23:28:36 -0500
Organization: A noiseless patient Spider
Lines: 253
Message-ID: <v6d5k0$6rk5$1@dont-email.me>
References: <v66eci$2qeee$1@dont-email.me> <v67gt1$2vq6a$2@dont-email.me>
 <v687h2$36i6p$1@dont-email.me> <871q48w98e.fsf@nosuchdomain.example.com>
 <v68dsm$37sg2$1@dont-email.me> <87plrsultu.fsf@bsb.me.uk>
 <v68sft$3a6lh$1@dont-email.me> <87ed87v4wi.fsf@bsb.me.uk>
 <v6adrm$3ljg6$1@dont-email.me> <87v81ita77.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 07 Jul 2024 06:29:54 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f7c16b2fffd87ab0346ebf4c8e11259c";
	logging-data="224901"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19qS2SlIxyi1yihx/d2uNeHn0/n2ivVHxM="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:dqTfkEyXbEAvisg1mxY5Pxjh0ds=
In-Reply-To: <87v81ita77.fsf@bsb.me.uk>
Content-Language: en-US
Bytes: 10138

On 7/6/2024 5:41 PM, Ben Bacarisse wrote:
> BGB <cr88192@gmail.com> writes:
> 
>> On 7/5/2024 5:40 PM, Ben Bacarisse wrote:
>>> BGB <cr88192@gmail.com> writes:
>>>
>>>> On 7/5/2024 6:20 AM, Ben Bacarisse wrote:
>>>>> BGB <cr88192@gmail.com> writes:
> 
>>>>>> While eliminating structs could also simplify things; structs also tend to
>>>>>> be a lot more useful.
>>>>> Indeed.  And I'd have to use them for this!
>>>>>
>>>>
>>>> Errm, the strategy I would assume is, as noted:
>>>>     int a[4][4];
>>>>     ...
>>>>     l=a[j][k];
>>>> Becomes:
>>>>     int a[16];
>>>>     ...
>>>>     l=a[j*4+k];
>>> That's what you want to force me to write, but I can use an array of
>>> arrays despite your arbitrary ban on them by simply putting the array in
>>> a struct.
> ...
>> In most contexts, I don't really see how a struct is preferable to a
>> multiply, but either way...
> 
> And I can't see how an array of arrays is harder for your compiler than
> an array of structs.  C's indexing requires the compiler to know the
> size of the items pointed to.
> 
> I suspect that there is something amiss with your design if you are
> considering this limiting in order to simplify the compiler.  A simple
> compiler should not care what kind of thing p points to in
> 
>    p[i]
> 
> only what size of object p points to.
> 

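For concreteness, a minimal sketch of the three indexing forms being 
argued about above. All of the names here are made up for illustration; 
only the 4x4 shape comes from the quoted example. In each case the 
generated address is just a base plus a scaled index, and the sketch 
checks that the three forms select the same element:

```c
#include <assert.h>

enum { ROWS = 4, COLS = 4 };

/* Shared bounds check for the three accessors below. */
static void check_index(int j, int k)
{
    assert(j >= 0 && j < ROWS && k >= 0 && k < COLS);
}

/* Array of arrays: the compiler scales j by sizeof(int[COLS]). */
static int get_nested(int (*a)[COLS], int j, int k)
{
    check_index(j, k);
    return a[j][k];
}

/* Flattened array: the same scaling, written out by hand. */
static int get_flat(const int *a, int j, int k)
{
    check_index(j, k);
    return a[j * COLS + k];
}

/* Array of structs, each struct wrapping one row. */
struct row { int col[COLS]; };

static int get_wrapped(const struct row *a, int j, int k)
{
    check_index(j, k);
    return a[j].col[k];
}

/* Fill all three layouts with the same values and compare every element.
   Returns 1 if the three forms agree everywhere, 0 otherwise. */
static int forms_agree(void)
{
    int nested[ROWS][COLS];
    int flat[ROWS * COLS];
    struct row wrapped[ROWS];
    int j, k;

    for (j = 0; j < ROWS; j++) {
        for (k = 0; k < COLS; k++) {
            int v = j * 10 + k;
            nested[j][k] = v;
            flat[j * COLS + k] = v;
            wrapped[j].col[k] = v;
        }
    }
    for (j = 0; j < ROWS; j++) {
        for (k = 0; k < COLS; k++) {
            if (get_nested(nested, j, k) != get_flat(flat, j, k))
                return 0;
            if (get_flat(flat, j, k) != get_wrapped(wrapped, j, k))
                return 0;
        }
    }
    return 1;
}
```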

When I designed the compiler code, the initial approach for internal 
type layout was to bit-pack it into 32 bits, roughly as follows (a lot 
of this is from memory, so it may be wrong):
Basic1
   (31:28): Layout of Type (0=Basic)
   (27:16): Array Size
   (15:12): Pointer Level Count
   (11: 0): Base Type
Basic2
   (31:28): Layout of Type (1=Basic2)
   (27: 8): Array Size
   ( 7: 6): Pointer Level Count
   ( 5: 0): Base Type
Basic3
   (31:28): Layout of Type (2=Basic3)
   (27:24): Array Size
   (23:20): Pointer Level Count
   (19: 0): Base Type
Overflow
   (31:28): Layout of Type (3=Overflow)
   (27:24): MBZ
   (23: 0): Index into Type-Overflow Table
And, a few other cases...
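As a rough sketch of what packing and unpacking these fields could look 
like in C (the function and enum names are invented for this example, 
and the bit positions are just the ones listed above, so the same 
"from memory" caveat applies):

```c
#include <assert.h>
#include <stdint.h>

/* Layout tag stored in bits 31:28. */
enum { LAYOUT_BASIC1 = 0, LAYOUT_BASIC2 = 1, LAYOUT_BASIC3 = 2, LAYOUT_OVERFLOW = 3 };

/* Extract `width` bits starting at bit `shift`. */
static uint32_t get_bits(uint32_t word, unsigned shift, unsigned width)
{
    assert(width > 0 && width < 32 && shift + width <= 32);
    return (word >> shift) & ((1u << width) - 1u);
}

/* Place `value` at bit `shift`, checking that it fits in `width` bits. */
static uint32_t put_bits(uint32_t value, unsigned shift, unsigned width)
{
    assert(width > 0 && width < 32 && shift + width <= 32);
    assert(value < (1u << width));
    return value << shift;
}

static uint32_t type_layout(uint32_t t) { return get_bits(t, 28, 4); }

/* Basic1: (27:16) array size, (15:12) pointer level, (11:0) base type. */
static uint32_t pack_basic1(uint32_t array_size, uint32_t ptr_level, uint32_t base)
{
    return put_bits(LAYOUT_BASIC1, 28, 4) | put_bits(array_size, 16, 12)
         | put_bits(ptr_level, 12, 4) | put_bits(base, 0, 12);
}

/* Basic2: (27:8) array size, (7:6) pointer level, (5:0) base type. */
static uint32_t pack_basic2(uint32_t array_size, uint32_t ptr_level, uint32_t base)
{
    return put_bits(LAYOUT_BASIC2, 28, 4) | put_bits(array_size, 8, 20)
         | put_bits(ptr_level, 6, 2) | put_bits(base, 0, 6);
}

/* Basic3: (27:24) array size, (23:20) pointer level, (19:0) base type. */
static uint32_t pack_basic3(uint32_t array_size, uint32_t ptr_level, uint32_t base)
{
    return put_bits(LAYOUT_BASIC3, 28, 4) | put_bits(array_size, 24, 4)
         | put_bits(ptr_level, 20, 4) | put_bits(base, 0, 20);
}
```

With this sketch, something like "int a[16]" would be 
pack_basic1(16, 0, 0x00), taking 00=Int from the base type numbering.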


Basic1 was the default, able to express arrays of 0..4095 elements, 
0..7 levels of pointer indirection, and base types 0..4095.
The 4-bit Pointer Level Count field is encoded as:
    0=T, 1=T*, 2=T**, ..., 7=T*******
    8=T[], 9=T[][], A=T*[], B=T*[*], C=&T, ...

Note that at present, there is no way to express more than 7 levels of 
pointer indirection, but this issue hasn't come up in practice.


Basic2 is for big arrays (up to 1048575 elements) of a primitive type, 
with 0..3 pointer levels. It can only encode the low-numbered primitive 
types (0..63).

Basic3 is the opposite, able to express a wider range of types (base 
types 0..1048575), but only small arrays (0..15 elements).

There is another variant of Basic1 that splits the Array Size field in 
half, with a smaller array limit, but able to encode 
const/volatile/restrict/etc (but only in certain combinations).


Overflow would be used if the type couldn't fit into one of the above; 
the type is then expressed in a table. It is avoided when possible, as 
overflow table entries are comparatively expensive.
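The selection between layouts could be sketched roughly as below. This 
is a guess at the logic, not the actual compiler code: the preference 
order (Basic1, then Basic2, then Basic3, then Overflow) is an 
assumption, the names are invented, and the const/volatile variant of 
Basic1 is ignored:

```c
#include <assert.h>
#include <stdint.h>

enum layout_choice { CHOICE_BASIC1, CHOICE_BASIC2, CHOICE_BASIC3, CHOICE_OVERFLOW };

/* True if `value` is representable in an unsigned field of `bits` bits. */
static int fits(uint32_t value, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return value < (1u << bits);
}

/* Pick the first layout whose field widths can hold all three values.
   The ordering here is an assumption made for the sketch. */
static enum layout_choice choose_layout(uint32_t array_size,
                                        uint32_t ptr_level,
                                        uint32_t base)
{
    /* Basic1: 12-bit array size, 4-bit pointer level, 12-bit base type. */
    if (fits(array_size, 12) && fits(ptr_level, 4) && fits(base, 12))
        return CHOICE_BASIC1;
    /* Basic2: 20-bit array size, 2-bit pointer level, 6-bit base type. */
    if (fits(array_size, 20) && fits(ptr_level, 2) && fits(base, 6))
        return CHOICE_BASIC2;
    /* Basic3: 4-bit array size, 4-bit pointer level, 20-bit base type. */
    if (fits(array_size, 4) && fits(ptr_level, 4) && fits(base, 20))
        return CHOICE_BASIC3;
    /* Nothing fits: fall back to the (more expensive) overflow table. */
    return CHOICE_OVERFLOW;
}
```

So, for example, a 100000-element array of a struct type numbered 5000 
fits none of the three packed layouts and would land in the overflow 
table.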




Type Numbering space:
      0 ..      63: Primitive Types, Higher priority
     64 ..     255: Primitive Types, Lower priority
    256 ..    4095: Complex Types, Index into Literals Table
   4096 .. 1048575: Complex Types, Index into Literals Table

Small numbered base types were higher priority:
   00=Int,        01=Long(64bit), 02=Float,            03=Double,
   04=Ptr(void*), 05=Void,        06=Struct(Abstract), 07=NativeLong
   08=SByte,      09=UByte,       0A=Short,            0B=UShort,
   0C=UInt,       0D=ULong,       0E=UNativeLong,      0F=ImplicitInt
Followed by, say:
   10=Int128, 11=UInt128, 12=Float128/LongDouble, 13=Float16,
   ...


Here, type number 256 maps to index 0 in the Literals Table.
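
A small sketch of that mapping, with invented names. Only the 
"256 maps to index 0" rule is from the description above; that the rest 
of the range continues contiguously (index = type number - 256) is an 
assumption:

```c
#include <assert.h>

enum { FIRST_COMPLEX_TYPE = 256, TYPE_NUMBER_LIMIT = 1048576 };

/* Type numbers 0..255 are primitive types. */
static int is_primitive_type(unsigned type_number)
{
    return type_number < FIRST_COMPLEX_TYPE;
}

/* Map a complex type number to its index in the literals table.
   Assumes a contiguous mapping starting at 256 -> 0. */
static unsigned literal_index(unsigned type_number)
{
    assert(type_number >= FIRST_COMPLEX_TYPE && type_number < TYPE_NUMBER_LIMIT);
    return type_number - FIRST_COMPLEX_TYPE;
}

/* Type numbers up to 4095 fit a 12-bit base type field; anything
   larger needs a layout with a 20-bit base type field. */
static int fits_12bit_base(unsigned type_number)
{
    return type_number < 4096;
}
```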

An index into the literals table will generally be used to encode a 
Struct or Function Pointer or similar. This table will hold a structure 
describing the fields of a struct, or the arguments and return value of 
a function pointer (in my BS2 language, it may also define class 
members, a superclass, implemented interfaces, ...).


It could also be used to encode another type, which was needed for 
things like multidimensional arrays and some other complex types. But, 
this seemed like an ugly hack... (And was at odds with how I imagined 
types working, but seemed like a technical necessity).


These 32-bit type values would often be packed inside a 64-bit 
register/variable descriptor.

Local Variables:
   (63:56): Descriptor Type
   (55:24): Variable Type
   (23:12): Sequence Number
   (11: 0): Identity Number
Global Variables:
   (63:56): Descriptor Type
   (55:24): Variable Type
   (23: 0): Index into Global Table
Integer Literal:
   (63:56): Descriptor Type
   (55:32): Compact Type
   (31: 0): Value
String Literal:
   (63:56): Descriptor Type
   (55:32): Compact Type
   (31: 0): Offset into String Table
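
A rough sketch of packing two of these descriptor forms. The descriptor 
tag values (DESC_LOCAL, DESC_INT_LITERAL) are placeholders, since the 
actual tag numbers aren't given here, and the function names are made 
up as well:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder tag values for bits 63:56; the real ones are not listed. */
enum { DESC_LOCAL = 1, DESC_INT_LITERAL = 2 };

/* Local variable: (55:24) type, (23:12) sequence number, (11:0) identity. */
static uint64_t pack_local(uint32_t type, uint32_t seq, uint32_t id)
{
    assert(seq < 4096u && id < 4096u);
    return ((uint64_t)DESC_LOCAL << 56) | ((uint64_t)type << 24)
         | ((uint64_t)seq << 12) | (uint64_t)id;
}

static uint32_t desc_tag(uint64_t d)   { return (uint32_t)(d >> 56); }
static uint32_t local_type(uint64_t d) { return (uint32_t)((d >> 24) & 0xFFFFFFFFu); }
static uint32_t local_seq(uint64_t d)  { return (uint32_t)((d >> 12) & 0xFFFu); }
static uint32_t local_id(uint64_t d)   { return (uint32_t)(d & 0xFFFu); }

/* Integer literal: (55:32) compact type, (31:0) value (raw bit pattern). */
static uint64_t pack_int_literal(uint32_t compact_type, uint32_t value)
{
    assert(compact_type < (1u << 24));
    return ((uint64_t)DESC_INT_LITERAL << 56) | ((uint64_t)compact_type << 32)
         | (uint64_t)value;
}

static uint32_t literal_value(uint64_t d) { return (uint32_t)(d & 0xFFFFFFFFu); }
```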

There were various other descriptor types for larger integer and 
floating-point values:
   Long and Double literals, representing the value in 56 bits
     (for Double, the low 8 bits are cut off).
   An index into a table of raw 64-bit values (if the value can't be
     expressed directly as one of the other options).

Values for 128-bit types were expressed as an index pair into the table 
of 64-bit values:
   (63:56): Descriptor Type
   (55:48): Primitive Type
   (47:24): Index into Value Table (High 64 bits)
   (23: 0): Index into Value Table (Low 64 bits)
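
As a sketch of how such an index pair might be packed and then resolved 
against the value table (DESC_WIDE is a placeholder tag, and the struct 
and function names are invented for the example):

```c
#include <assert.h>
#include <stdint.h>

enum { DESC_WIDE = 3 };  /* placeholder tag; the real value is not given */

/* A 128-bit value held as two 64-bit halves. */
struct wide_value { uint64_t hi, lo; };

/* (55:48) primitive type, (47:24) high-half index, (23:0) low-half index. */
static uint64_t pack_wide(uint32_t prim_type, uint32_t hi_index, uint32_t lo_index)
{
    assert(prim_type < 256u);
    assert(hi_index < (1u << 24) && lo_index < (1u << 24));
    return ((uint64_t)DESC_WIDE << 56) | ((uint64_t)prim_type << 48)
         | ((uint64_t)hi_index << 24) | (uint64_t)lo_index;
}

static uint32_t wide_prim_type(uint64_t d) { return (uint32_t)((d >> 48) & 0xFFu); }
static uint32_t wide_hi_index(uint64_t d)  { return (uint32_t)((d >> 24) & 0xFFFFFFu); }
static uint32_t wide_lo_index(uint64_t d)  { return (uint32_t)(d & 0xFFFFFFu); }

/* Fetch both halves from the table of raw 64-bit values.
   The caller must ensure both indices are within the table. */
static struct wide_value load_wide(uint64_t d, const uint64_t *table)
{
    struct wide_value v;
    v.hi = table[wide_hi_index(d)];
    v.lo = table[wide_lo_index(d)];
    return v;
}
```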


One downside as-is is that if a given variable is assigned more than 
4096 times in a given function, it can no longer be given a unique ID 
(the Sequence Number field is only 12 bits). Though uncommon, this is 
not implausible in sufficiently large functions, and there isn't 
currently any good way to deal with it (apart from raising a compiler 
error).

Taking away bits from the base ID (the Identity Number field) isn't 
good either, as functions with 1000+ local variables aren't implausible 
(though, thus far, I have not really seen any with much more than 
around 400 local variables, but still...).

========== REMAINDER OF ARTICLE TRUNCATED ==========