Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: technology discussion → does the world need a "new" C ?
Date: Fri, 5 Jul 2024 04:19:07 -0500
Organization: A noiseless patient Spider
Lines: 200
Message-ID: <v68dsm$37sg2$1@dont-email.me>
References: <v66eci$2qeee$1@dont-email.me> <v67gt1$2vq6a$2@dont-email.me>
 <v687h2$36i6p$1@dont-email.me> <871q48w98e.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 05 Jul 2024 11:20:22 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="39c124d3e74403081112702eedb03ac8";
	logging-data="3404290"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18xJERCsPfrMlOk+06yj40q+BzF6zXlafU="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:8RrzwFfryemF5IgPsO9keORLSTw=
In-Reply-To: <871q48w98e.fsf@nosuchdomain.example.com>
Content-Language: en-US
Bytes: 8332

On 7/5/2024 3:09 AM, Keith Thompson wrote:
> BGB <cr88192@gmail.com> writes:
>> On 7/4/2024 8:05 PM, Lawrence D'Oliveiro wrote:
>>> It’s called “Rust”.
>>
>>
>> If anything, I suspect it may make sense to go a different direction:
>>    Not to a bigger language, but to a more narrowly defined language.
>>
>> Basically, to try to distill what C does well, keeping its core
>> essence intact.
>>
>>
>> Goal would be to make it easier to get more consistent behavior across
>> implementations, and also to make it simpler to implement (vs an
>> actual C compiler); with a sub-goal to allow for implementing a
>> compiler within a small memory footprint (as would be possible for K&R
>> or C89).
>>
>>
>> Say for example:
>>    Integer type sizes are defined;
>>    Nominally, integers are:
>>      Twos complement;
>>      Little endian;
>>      Wrap on overflow.
>>    Dropped features:
>>      VLAs
>>      Multidimensional arrays (*1)
>>      Bitfields
>>      ...
>>    Simplified declaration syntax (*2):
>>      {Modifier|Attribute}* TypeName Declarator
>>
>>
>> *1: While not exactly that rare, and can be useful, it is debatable if
>>   they add enough to really justify their complexity and relative
>>   semantic fragility. If using pointers, one almost invariably needs to
>>   fall back to doing "arr[y*N+x]" or similar anyways, so it is arguable
>>   that it could make sense to drop them and have people always do their
>>   multidimensional indexing manually.
>>
>> Note that multidimensional indexing via multiple levels of pointer
>> indirection would not be affected by this.
> [...]
> 
> Multidimensional arrays in C are not a distinct language feature.
> They are simply arrays of arrays, and all operations on them follow
> from operations on ordinary arrays and pointers.
> 
> Are you proposing (in this hypothetical new language) to add
> an arbitrary restriction, so that arrays can have elements of
> arithmetic, pointer, struct, etc. type, but not of array type?
> I'm not sure I see the point.
> 

As-is, multidimensional arrays require the compiler to realize that 
it needs to multiply each index by the product of all following 
dimension sizes.

So, say:
    int a[4][4];
    int j, k, l;
    l=a[j][k];

Essentially needs to be internally translated to, say (treating "a" as 
a flat array of 16 ints):
    l=a[j*4+k];

Eliminating multidimensional arrays eliminates the need for this 
translation logic, and the need to be able to represent this case in the 
typesystem handling logic (which is, as I see it, in some ways very 
different from what one needs for a struct).
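
As a rough illustration of the translation described above (a sketch 
only; the names ROWS, COLS, flat_index and offsets_agree are made up 
here and are not from BGBCC), the byte offset of a[j][k] can be 
compared against the manually computed j*4+k:

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 4 };

/* Flat element index the compiler has to compute for a[j][k]
   in "int a[ROWS][COLS]". */
size_t flat_index(size_t j, size_t k)
{
    assert(j < ROWS && k < COLS);
    return j * COLS + k;
}

/* Returns 1 if the address of every a[j][k] sits exactly
   flat_index(j,k) ints past the start of the array, else 0. */
int offsets_agree(void)
{
    int a[ROWS][COLS];
    size_t j, k;

    for (j = 0; j < ROWS; j++) {
        for (k = 0; k < COLS; k++) {
            /* Byte distance from the start of the whole array object. */
            ptrdiff_t actual = (char *)&a[j][k] - (char *)a;
            if ((size_t)actual != flat_index(j, k) * sizeof(int))
                return 0;
        }
    }
    return 1;
}
```

With multidimensional arrays dropped, the programmer would write the 
flat_index() arithmetic by hand, and the compiler would only ever see 
a one-dimensional index.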


While eliminating structs could also simplify things, structs also tend 
to be a lot more useful.


At least in BGBCC, internally types are represented as several logical 
fields:
   array size (usually 0 for non-arrays);
   (sometimes) a field holding an additional "modifier mode";
   pointer indirection level;
   ID number for primitive type or complex type/structure.

So, for part of the ID range, the ID is understood to represent one of 
the primitive types. Otherwise, it identifies a structure entry, which 
describes the contents of a struct, function pointer, or similar.


One can represent a multidimensional array by daisy-chaining the types 
(via a "typedef" entry whose sole purpose is to hold another type), but 
this is wonky. Though, C contains other occasionally used edge cases 
which require this.



Though, within the serialized IR stage, there is an ASCII notation, say:
   i             //int
   A4i           //int[4]
   A4,4i         //int[4][4]
   Pi            //int*
   PPi           //int**
   XSomeStruct;  //SomeStruct, given by name
   X123          //SomeStruct (given by index number)
   A4X123        //SomeStruct[4]
   P(xx)y        //unsigned long long (*)(long long, long long);
   ...
Where:
   a..z: Various primitive types.
   Cx, Gx: More primitive types.
Most capital letters represent structural features of the type.
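
To make the notation a bit more concrete, here is a minimal decoder 
for a small subset of it: just the 'P' and 'A' prefixes followed by a 
base letter or an 'X' reference. This is a sketch based only on the 
examples listed above; struct sig_info, sig_decode and SIG_MAX_DIMS 
are hypothetical names, and the function-pointer form and the Cx/Gx 
types are deliberately rejected rather than guessed at:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define SIG_MAX_DIMS 4

/* Hypothetical decoded form; not BGBCC's internal layout. */
struct sig_info {
    int  ndims;               /* number of array dimensions (0 if none) */
    long dims[SIG_MAX_DIMS];  /* dimension sizes, outermost first */
    int  ptr_level;           /* number of 'P' markers seen */
    char base;                /* base type letter, or 'X' for a struct */
};

/* Decodes strings such as "i", "A4i", "A4,4i", "PPi", "A4X123".
   Returns 0 on success, -1 for anything outside this subset. */
int sig_decode(const char *s, struct sig_info *out)
{
    assert(s != NULL && out != NULL);
    out->ndims = 0;
    out->ptr_level = 0;
    out->base = 0;

    for (;;) {
        if (*s == 'P') {
            out->ptr_level++;
            s++;
        } else if (*s == 'A') {
            /* 'A' is followed by one or more comma-separated sizes. */
            do {
                char *end;
                s++;  /* skip the 'A' or the ',' */
                if (!isdigit((unsigned char)*s) ||
                    out->ndims >= SIG_MAX_DIMS)
                    return -1;
                out->dims[out->ndims++] = strtol(s, &end, 10);
                s = end;
            } while (*s == ',');
        } else {
            break;
        }
    }

    /* What remains must start with a..z or an 'X' struct reference. */
    if (!islower((unsigned char)*s) && *s != 'X')
        return -1;
    out->base = *s;
    return 0;
}
```

For "A4,4i" this yields two dimensions of size 4 with base 'i', which 
is the information the bit-packed form cannot hold in a single array 
size field.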


Internally, the compiler instead uses a bit-packed format rather than 
this ASCII notation.


In the normal bit-packed form (with multiple sub-layouts existing), the 
various fields have a limited range. Going bigger turns into a case 
where the type is instead interpreted as an index into a table of 
"overflowed" types, which are represented as structures.

Falling back to a table entry holding a dedicated struct is less 
desirable than using a packed bit-pattern (though, logically, both can 
represent the same general information).
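
A sketch of how such a scheme could look, assuming a 32-bit type word. 
The field widths, the overflow flag in bit 31, and every name here 
(struct type_desc, type_pack, type_unpack, ovf_table) are invented for 
illustration and should not be taken as the actual BGBCC layout:

```c
#include <assert.h>
#include <stdint.h>

/* Full-width description, used when fields do not fit the packed form. */
struct type_desc {
    uint32_t base_id;     /* primitive type or struct ID */
    uint32_t ptr_level;   /* pointer indirection level */
    uint32_t array_size;  /* 0 for non-arrays */
};

/* Hypothetical layout: [31] overflow flag, [30:16] array size,
   [15:12] pointer level, [11:0] base ID. */
#define T_OVERFLOW   0x80000000u
#define T_MAX_BASE   0xFFFu
#define T_MAX_PTR    0xFu
#define T_MAX_ARRAY  0x7FFFu

#define OVF_CAPACITY 64
static struct type_desc ovf_table[OVF_CAPACITY];
static uint32_t ovf_count;

uint32_t type_pack(struct type_desc d)
{
    if (d.base_id <= T_MAX_BASE && d.ptr_level <= T_MAX_PTR &&
        d.array_size <= T_MAX_ARRAY)
        return (d.array_size << 16) | (d.ptr_level << 12) | d.base_id;

    /* Out of range: keep the full struct, return its table index.
       (No de-duplication of entries in this sketch.) */
    assert(ovf_count < OVF_CAPACITY);
    ovf_table[ovf_count] = d;
    return T_OVERFLOW | ovf_count++;
}

struct type_desc type_unpack(uint32_t t)
{
    struct type_desc d;

    if (t & T_OVERFLOW)
        return ovf_table[t & ~T_OVERFLOW];

    d.base_id    = t & T_MAX_BASE;
    d.ptr_level  = (t >> 12) & T_MAX_PTR;
    d.array_size = (t >> 16) & T_MAX_ARRAY;
    return d;
}
```

The common case stays a plain integer that can be compared and copied 
cheaply; only the out-of-range types pay for a table lookup.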



> I personally would hope that this language would *not* inherit C's
> odd treatment of arrays and pointers.  If so, and if it supports
> multidimensional arrays, they'd have to be defined differently than
> the way they're defined in C.
> 

The idea in this case was to make it so that:
   int[16];
Can be functionally identical to:
   int*
As far as most of the compiler's type-system handling is concerned. In 
this case, one only needs to care about the size when reserving space 
for the array.
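
This mirrors what already happens in C whenever an array is passed to 
a function. A small sketch (sum_ints and sum_local_array are names 
made up for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Sums n ints; the callee only ever sees an int*. */
int sum_ints(const int *p, size_t n)
{
    int total = 0;
    size_t i;

    assert(p != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += p[i];
    return total;
}

int sum_local_array(void)
{
    int arr[16];   /* the 16 matters here: space for 16 ints is reserved */
    size_t i;

    for (i = 0; i < 16; i++)
        arr[i] = (int)i;

    /* arr is used as an int*; the size is not carried along. */
    return sum_ints(arr, 16);
}
```

Apart from the declaration of "arr", nothing in this code needs to 
know that an array type was ever involved.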


One other simplification is also to make it so that:
   SomeStruct tfoo, tbar;
And:
   SomeStruct *pfoo, *pbar;

Are treated as equivalent at the lower levels, with the primary 
difference that:
   pbar=pfoo;
Will merely assign the pointer, whereas:
   tbar=tfoo;
Needs memory reserved for the structs in the stack-frame or similar, 
and the assignment is then effectively turned internally into:
   memcpy(&tbar, &tfoo, sizeof(SomeStruct));
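
Written out as plain C, the two forms look like this. It is only a 
sketch: the members of SomeStruct and the helper names 
assign_by_value and assign_lowered are assumptions made for the 
example:

```c
#include <assert.h>
#include <string.h>

/* Example layout; the real SomeStruct is not specified. */
typedef struct {
    int  x, y;
    char tag[8];
} SomeStruct;

/* What "tbar = tfoo;" means at the source level. */
void assign_by_value(SomeStruct *dst, const SomeStruct *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src;
}

/* What it becomes once both structs are handled as addresses. */
void assign_lowered(void *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(SomeStruct));
}
```

Both functions leave the destination with the same member values; the 
lowered form just no longer needs to know the struct's type, only its 
size.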


The ABI in my case also typically passes (and returns) structs by 
reference (thus, at the low-level, the compiler and ABI can treat them 
as pointers). Though, struct return is a little wonky as one effectively 
passes in a hidden argument for where the structure is to be returned 
into (and trying to call a function that returns a struct by value with 
a missing prototype will very possibly result in a crash).
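
The hidden-argument convention can be pictured as a source-level 
rewrite. The following is a sketch under that assumption, with Vec3, 
make_vec, make_vec_lowered and sum_components_lowered all being 
hypothetical names; a real ABI does this at the register and 
stack-frame level rather than in C:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x, y, z; } Vec3;

/* Source-level form: returns a struct by value. */
Vec3 make_vec(int x, int y, int z)
{
    Vec3 v;
    v.x = x; v.y = y; v.z = z;
    return v;
}

/* Lowered form: the caller passes the address of the return slot
   as an extra argument, and nothing is returned. */
void make_vec_lowered(Vec3 *ret, int x, int y, int z)
{
    assert(ret != NULL);
    ret->x = x; ret->y = y; ret->z = z;
}

/* Caller side after lowering: reserve the slot, pass its address. */
int sum_components_lowered(void)
{
    Vec3 tmp;   /* space reserved in the caller's frame */

    make_vec_lowered(&tmp, 1, 2, 3);
    return tmp.x + tmp.y + tmp.z;
}
```

This also suggests why a missing prototype is dangerous here: a caller 
that does not know about the struct return would not pass the hidden 
pointer, so the callee would write the result through whatever value 
happened to be in that argument slot.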

Though, functionally it is not that much different in this sense from 
either the Win64 x64 ABI, or the WinCE SH-4 ABI (from which part of the 
core of my ABI design was derived). Though, I can note that both the 
Win64 x64 and WinCE SH-4 ABIs are "oddly similar" in many areas.


Granted, arrays of structs still represent some hassle, as one needs to 
multiply the index by the size of the struct when offsetting the pointer.
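
In C terms, the scaling is what &base[i] does implicitly. A minimal 
sketch, with Item and item_at being names invented for the example:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int    id;
    double weight;
} Item;

/* Manual form of &base[i]: scale the index by the element size
   and add it to the base address in bytes. */
Item *item_at(Item *base, size_t i)
{
    assert(base != NULL);
    return (Item *)((char *)base + i * sizeof(Item));
}
```

Note that sizeof(Item) includes any padding, so the multiplier is the 
full stride between elements rather than the sum of the member sizes.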



This is in strong contrast, say, to the RISC-V ABI, which directly 
copies the structure contents around in memory to pass and return them.
========== REMAINDER OF ARTICLE TRUNCATED ==========