Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: technology discussion → does the world need a "new" C ?
Date: Fri, 5 Jul 2024 08:28:19 -0500
Organization: A noiseless patient Spider
Lines: 173
Message-ID: <v68sft$3a6lh$1@dont-email.me>
References: <v66eci$2qeee$1@dont-email.me> <v67gt1$2vq6a$2@dont-email.me>
 <v687h2$36i6p$1@dont-email.me> <871q48w98e.fsf@nosuchdomain.example.com>
 <v68dsm$37sg2$1@dont-email.me> <87plrsultu.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 05 Jul 2024 15:29:34 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="39c124d3e74403081112702eedb03ac8";
	logging-data="3480241"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+mN9rs4mBdXoKvC7yYeOplI5bnSrInTXY="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:CUTEJMlhowZN0ULSoMnwwy3QiEE=
In-Reply-To: <87plrsultu.fsf@bsb.me.uk>
Content-Language: en-US
Bytes: 7786

On 7/5/2024 6:20 AM, Ben Bacarisse wrote:
> BGB <cr88192@gmail.com> writes:
> 
>> On 7/5/2024 3:09 AM, Keith Thompson wrote:
>>> BGB <cr88192@gmail.com> writes:
>>>> On 7/4/2024 8:05 PM, Lawrence D'Oliveiro wrote:
>>>>> It’s called “Rust”.
>>>>
>>>>
>>>> If anything, I suspect it may make sense to go a different direction:
>>>>     Not to a bigger language, but to a more narrowly defined language.
>>>>
>>>> Basically, to try to distill what C does well, keeping its core
>>>> essence intact.
>>>>
>>>>
>>>> Goal would be to make it easier to get more consistent behavior across
>>>> implementations, and also to make it simpler to implement (vs an
>>>> actual C compiler); with a sub-goal to allow for implementing a
>>>> compiler within a small memory footprint (as would be possible for K&R
>>>> or C89).
>>>>
>>>>
>>>> Say for example:
>>>>     Integer type sizes are defined;
>>>>     Nominally, integers are:
>>>>       Twos complement;
>>>>       Little endian;
>>>>       Wrap on overflow.
>>>>     Dropped features:
>>>>       VLAs
>>>>       Multidimensional arrays (*1)
>>>>       Bitfields
>>>>       ...
>>>>     Simplified declaration syntax (*2):
>>>>       {Modifier|Attribute}* TypeName Declarator
>>>>
>>>>
>>>> *1: While they are not exactly rare, and can be useful, it is debatable
>>>>    if they add enough to really justify their complexity and relative
>>>>    semantic fragility. If using pointers, one almost invariably needs to
>>>>    fall back to doing "arr[y*N+x]" or similar anyways, so it is arguable
>>>>    that it could make sense to drop them and have people always do their
>>>>    multidimensional indexing manually.
>>>>
>>>> Note that multidimensional indexing via multiple levels of pointer
>>>> indirection would not be affected by this.
>>> [...]
>>> Multidimensional arrays in C are not a distinct language feature.
>>> They are simply arrays of arrays, and all operations on them follow
>>> from operations on ordinary arrays and pointers.
>>> Are you proposing (in this hypothetical new language) to add
>>> an arbitrary restriction, so that arrays can have elements of
>>> arithmetic, pointer, struct, etc. type, but not of array type?
>>> I'm not sure I see the point.
>>
>> As-is, the multidimensional arrays require the compiler to realize that it
>> needs to multiply each index by the product of all following dimensions.
>>
>> So, say:
>>     int a[4][4];
>>     int j, k, l;
>>     l=a[j][k];
>>
>> Essentially needs to be internally translated to, say:
>>     l=a[j*4+k];
>>
>> Eliminating multidimensional arrays eliminates the need for this
>> translation logic, and the need to be able to represent this case in the
>> typesystem handling logic (which is, as I see it, in some ways very
>> different from what one needs for a struct).
> 
> How can it be eliminated?  All your plan does is force me to wrap the
> inner array in a struct in order to get anything like the convenience of
> the above:
> 
> struct a { int a[4]; };
> 
> struct a a[4];
> l = a[j].a[k];
> 
> Most compilers will generate the same arithmetic (indeed exactly the
> same code) for this as for the more convenient form that you don't like.
> 
> All you can do to eliminate this code generation is to make it hard
> to re-write the convenient code you dislike.  (And yes, I used the same
> name all over the place because you are forcing me to.)
> 

It is not so much a dislike of multidimensional arrays as a concept, but 
rather, the hair they add to the compiler and typesystem.

Granted, one would still have other complex types, like structs and 
function pointers, so potentially the complexity savings would be limited.



>> While eliminating structs could also simplify things; structs also tend to
>> be a lot more useful.
> 
> Indeed.  And I'd have to use them for this!
> 

Errm, the strategy I would assume is, as noted:
   int a[4][4];
   ...
   l=a[j][k];
Becomes:
   int a[16];
   ...
   l=a[j*4+k];

Much like what one would typically need to do anyways if the array was 
heap-allocated.
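
A minimal sketch of the flattened form, for illustration only. The names 
idx2, make_flat and nested_matches_flat are hypothetical helpers (not 
from any compiler discussed here), and the 4x4 size just mirrors the 
example above. The last function checks that a real "int a[4][4]" puts 
a[j][k] at the same byte offset that j*4+k predicts, which is the 
translation a C compiler otherwise does internally:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { ROWS = 4, COLS = 4 };

/* Hypothetical helper: row-major index into a flat ROWS*COLS array. */
static size_t idx2(size_t j, size_t k)
{
    assert(j < ROWS && k < COLS);
    return j * COLS + k;
}

/* Heap-allocate a flat array where element (j,k) holds j*10+k.
   Returns NULL if malloc fails; the caller frees the result. */
static int *make_flat(void)
{
    int *a = malloc(sizeof *a * ROWS * COLS);
    if (a == NULL)
        return NULL;
    for (size_t j = 0; j < ROWS; j++)
        for (size_t k = 0; k < COLS; k++)
            a[idx2(j, k)] = (int)(j * 10 + k);
    return a;
}

/* Returns 1 if every a[j][k] of a real 2-D array sits at the byte
   offset that the flat index predicts, 0 otherwise. */
static int nested_matches_flat(void)
{
    int a[ROWS][COLS];
    for (size_t j = 0; j < ROWS; j++)
        for (size_t k = 0; k < COLS; k++) {
            size_t off = (size_t)((char *)&a[j][k] - (char *)a);
            if (off != idx2(j, k) * sizeof(int))
                return 0;
        }
    return 1;
}
```

With this, "l=a[j][k];" is written as "l=flat[idx2(j,k)];" (or directly 
as "l=flat[j*4+k];"), at the cost of carrying the row width around by 
hand.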



Though, the major goal for this sort of thing is mostly to try to limit 
the complexity required to write a compiler (as opposed to programmer 
convenience).


Like, for example, I had tried (but failed) to write a usable C compiler 
in less than 30k lines (and also ideally needing less than 4MB of RAM). 
But, if the language design is simplified some, this might be a little 
closer. Might still be doable, but a C compiler in 50-75k lines is much 
less impressive.


Though, admittedly, my current C compiler (used for my project) 
weighs in at around 250k lines, and much of the code 
complexity is in the backend code generator. One would need to have some 
fairly weak code-generation to get this down (likely translating the IR 
to 3AC form, and then doing a pretty close to 1:1 mapping between 3AC 
operations and output code, possibly with no register allocator or other 
optimizations).
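
To make "pretty close to 1:1" concrete, here is a rough sketch of what 
such a weak backend could look like. Everything in it is an assumption 
made for illustration: the tac_op layout, the load/store/add mnemonics, 
the r1/r2 scratch registers and the 4-byte stack slots are all invented, 
and none of it is taken from the actual compiler. Each 3AC operation 
becomes a fixed load/load/operate/store sequence, with every temporary 
living in its own stack slot, so no register allocator is needed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>   /* strcmp, handy when checking the emitted text */

/* Hypothetical three-address-code op: dst = src1 <opcode> src2.
   Operands are temporary numbers; temp N lives at [sp + 4*N]. */
enum tac_opcode { TAC_ADD, TAC_SUB, TAC_MUL };

struct tac_op {
    enum tac_opcode opcode;
    int dst, src1, src2;
};

/* Emit one op as load/load/operate/store into buf (made-up ISA).
   Returns what snprintf returns: the length needed, or negative
   on an encoding error. */
static int emit_naive(const struct tac_op *op, char *buf, size_t n)
{
    static const char *const mnem[] = { "add", "sub", "mul" };

    assert((unsigned)op->opcode < 3u);
    return snprintf(buf, n,
                    "load r1, [sp+%d]\n"
                    "load r2, [sp+%d]\n"
                    "%s r1, r1, r2\n"
                    "store [sp+%d], r1\n",
                    op->src1 * 4, op->src2 * 4,
                    mnem[op->opcode], op->dst * 4);
}
```

The output is obviously slow (two loads and a store per operation), but 
the emitter is a table lookup plus a format string, which is the sort of 
trade being described.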

But, even with this massive 250k line compiler, I am still running into 
edge cases which don't work correctly (as recently discovered after 
resuming the effort to port Quake3 to my ISA).

Granted, this is along with needing to fix a bunch of bugs in my DLL 
loading and similar (Quake3 being a more complex engine than Doom or 
Quake1; in this case effectively requiring virtual memory and a DLL 
loader). And, discovering various other bugs in the runtime libraries 
and other things in the process (and, Quake3 builds and sorta works, but 
it seems the BSP doesn't load correctly and the player just sort of 
falls out the bottom of the incorrectly loaded map).


As-is, also, for the main Quake3 EXE, my compiler ends up using about 
270MB of RAM, which is a bit steep...

A big chunk of the RAM in this case ends up being:
   Memory needed for the parser ASTs;
   Memory needed for the structures representing globals;
   Memory needed for the IR stages (mostly the 3AC IR, *1);
   ...

*1: Where, as-is, pretty much every operation in the program ends up 
being represented as a struct.

My compiler generally does code generation for an entire executable 
image at the same time. The AST is a lot bulkier, but generally the AST 
only needs enough memory to parse a single translation unit (and the AST 
nodes are reused between translation units).
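
One common way to get that kind of reuse is a bump arena that is reset 
after each translation unit. The sketch below is only an assumption 
about how it could be done, not a description of the compiler above: 
struct arena, arena_alloc, arena_reset, the 16-byte alignment and the 
ast_node layout are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { ARENA_ALIGN = 16 };   /* assumed alignment for all nodes */

/* Hypothetical bump arena: one block, handed out front to back. */
struct arena {
    unsigned char *base;
    size_t cap, used;
};

/* Hypothetical AST node, just to have something to allocate. */
struct ast_node {
    int kind;
    struct ast_node *lhs, *rhs;
};

/* Returns 1 on success, 0 if the backing block cannot be allocated. */
static int arena_init(struct arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Carve n bytes (rounded up to ARENA_ALIGN) off the arena;
   returns NULL when the arena is full. */
static void *arena_alloc(struct arena *a, size_t n)
{
    size_t need = (n + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    if (need < n || need > a->cap - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += need;
    return p;
}

/* Forget every node at once; the memory is reused by the next unit. */
static void arena_reset(struct arena *a)
{
    a->used = 0;
}

static void arena_free(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

The parser would call arena_alloc for each node while handling one 
translation unit, then arena_reset before starting the next, so peak AST 
memory tracks the largest single unit rather than the whole program.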


Though, yes, granted, porting something like these games would still 
require a proper C compiler, so alas...