Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: technology discussion → does the world need a "new" C ?
Date: Fri, 5 Jul 2024 02:30:34 -0500
Organization: A noiseless patient Spider
Lines: 257
Message-ID: <v687h2$36i6p$1@dont-email.me>
References: <v66eci$2qeee$1@dont-email.me> <v67gt1$2vq6a$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 05 Jul 2024 09:31:47 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="39c124d3e74403081112702eedb03ac8";
	logging-data="3360985"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18qQJnld5NiGvEYltQ17X+zQNnzGNcK/rw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:fMMzIjGElBZBJD+Id1CdUql3dBY=
In-Reply-To: <v67gt1$2vq6a$2@dont-email.me>
Content-Language: en-US
Bytes: 11007

On 7/4/2024 8:05 PM, Lawrence D'Oliveiro wrote:
> It’s called “Rust”.


If anything, I suspect it may make sense to go in a different direction:
   Not to a bigger language, but to a more narrowly defined language.

Basically, to try to distill what C does well, keeping its core essence 
intact.


Goal would be to make it easier to get more consistent behavior across 
implementations, and also to make it simpler to implement (vs an actual 
C compiler); with a sub-goal to allow for implementing a compiler within 
a small memory footprint (as would be possible for K&R or C89).


Say for example:
   Integer type sizes are defined;
   Nominally, integers are:
     Twos complement;
     Little endian;
     Wrap on overflow.
   Dropped features:
     VLAs
     Multidimensional arrays (*1)
     Bitfields
     ...
   Simplified declaration syntax (*2):
     {Modifier|Attribute}* TypeName Declarator


*1: While they are not exactly rare, and can be useful, it is debatable
whether they add enough to really justify their complexity and relative
semantic fragility. If using pointers, one almost invariably needs to
fall back to doing "arr[y*N+x]" or similar anyway, so it is arguable
that it could make sense to drop them and have people always do their
multidimensional indexing manually.

Note that multidimensional indexing via multiple levels of pointer
indirection would not be affected by this.
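
As a minimal sketch of the manual indexing mentioned above (the helper
names idx2d and grid_fill are made up for illustration, and row-major
layout is assumed):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps (x, y) to an index into a flat, row-major
   array whose rows are "width" elements long. */
size_t idx2d(size_t x, size_t y, size_t width)
{
    assert(x < width);          /* a column past the row end would alias the next row */
    return y * width + x;
}

/* Fills a width*height grid stored as a one-dimensional array, writing
   y*10+x into each cell so the layout is easy to inspect. */
void grid_fill(int *arr, size_t width, size_t height)
{
    size_t x, y;

    for (y = 0; y < height; y++)
        for (x = 0; x < width; x++)
            arr[idx2d(x, y, width)] = (int)(y * 10 + x);
}
```

With "int g[3*4]; grid_fill(g, 4, 3);" the cell at x=2, y=1 is g[6].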


*2: This can make parsing both easier and faster, as it can eliminate
the need to look up symbols to see whether they were previously defined
as a typedef or similar (which in C ends up being needed for pretty much
every non-keyword identifier encountered here; ideally you don't want to
need to do it at all in the parsing stage).
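
For reference, a small sketch of the problem in existing C: the same
tokens "T * p" are a declaration in one function and a multiplication in
the other, depending only on what T names in scope. The function names
here are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef int T;                  /* at file scope, T names a type */

/* Here "T * p" declares p as a pointer to int. */
size_t as_declaration(void)
{
    T * p = NULL;
    return sizeof(*p);          /* sizeof does not evaluate *p */
}

/* Here a local variable named T shadows the typedef, so the same
   tokens "T * p" form a multiplication. */
int as_expression(int a, int p)
{
    int T;

    assert(a >= 0 && p >= 0);   /* keep the example free of sign surprises */
    T = a;
    return T * p;
}
```

A C parser cannot tell these two cases apart without consulting the
symbol table, which is the lookup the simplified syntax tries to avoid.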



Say, integer types:
   sbyte, byte/ubyte: 8 bits
   short, ushort: 16 bits
   int, uint: 32 bits
   long, ulong: 64 bits
   intNN/uintNN: Explicit sized types, may map to the above.

Unclear:
   char: 8-bit, unsigned
     Could go either way, signed is more traditional,
     but unsigned makes more logical sense here.
   wchar: 16-bit, unsigned
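
A rough sketch of how these sizes and the wrap-on-overflow rule could be
emulated in present-day C. The nc_ prefix and nc_add_int are
hypothetical names (needed because "short", "int" and "long" are
keywords), and the exact-width types from <stdint.h> are assumed to
exist on the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names mirroring the list above. */
typedef int8_t   nc_sbyte;
typedef uint8_t  nc_ubyte;
typedef int16_t  nc_short;
typedef uint16_t nc_ushort;
typedef int32_t  nc_int;
typedef uint32_t nc_uint;
typedef int64_t  nc_long;
typedef uint64_t nc_ulong;
typedef uint8_t  nc_char;       /* the "unclear" case; unsigned picked here */
typedef uint16_t nc_wchar;

static_assert(sizeof(nc_int) == 4, "nc_int must be 32 bits");
static_assert(sizeof(nc_long) == 8, "nc_long must be 64 bits");

/* Signed add that wraps instead of being undefined on overflow.
   The sum is computed in unsigned arithmetic (defined to wrap), then
   the bit pattern is copied back; int32_t is two's complement by
   definition, so the result is the wrapped value. */
nc_int nc_add_int(nc_int a, nc_int b)
{
    nc_uint sum = (nc_uint)a + (nc_uint)b;
    nc_int result;

    memcpy(&result, &sum, sizeof result);
    return result;
}
```

For example, nc_add_int(INT32_MAX, 1) gives INT32_MIN rather than
undefined behavior.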

Arrays:
   Basic array types are always one dimensional;
   Type[] will alias with "Type*" in most contexts.

Would probably drop C's function pointer declarator syntax in favor of, say:
   typedef int fooFunc_t();  //declare a function type
   fooFunc_t *fptr;  //actual function pointer
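
This style already works in existing C, so a sketch can be tried today.
The names binop_t, add_ints, mul_ints and apply_binop are made up for
the example, and the parameter list is spelled out here (empty
parentheses mean different things before and after C23).

```c
#include <assert.h>
#include <stddef.h>

typedef int binop_t(int, int);  /* a function type, not a pointer type */

int add_ints(int a, int b) { return a + b; }
int mul_ints(int a, int b) { return a * b; }

/* The pointer is declared with the ordinary "Type *name" form. */
int apply_binop(binop_t *op, int a, int b)
{
    assert(op != NULL);
    return op(a, b);
}
```

Usage would look like "binop_t *fptr = add_ints;" followed by
"fptr(2, 3)" or "apply_binop(mul_ints, 4, 5)".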

Similarly, structs may not be declared at the point of use, but only as 
types.

struct FooStruct {
   int x, y;
}

FooStruct *fs;  //pointer to FooStruct

Here, declaring a struct will behave as if it had also been implicitly
typedef'ed with the same name.
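
In existing C the same effect needs an explicit typedef; a minimal
equivalent, with foo_sum as a made-up helper, might be:

```c
#include <assert.h>
#include <stddef.h>

/* What the proposed "struct FooStruct { ... }" would imply: the tag
   and a typedef of the same name. */
typedef struct FooStruct FooStruct;
struct FooStruct {
    int x, y;
};

int foo_sum(const FooStruct *fs)   /* no "struct" keyword needed here */
{
    assert(fs != NULL);
    return fs->x + fs->y;
}
```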


Struct semantics would be tweaked:
A by-value struct will behave as if it were pass-by-reference with 
copy-on-assignment (as is typically the case when structs are used as 
lvalues).
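
One possible reading of this, written out in existing C (Vec2, vec2_sum
and vec2_assign are hypothetical names; the proposed language would do
this implicitly): passing a struct hands the callee a reference, and the
only copy happens at assignment.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Vec2 {
    int x, y;
} Vec2;

/* "By-value" argument lowered to a reference: nothing is copied for
   the call, the callee just reads through the pointer. */
int vec2_sum(const Vec2 *v)
{
    assert(v != NULL);
    return v->x + v->y;
}

/* Assignment is where the copy happens: afterwards dst and src are
   independent objects. */
void vec2_assign(Vec2 *dst, const Vec2 *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src;
}
```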


Would make some other restrictions:
Variable declarations are only allowed in the top-level block of a 
function, and (regardless of location) will always behave as if they 
were declared at the top of the function.
Initial values in a declaration may only be a constant expression or a 
reference to a global declaration (including for local variables).

Expressions like sizeof() and offsetof() will not (necessarily) be seen 
as constants (except if the value may be trivially determined). Note 
that these will also be only valid for type names, not for the type of 
an arbitrary expression.

Note that only certain expressions (such as variable assignments or 
function calls) will be allowed in statement context (most other 
expressions would not be allowed).

....


So, for example:
   int Foo(int x, int y)
   {
     //BAD:
     //  int z=x/y;  //not allowed, initializer is not constant
     //OK:
     int z;
     z=x/y;
     if(z>10)
     {
        //BAD:
        //  int w;  //declaration is not allowed here
        z+=3;    //OK
        //z*4;   //BAD, expression not allowed as statement
     }
     return z;
   }

Maybe:
   Pointers may be allowed to be bounds-checked;
   But, casts between pointer and integer types will be restricted.
     An implementation will be allowed to disallow this.
     Granted, this would disallow traditional forms of pointer tagging.

An implementation may instead provide optional intrinsics for working 
with pointer tagging (in place of raw casts and bit-twiddling). Though, 
this would mean one would either need a runtime that is aware of 
type-tagging, or allow for implementations which forbid pointer tagging 
entirely (likely requiring a fallback to other strategies, such as boxed 
values).

Though, in this case, requiring the runtime to be a little more clever 
is an easier sell than trying to deal with it in the compiler.
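
A sketch of what such intrinsics might look like from the user's side.
The tagptr_ names are invented, and the bodies below use exactly the
pointer/integer casts a restricted implementation would forbid; they
only stand in for whatever the runtime would really do. It assumes the
pointed-to objects are at least 8-byte aligned, so the low 3 bits are
free.

```c
#include <assert.h>
#include <stdint.h>

#define TAGPTR_MASK ((uintptr_t)0x7)    /* low 3 bits hold the tag */

/* Hypothetical intrinsic: attach a small tag to an aligned pointer. */
void *tagptr_set(void *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;

    assert((bits & TAGPTR_MASK) == 0);  /* pointer must be 8-byte aligned */
    assert(tag <= TAGPTR_MASK);
    return (void *)(bits | tag);
}

/* Hypothetical intrinsic: read the tag back. */
unsigned tagptr_tag(void *p)
{
    return (unsigned)((uintptr_t)p & TAGPTR_MASK);
}

/* Hypothetical intrinsic: recover the untagged pointer.  The round
   trip through uintptr_t is implementation-defined in standard C,
   though it works on mainstream flat-address targets. */
void *tagptr_ptr(void *p)
{
    return (void *)((uintptr_t)p & ~TAGPTR_MASK);
}
```

Code written against an interface like this never does the
bit-twiddling itself, so an implementation with bounds-checked pointers
could swap in a different representation.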

....


Will also add a restriction to break and continue:
They will only be valid within the body of a loop, or within an if/else
block within the loop. Nearly any other construct (such as another loop
or a "switch()") will entirely hide the visibility of the outer break or
continue.
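
A sketch in plain C of code that stays within this rule (function names
are made up). The first loop uses break from an if directly inside the
loop; the second has a switch inside the loop, so it signals the loop
through a flag instead of relying on an outer break or continue.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first negative element, or -1. */
int first_negative(const int *a, int n)
{
    int i, found;

    assert(a != NULL && n >= 0);
    found = -1;
    for (i = 0; i < n; i++) {
        if (a[i] < 0) {
            found = i;
            break;              /* allowed: if/else directly within the loop */
        }
    }
    return found;
}

/* Sums values until a 0 "stop" code is seen. */
int sum_until_stop(const int *ops, int n)
{
    int i, total, done;

    assert(ops != NULL && n >= 0);
    total = 0;
    done = 0;
    for (i = 0; i < n && !done; i++) {
        switch (ops[i]) {
        case 0:
            done = 1;           /* flag ends the loop on the next test */
            break;              /* this break only leaves the switch */
        default:
            total += ops[i];
            break;
        }
    }
    return total;
}
```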

....


Possible functional difference:
Will use explicit module importing rather than headers.

Modules will be parsed top-to-bottom, with the ability to see into any 
imported modules. Each module will only be exported once, with a logical 
declaration order based on a DAG walk.
Preprocessing defines/macros would not carry across module boundaries.

Modules would function in a way partway between headers and static 
libraries, likely being built in advance, but pulled into the compiler 
stage (likely with a manifest defining any types or global declarations 
within the module). Ideally, the goal would be to allow for 
implementation both with separate compilation (such as COFF or ELF 
objects; where likely the object code and manifest would exist 
separately) or with a bytecode IR (which would likely combine both into 
a single entity). Ideally, it should be possible to determine module 
dependency order without fully invoking the compiler (say, such that the 
logic for compiling each module, and scheduling the compilation of 
modules, can operate independently).
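
As an illustration of working out module order without running the
compiler proper, here is a small depth-first DAG walk. Everything in it
(ModuleGraph, module_order, the fixed MAX_MODULES limit, the adjacency
matrix) is an assumption made for the sketch, not a description of any
existing tool.

```c
#include <assert.h>

#define MAX_MODULES 16

/* deps[m][k] != 0 means module m imports module k. */
typedef struct ModuleGraph {
    int count;
    unsigned char deps[MAX_MODULES][MAX_MODULES];
} ModuleGraph;

/* state: 0 = unvisited, 1 = in progress, 2 = already emitted.
   Returns 0 on success, -1 if an import cycle is found. */
static int visit(const ModuleGraph *g, int m, unsigned char *state,
                 int *order, int *n_out)
{
    int k;

    if (state[m] == 2)
        return 0;               /* each module is emitted only once */
    if (state[m] == 1)
        return -1;              /* back edge: not a DAG */
    state[m] = 1;
    for (k = 0; k < g->count; k++) {
        if (g->deps[m][k] && visit(g, k, state, order, n_out) != 0)
            return -1;
    }
    state[m] = 2;
    order[(*n_out)++] = m;      /* emitted after everything it imports */
    return 0;
}

/* Fills order[] so that every module appears after its imports.
   Returns the number of modules emitted, or -1 on a cycle. */
int module_order(const ModuleGraph *g, int *order)
{
    unsigned char state[MAX_MODULES] = {0};
    int m, n_out = 0;

    assert(g->count >= 0 && g->count <= MAX_MODULES);
    for (m = 0; m < g->count; m++) {
        if (visit(g, m, state, order, &n_out) != 0)
            return -1;
    }
    return n_out;
}
```

With module 0 importing 1 and 2, and module 1 importing 2, the
resulting order is 2, 1, 0, which is also a valid build schedule.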

But, admittedly, I have had good results using a stack-machine IR in my 
compilers for things like static libraries, so leveraging similar 
technology could still make sense.
========== REMAINDER OF ARTICLE TRUNCATED ==========