From: BGB <cr88192@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: technology discussion → does the world need a "new" C ?
Date: Fri, 5 Jul 2024 02:30:34 -0500
Organization: A noiseless patient Spider
Message-ID: <v687h2$36i6p$1@dont-email.me>
References: <v66eci$2qeee$1@dont-email.me> <v67gt1$2vq6a$2@dont-email.me>
In-Reply-To: <v67gt1$2vq6a$2@dont-email.me>

On 7/4/2024 8:05 PM, Lawrence D'Oliveiro wrote:
> It’s called “Rust”.

If anything, I suspect it may make sense to go in a different direction:
Not to a bigger language, but to a more narrowly defined language.

Basically, to try to distill what C does well, keeping its core essence intact.

Goal would be to make it easier to get more consistent behavior across implementations, and also to make it simpler to implement (vs an actual C compiler); with a sub-goal to allow for implementing a compiler within a small memory footprint (as would be possible for K&R or C89).

Say for example:
  Integer type sizes are defined;
  Nominally, integers are:
    Two's complement;
    Little endian;
    Wrap on overflow.
  Dropped features:
    VLAs
    Multidimensional arrays (*1)
    Bitfields
    ...
  Simplified declaration syntax (*2):
    {Modifier|Attribute}* TypeName Declarator

*1: While they are not exactly that rare, and can be useful, it is debatable if they add enough to really justify their complexity and relative semantic fragility. If using pointers, one almost invariably needs to fall back to doing "arr[y*N+x]" or similar anyways, so it is arguable that it could make sense to drop them and have people always do their multidimensional indexing manually.

Note that multidimensional indexing via multiple levels of pointer indirection would not be affected by this.

*2: This can be used to both make parsing easier, and also make parsing faster, as it can eliminate needing to look up symbols to see if they were a previously defined typedef or similar (which in effect in C ends up being needed for pretty much every non-keyword identifier encountered here; ideally you don't want to need to do it at all in the parsing stage).

Say, integer types:
  sbyte, byte/ubyte: 8 bits
  short, ushort: 16 bits
  int, uint: 32 bits
  long, ulong: 64 bits
  intNN/uintNN: Explicit sized types, may map to the above.
Unclear:
  char: 8-bit, unsigned
    Could go either way, signed is more traditional, but unsigned makes more logical sense here.
  wchar: 16-bit, unsigned

Arrays:
  Basic array types are always one dimensional;
  Type[] will alias with "Type*" in most contexts.

Would likely drop C's function pointer syntax, likely in favor of, say:
  typedef int fooFunc_t();  //declare a function type
  fooFunc_t *fptr;          //actual function pointer

Similarly, structs may not be declared at the point of use, but only as types.
  struct FooStruct {
    int x, y;
  }
  FooStruct *fs;  //pointer to FooStruct

Where, declaring a struct will also behave as-if it had also been implicitly typedef'ed with the same name.

Struct semantics would be tweaked:
  A by-value struct will behave as if it were pass-by-reference with copy-on-assignment (as is typically the case when structs are used as lvalues).
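
For comparison, here is a rough sketch of how three of the ideas above (manual "arr[y*N+x]" indexing, a function type declared via typedef, and a struct that is usable under its bare name) can already be written in standard C today. The names Grid, GRID_W, GRID_H, cellFunc_t, grid_at, grid_map and twice are made up for illustration and are not part of the proposal; the bounds asserts are likewise just an assumption about how one might want to use it.

```c
#include <assert.h>
#include <stddef.h>

enum { GRID_W = 4, GRID_H = 3 };   /* illustrative sizes, not from the post */

/* Struct declared once as a type; in today's C the same-name typedef
   has to be written out explicitly. */
typedef struct Grid {
    int cells[GRID_W * GRID_H];    /* one-dimensional storage only */
} Grid;

/* Declare a function type first; pointers to it are then ordinary pointers. */
typedef int cellFunc_t(int);

/* Manual two-dimensional indexing, row-major: y*W + x. */
int *grid_at(Grid *g, int x, int y)
{
    assert(g != NULL);
    assert(x >= 0 && x < GRID_W && y >= 0 && y < GRID_H);
    return &g->cells[y * GRID_W + x];
}

/* Apply fn to every cell through a pointer to the function type. */
void grid_map(Grid *g, cellFunc_t *fn)
{
    int i;
    assert(g != NULL && fn != NULL);
    for (i = 0; i < GRID_W * GRID_H; i++)
        g->cells[i] = fn(g->cells[i]);
}

/* Example cell function matching cellFunc_t. */
int twice(int v)
{
    return 2 * v;
}
```

Calling grid_map(&g, twice) doubles every cell, and a plain "b = a;" between two Grid values copies the whole array, which is the copy-on-assignment behavior that standard C already gives to by-value structs.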

Would make some other restrictions:
  Variable declarations are only allowed in the top-level block of a function, and (regardless of location) will always behave as if they were declared at the top of the function.
  Initial values in a declaration may only be a constant expression or a reference to a global declaration (including for local variables).
  Expressions like sizeof() and offsetof() will not (necessarily) be seen as constants (except if the value may be trivially determined).
    Note that these will also be only valid for type names, not for the type of an arbitrary expression.
  Note that only certain expressions (such as variable assignments or function calls) will be allowed in statement context (most other expressions would not be allowed).
  ....

So, for example:
  int Foo(int x, int y)
  {
  BAD:
    int z=x/y;  //not allowed, not constant
  OK:
    int z;
    z=x/y;
    if(z>10)
    {
  BAD:
      int w;    //declaration is not allowed here
      z+=3;     //OK
      z*4;      //BAD, expression not allowed as statement
    }
  }

Maybe:
  Pointers may be allowed to be bounds-checked;
  But, casts between pointer and integer types will be restricted.
    An implementation will be allowed to disallow this.

Granted, this would disallow traditional forms of pointer tagging. An implementation may instead provide optional intrinsics for working with pointer tagging (in place of raw casts and bit-twiddling).

Though, this would mean one would either need a runtime that is aware of type-tagging, or allow for implementations which forbid pointer tagging entirely (likely requiring a fallback to other strategies, such as boxed values).

Though, in this case, requiring the runtime to be a little more clever is an easier sell than trying to deal with it in the compiler.

....

Will also add a restriction to break and continue:
  They will only be valid within the body of a loop, or within an if/else block within the loop.
  Nearly any other construct (such as another loop or a "switch()") will entirely hide the visibility of the outer break or continue.
  ....

Possible functional difference:
  Will use explicit module importing rather than headers.
  Modules will be parsed top-to-bottom, with the ability to see into any imported modules.
  Each module will only be exported once, with a logical declaration order based on a DAG walk.
  Preprocessing defines/macros would not carry across module boundaries.

Modules would function in a way partway between headers and static libraries, likely being built in advance, but pulled into the compiler stage (likely with a manifest defining any types or global declarations within the module).

Ideally, the goal would be to allow for implementation both with separate compilation (such as COFF or ELF objects; where likely the object code and manifest would exist separately) or with a bytecode IR (which would likely combine both into a single entity).

Ideally, it should be possible to determine module dependency order without fully invoking the compiler (say, such that the logic for compiling each module, and scheduling the compilation of modules, can operate independently).

But, admittedly, I have had good results using a stack-machine IR in my compilers for things like static libraries, so leveraging similar technology could still make sense.

[Remainder of article truncated]
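
As a rough illustration of the "each module exported once, ordered by a DAG walk" idea from the module list above: one way to get such an order is a depth-first post-order walk over the import graph. This is only a sketch under my own assumptions. The post does not say which walk is intended, and the Module record, the fixed-size limits, and the names module_order and visit are all hypothetical.

```c
#include <assert.h>

enum { MAX_MODULES = 16, MAX_IMPORTS = 8 };   /* arbitrary limits for the sketch */

/* Hypothetical module record; the fields are illustrative only. */
typedef struct Module {
    const char *name;
    int imports[MAX_IMPORTS];   /* indices of directly imported modules */
    int n_imports;
} Module;

/* Visit states for the depth-first walk. */
enum { UNVISITED = 0, IN_PROGRESS = 1, DONE = 2 };

/* Post-order visit: emit a module only after everything it imports.
   Returns 0 on success, -1 if an import cycle is found. */
static int visit(const Module *mods, int n_mods, int idx,
                 int *state, int *order, int *n_out)
{
    int i;
    assert(idx >= 0 && idx < n_mods);
    if (state[idx] == DONE)
        return 0;                 /* already emitted exactly once */
    if (state[idx] == IN_PROGRESS)
        return -1;                /* cycle: the graph is not a DAG */
    state[idx] = IN_PROGRESS;
    for (i = 0; i < mods[idx].n_imports; i++) {
        if (visit(mods, n_mods, mods[idx].imports[i], state, order, n_out) != 0)
            return -1;
    }
    state[idx] = DONE;
    order[(*n_out)++] = idx;
    return 0;
}

/* Fills order[] with module indices, dependencies first.
   Returns the number of modules emitted, or -1 on a cycle or bad size. */
int module_order(const Module *mods, int n_mods, int *order)
{
    int state[MAX_MODULES] = {0};
    int n_out = 0;
    int i;
    if (n_mods < 0 || n_mods > MAX_MODULES)
        return -1;
    for (i = 0; i < n_mods; i++) {
        if (visit(mods, n_mods, i, state, order, &n_out) != 0)
            return -1;
    }
    return n_out;
}
```

For three modules where "app" imports "util" and "core", and "util" imports "core", this yields the order core, util, app. Since it only needs the import lists, it fits the stated goal of working out dependency order without fully invoking the compiler.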