From: BGB
Newsgroups: comp.lang.c
Subject: Re: technology discussion → does the world need a "new" C ?
Date: Fri, 5 Jul 2024 08:28:19 -0500

On 7/5/2024 6:20 AM, Ben Bacarisse wrote:
> BGB writes:
>
>> On 7/5/2024 3:09 AM, Keith Thompson wrote:
>>> BGB writes:
>>>> On 7/4/2024 8:05 PM, Lawrence D'Oliveiro wrote:
>>>>> It’s called “Rust”.
>>>>
>>>>
>>>> If anything, I suspect may make sense to go a different direction:
>>>> Not to a bigger language, but to a more narrowly defined language.
>>>>
>>>> Basically, to try to distill what C does well, keeping its core
>>>> essence intact.
>>>>
>>>>
>>>> Goal would be to make it easier to get more consistent behavior across
>>>> implementations, and also to make it simpler to implement (vs an
>>>> actual C compiler); with a sub-goal to allow for implementing a
>>>> compiler within a small memory footprint (as would be possible for K&R
>>>> or C89).
>>>>
>>>>
>>>> Say for example:
>>>>   Integer type sizes are defined;
>>>>   Nominally, integers are:
>>>>     Twos complement;
>>>>     Little endian;
>>>>     Wrap on overflow.
>>>>   Dropped features:
>>>>     VLAs
>>>>     Multidimensional arrays (*1)
>>>>     Bitfields
>>>>     ...
>>>>   Simplified declaration syntax (*2):
>>>>     {Modifier|Attribute}* TypeName Declarator
>>>>
>>>>
>>>> *1: While not exactly that rare, and can be useful, it is debatable if
>>>> they add enough to really justify their complexity and relative
>>>> semantic fragility. If using pointers, one almost invariably needs to
>>>> fall back to doing "arr[y*N+x]" or similar anyways, so it is arguable
>>>> that it could make sense to drop them and have people always do their
>>>> multidimensional indexing manually.
>>>>
>>>> Note that multidimensional indexing via multiple levels of pointer
>>>> indirection would not be effected by this.
>>> [...]
>>> Multidimensional arrays in C are not a distinct language feature.
>>> They are simply arrays of arrays, and all operations on them follow
>>> from operations on ordinary arrays and pointers.
>>> Are you proposing (in this hypothetical new language) to add
>>> an arbitrary restriction, so that arrays can have elements of
>>> arithmetic, pointer, struct, etc. type, but not of array type?
>>> I'm not sure I see the point.
>>
>> As-is, the multidimensional arrays require the compiler to realize that it
>> needs to multiply one index by the product of all following indices.
>>
>> So, say:
>>   int a[4][4];
>>   int j, k, l;
>>   l=a[j][k];
>>
>> Essentially needs to be internally translated to, say:
>>   l=a[j*4+k];
>>
>> Eliminating multidimensional arrays eliminates the need for this
>> translation logic, and the need to be able to represent this case in the
>> typesystem handling logic (which is, as I see it, in some ways very
>> different from what one needs for a struct).
>
> How can it be eliminated? All your plan does is force me to wrap the
> inner array in a struct in order to get anything like the convenience of
> the above:
>
>   struct a { int a[4]; };
>
>   struct a a[4];
>   l = a[j].a[k];
>
> Most compilers with generate the same arithmetic (indeed exactly the
> same code) for this as for the more convenient from that you don't like.
>
> All you can do to eliminate this code generation is to make it so hard
> to re-write the convenient code you dislike. (And yes, I used the same
> name all over the place because you are forcing me to.)
>

It is not so much a dislike of multidimensional arrays as a concept, but
rather, the hair they add to the compiler and typesystem.

Granted, one would still have other complex types, like structs and
function pointers, so potentially the complexity savings would be limited.


>> While eliminating structs could also simplify things; structs also tend to
>> be a lot more useful.
>
> Indeed. And I'd have to use them for this!
>

Errm, the strategy I would assume is, as noted:
   int a[4][4];
   ...
   l=a[j][k];
Becomes:
   int a[16];
   ...
   l=a[j*4+k];

Much like what one would typically need to do anyways if the array was
heap-allocated.


Though, the major goal for this sort of thing is mostly to try to limit
the complexity required to write a compiler (as opposed to programmer
convenience).

Like, for example, I had tried (but failed) to write a usable C compiler
in less than 30k lines (and also ideally needing less than 4MB of RAM).
But, if the language design is simplified some, this might be a little
closer. Might still be doable, but a C compiler in 50-75k lines is much
less impressive.


Though, admittedly, my current C compiler (used for my project)
currently weighs in at around 250k lines, and much of the code
complexity is in the backend code generator. One would need to have some
fairly weak code-generation to get this down (likely translating the IR
to 3AC form, and then doing a pretty close to 1:1 mapping between 3AC
operations and output code, possibly with no register allocator or other
optimizations).

But, even with this massive 250k line compiler, I am still running into
edge cases which don't work correctly (as recently discovered after
resuming the effort to port Quake3 to my ISA).
Granted, this is along with needing to fix a bunch of bugs in my DLL
loading and similar (Quake3 being a more complex engine than Doom or
Quake1; in this case effectively requiring virtual memory and a DLL
loader).

And, discovering various other bugs in the runtime libraries and other
things in the process (and, Quake3 builds and sorta works, but it seems
the BSP doesn't load correctly and the player just sort of falls out the
bottom of the incorrectly loaded map).


As-is, also, for the main Quake3 EXE, my compiler ends up using about
270MB of RAM, which is a bit steep...

A big chunk of the RAM in this case ends up being:
   Memory needed for the parser ASTs;
   Memory needed for the structures representing globals;
   Memory needed for the IR stages (mostly the 3AC IR, *1);
   ...

*1: Where, as-is, pretty much every operation in the program ends up
being represented as a struct. My compiler generally does code
generation for an entire executable image at the same time.

The AST is a lot bulkier, but generally the AST only needs enough memory
to parse a single translation unit (and the AST nodes are reused between
translation units).


Though, yes, granted, porting something like these games would still
require a proper C compiler, so alas...
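

As a rough illustration of the indexing question from earlier in the
thread (a sketch only, not taken from any actual compiler): the three
forms discussed are the built-in a[j][k], the manual a[j*4+k], and the
struct-wrapped a[j].a[k]. The helper names (get_2d, get_flat,
get_wrapped, check_all_forms_agree) and the j*10+k fill pattern are made
up for this example.

```c
#include <assert.h>

enum { N = 4 };                 /* inner dimension, as in "int a[4][4]" */

struct row { int a[N]; };       /* the struct-wrapped inner array */

/* Built-in array-of-arrays indexing. */
int get_2d(int a[N][N], int j, int k)
{
    return a[j][k];
}

/* Manual flattening: row j starts at element offset j*N. */
int get_flat(const int *a, int j, int k)
{
    return a[j * N + k];
}

/* Array of structs, each holding one row. */
int get_wrapped(const struct row *a, int j, int k)
{
    return a[j].a[k];
}

/* Fill all three layouts with the same values and check that every
 * (j,k) reads back identically through each form. Returns 1 on success;
 * a mismatch trips an assert. */
int check_all_forms_agree(void)
{
    int a2[N][N];
    int af[N * N];
    struct row aw[N];
    int j, k;

    for (j = 0; j < N; j++) {
        for (k = 0; k < N; k++) {
            int v = j * 10 + k;
            a2[j][k] = v;
            af[j * N + k] = v;
            aw[j].a[k] = v;
        }
    }

    for (j = 0; j < N; j++) {
        for (k = 0; k < N; k++) {
            assert(get_2d(a2, j, k) == get_flat(af, j, k));
            assert(get_2d(a2, j, k) == get_wrapped(aw, j, k));
        }
    }
    return 1;
}
```

Calling check_all_forms_agree() from a small main() exercises all three
forms. Whether a given compiler emits identical code for them is a
separate matter that this does not test; it only shows that the three
spellings address the same element.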