<vuev2e$33r49$1@dont-email.me>

Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Misc (semi OT): Well, distractions...
Date: Thu, 24 Apr 2025 22:17:52 -0500
Organization: A noiseless patient Spider
Lines: 233
Message-ID: <vuev2e$33r49$1@dont-email.me>
References: <vue615$2b0tt$1@dont-email.me>
 <3e78dbf087fceab8acc676da77f7f36f@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 25 Apr 2025 05:20:47 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2abce02b99807a305f9260dba391c266";
	logging-data="3271817"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/5GDhciWEhz+U7P7NMdebyNujCZ5h6lwM="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:boiz5zNh86IZDRKYIqT4dgEqjyw=
Content-Language: en-US
In-Reply-To: <3e78dbf087fceab8acc676da77f7f36f@www.novabbs.org>

On 4/24/2025 6:00 PM, MitchAlsup1 wrote:
> On Thu, 24 Apr 2025 20:10:29 +0000, BGB wrote:
> 
> ----------
>>
>>
>> But, a recent line of fiddling has gone in an odd direction.
>>    I felt a need for a basic script language for some tasks;
> 
> I have been doing something similar--except I wrote my script
> translator in eXcel.
> 

Errm?...

> I use it to read *.h files and spit out *.c files that translate
> type mathfunction(type arguments) into a series of lines Brian's
> compiler interprets as transcendental instructions in My 66000
> ISA.
> 
> So, code contains the prototype::
> 
> extern type_r recognized_name( type_a1 name );
> 
> and my scripter punts out::
> 
> type_r recognized_spelling( type_a name )
> {
>      register type_a R1 __asm__("R1") = name;
>      __asm__("instruction_spelling\t%0,%1" : "=r" (R1) : "r" (R1) );
>      return R1;
> }
> 
> So when user codes (in visibility of math.h)
> 
>      y = sinpi( x );
> 
> compiler spits out:
> 
>      SINPI    Ry,Rx

OK.



In my case, the script interpreter is written in C.
   Code needed to get the core interpreter working: Around 1000 lines;
   Code needed after adding more stuff, around 1500 lines.
Though, not counting the ~ 600 lines needed mostly for the dynamic 
type-system and similar.

A vaguely similar design was implemented inside the TestKern shell, but 
was written to make use of BGBCC extensions. For this case, needed to 
write something that would also work in MSVC and GCC. Core design for 
the dialect was still similar though (and chose BASIC as a base partly 
because I already knew I could get something that was fairly usable with 
comparably small code).


Language sort of looks like:
   //comment, contents entirely ignored by parser
   rem stuff  //also comment, but subject to token rules
   x=a+b      //basic assignment
   let x=a+b  //also assignment, creates vars in global scope
   temp y=a+b //similar to let, but dynamically scoped
   x=a*b+c*d  //compound expressions, with normal-ish precedence
   x=12345    //integer literal, decimal
   x=0x1234   //hexadecimal
   x="string"  //string, uses C style escapes
   dim a(128)  //creates a global array
   if x<10 goto label
   label:
   goto label   //goto
   gosub label  //subroutine call to label
   return       //return from most recent gosub
   end          //script terminates
   print stuff  //print stuff to console
   x=arr(i)     //load from array
   arr(i)=x     //store to array

Atypical stuff:
   Dynamically typed;
     Traditional BASIC used suffixes to encode type.
     With no-suffix typically for a default numeric type.
     QBasic and Visual Basic using static types.
   Dynamically scoped;
     Like Emacs Lisp.
     Callee can see variables in the caller;
     Variables can be created that do not affect caller.
   ...
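
As a rough illustration of the dynamic scoping idea (not the actual 
interpreter code), here is a minimal sketch using a single binding stack. 
All of the names here (Binding, scope_lookup, scope_temp, scope_enter, 
scope_leave, demo_callee) are made up for the example, and values are 
plain longs rather than dynamically typed values:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 256

typedef struct { const char *name; long value; } Binding;

static Binding bind_stack[MAX_BINDINGS];  /* oldest binding first */
static int bind_top = 0;                  /* number of live bindings */

/* Find the most recently created binding for name, or NULL if none. */
Binding *scope_lookup(const char *name) {
    for (int i = bind_top - 1; i >= 0; i--)
        if (strcmp(bind_stack[i].name, name) == 0)
            return &bind_stack[i];
    return NULL;
}

/* "temp"-style declaration: push a binding that shadows older ones. */
void scope_temp(const char *name, long value) {
    assert(bind_top < MAX_BINDINGS);
    bind_stack[bind_top].name = name;
    bind_stack[bind_top].value = value;
    bind_top++;
}

/* On gosub: remember the current stack height. */
int scope_enter(void) { return bind_top; }

/* On return: drop every binding the callee created. */
void scope_leave(int mark) { bind_top = mark; }

/* Hypothetical subroutine body: reads the caller's x, then shadows it. */
long demo_callee(void) {
    int mark = scope_enter();
    long seen = scope_lookup("x")->value;  /* caller's x is visible */
    scope_temp("x", seen + 1);             /* new x, private to this call */
    long inner = scope_lookup("x")->value;
    scope_leave(mark);                     /* caller's x is current again */
    return inner;
}
```

Lookup walks from the newest binding to the oldest, which is what lets a 
callee see the caller's variables without any explicit parameter passing. 
A linear scan is slow for large programs, but probably fine at this scale.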

Atypical syntax:
   x = gosub label a=3, b=4  //gosub with return values and parameters.
   return expr    //return with expression
   v=(vec 1,2,3)  //vector type
   m=(vec (vec 1,0,0),(vec 0,1,0),(vec 0,0,1)) //poor man's matrix
   ...

Precedence (highest first):
   Literal values;
   Unary operators (+, -, !, ~)
   *, /, %
   +, -
   &, |, ^
   <<, >>
   == (=), !=, <, >, <=, =>
   &&, ||

No assignment or comma operators; assignment is a statement.
Precedence rules differ here from C.
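
To make the difference from C concrete, here is a small 
precedence-climbing sketch over integer expressions that uses the levels 
listed above. It is only an illustration under some assumptions: the 
function names (eval_expr, binop_level, and so on) are invented, only 
decimal integers are handled, both ">=" and "=>" are accepted, errors 
just yield 0, and "&&" / "||" do not short-circuit.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static const char *src;                 /* cursor into the expression text */
static long parse_expr(int min_level);

static void skip_ws(void) { while (*src == ' ') src++; }

/* Binding strength of a binary operator; 0 means "not a binary operator". */
static int binop_level(const char *op) {
    static const struct { const char *op; int level; } table[] = {
        {"*", 6}, {"/", 6}, {"%", 6}, {"+", 5}, {"-", 5},
        {"&", 4}, {"|", 4}, {"^", 4}, {"<<", 3}, {">>", 3},
        {"==", 2}, {"=", 2}, {"!=", 2}, {"<", 2}, {">", 2},
        {"<=", 2}, {"=>", 2}, {">=", 2}, {"&&", 1}, {"||", 1},
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].op, op) == 0) return table[i].level;
    return 0;
}

/* Copy the upcoming run of operator characters into buf without consuming it. */
static size_t peek_op(char *buf, size_t cap) {
    size_t n = 0;
    skip_ws();
    while (src[n] && strchr("+-*/%&|^<>=!~", src[n]) && n + 1 < cap) {
        buf[n] = src[n];
        n++;
    }
    buf[n] = '\0';
    return n;
}

/* Only called with operators that binop_level() accepted. */
static long apply(const char *op, long a, long b) {
    switch (op[0]) {
    case '*': return a * b;
    case '/': return b ? a / b : 0;
    case '%': return b ? a % b : 0;
    case '+': return a + b;
    case '-': return a - b;
    case '^': return a ^ b;
    case '&': return op[1] ? (a && b) : (a & b);
    case '|': return op[1] ? (a || b) : (a | b);
    case '!': return a != b;
    case '=': return op[1] == '>' ? a >= b : a == b;
    case '<': return op[1] == '<' ? a << b : op[1] == '=' ? a <= b : a < b;
    default:  return op[1] == '>' ? a >> b : op[1] == '=' ? a >= b : a > b;
    }
}

/* Literals, parenthesized expressions, and unary + - ! ~ */
static long parse_unary(void) {
    char op[8];
    skip_ws();
    if (*src == '(') {
        src++;
        long v = parse_expr(1);
        skip_ws();
        if (*src == ')') src++;
        return v;
    }
    if (isdigit((unsigned char)*src)) {
        char *end;
        long v = strtol(src, &end, 10);
        src = end;
        return v;
    }
    if (peek_op(op, sizeof op) == 1 && strchr("+-!~", op[0])) {
        src++;
        long v = parse_unary();
        return op[0] == '-' ? -v : op[0] == '!' ? !v : op[0] == '~' ? ~v : v;
    }
    return 0;   /* malformed input; a real parser would report an error */
}

/* Precedence climbing: consume binary operators at or above min_level. */
static long parse_expr(int min_level) {
    long lhs = parse_unary();
    for (;;) {
        char op[8];
        size_t n = peek_op(op, sizeof op);
        int level = binop_level(op);
        if (level == 0 || level < min_level) return lhs;
        src += n;
        lhs = apply(op, lhs, parse_expr(level + 1));  /* left associative */
    }
}

long eval_expr(const char *text) {
    assert(text != NULL);
    src = text;
    return parse_expr(1);
}
```

With this table, eval_expr("1|2<<1") groups as (1|2)<<1 and gives 6, 
whereas C would group it as 1|(2<<1) and give 5.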

Unlike a C style tokenizer, any combination of operator symbols will be 
parsed as a single operator, regardless of whether or not such an 
operator exists (this shaves a big chunk of code off the tokenizer logic).
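
A minimal sketch of what this tokenizer rule might look like, assuming a 
hypothetical next_token() helper and token kinds that are not from the 
actual interpreter. String literals are left out, and treating "//" as 
ending the line before the operator rule applies is a guess:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum { TOK_END, TOK_NAME, TOK_NUMBER, TOK_OPERATOR, TOK_PUNCT };

static int is_opchar(int c) {
    return c != 0 && strchr("+-*/%&|^<>=!~", c) != NULL;
}

/* Read one token starting at *pp into buf and advance *pp past it.
 * Tokens longer than the buffer are split; a real tokenizer would
 * report an error instead. */
int next_token(const char **pp, char *buf, size_t cap) {
    const char *s = *pp;
    size_t n = 0;
    int kind;

    assert(cap >= 2);
    while (*s == ' ' || *s == '\t') s++;

    if (*s == '\0' || (s[0] == '/' && s[1] == '/')) {
        kind = TOK_END;                   /* end of line or start of comment */
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        kind = TOK_NAME;
        while ((isalnum((unsigned char)*s) || *s == '_') && n + 1 < cap)
            buf[n++] = *s++;
    } else if (isdigit((unsigned char)*s)) {
        kind = TOK_NUMBER;                /* also swallows forms like 0x1234 */
        while (isalnum((unsigned char)*s) && n + 1 < cap)
            buf[n++] = *s++;
    } else if (is_opchar(*s)) {
        kind = TOK_OPERATOR;              /* any run of operator characters */
        while (is_opchar(*s) && n + 1 < cap)
            buf[n++] = *s++;
    } else {
        kind = TOK_PUNCT;                 /* parentheses, commas, colons, ... */
        buf[n++] = *s++;
    }
    buf[n] = '\0';
    *pp = s;
    return kind;
}
```

The operator branch needs no table of valid operators, which is where the 
code savings come from. The cost is that something like "a=-1" comes out 
as the single unknown operator "=-", so with this rule one would 
presumably need to write "a= -1".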

For now, the language lacks any ability to define proper functions 
in-language, so the only functions that exist are built-in.


For the first time in a very long time, this interpreter has an "eval" 
command in the console. Though, one needs to use parentheses to eval an 
expression as (unlike JS or similar) statements and expressions are 
different and one may not have an implicit expression in a statement 
context. For my first major script language (JS based), there was an 
eval. However, with the design of my later BS2 language, eval was no 
longer viable.


Where, there is a split between design choices that make sense for a 
light-duty script language, and one meant for "serious work" (more 
features, better performance, etc). Sometimes, one might climb the 
ladder of the language being better for implementation tasks, while 
ignoring things that are useful for light-duty scripting tasks (trying 
to make a language that does both but maybe ultimately does neither task 
particularly well).

So, say, the fate that befell my original BGBScript language, was that 
the VM became increasingly heavyweight (more code, more complex, ...) 
and less well suited for implementation tasks (as it tried to take on 
work that might have otherwise been left to C). BGBScript2 had 
essentially turned into a Java like language, not as good at 
implementing stuff as "just write everything in C", yet no longer great 
for scripting either (namely, Java-style code structure is not 
particularly amenable to interactive use of "eval"; nor is a 
statically-typed language particularly amenable to "hot patching" live 
code, etc...).

Like, when a scripting VM expands to 300 or 500 kLOC, using it for 
scripting a project is no longer as attractive of an option. A partial 
fork of this VM still survives though, I just now call it "BGBCC" and am 
using it mostly as a C compiler for my custom ISA project.

Though, from what I can see, modern JavaScript seems headed down a 
similar path.

A similar issue seems common in many long lived script VM projects. They 
get faster and more powerful, all while losing the properties that made 
them useful for their original use cases.



Granted, the other option is to effectively "roll the clock backwards", 
and revert a language to a simpler form.

Judging by the past, could probably do another JavaScript style VM in 
around 10k LOC or so. Maybe less if the design priority is keeping code 
small. Besides the block structuring, there are "gotcha" things like 
break/continue handling that one needs to deal with. Naive AST-walking 
interpreters don't deal well with non-local control transfers (like 
break/continue/goto).

So, say, if the minimum becomes:
   Parse language to AST;
   Flatten AST into some sort of linear IR;
   Interpret linear IR.
Then this would set a lower limit on the size of the interpreter.
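
The last two steps can be sketched roughly as follows, skipping the 
parser and building a tiny AST by hand. Everything here is hypothetical 
(node kinds, IR opcodes, the fixed-size buffers, run_demo); the point is 
only that once the AST is flattened, 'break' and 'continue' turn into 
ordinary jumps that get patched when the loop's end is known:

```c
#include <assert.h>
#include <stddef.h>

enum { N_SEQ, N_ADD, N_IF_EQ, N_WHILE_LT, N_BREAK, N_CONTINUE };  /* AST kinds */
enum { IR_ADD, IR_JMP, IR_JNE, IR_JGE };                          /* IR opcodes */

typedef struct Node { int kind, var; long k; struct Node *a, *b; } Node;
typedef struct { int op, var; long k; int target; } Insn;

static Insn code[64];
static int ncode;

static int emit(int op, int var, long k, int target) {
    assert(ncode < 64);
    code[ncode] = (Insn){ op, var, k, target };
    return ncode++;
}

/* head: index of the enclosing loop's test (-1 outside a loop).
 * breaks/nbreaks: pending 'break' jumps of the enclosing loop. */
static void flatten(const Node *n, int head, int *breaks, int *nbreaks) {
    switch (n->kind) {
    case N_SEQ:
        flatten(n->a, head, breaks, nbreaks);
        flatten(n->b, head, breaks, nbreaks);
        break;
    case N_ADD:                               /* var += k */
        emit(IR_ADD, n->var, n->k, 0);
        break;
    case N_IF_EQ: {                           /* if var == k then body */
        int skip = emit(IR_JNE, n->var, n->k, -1);
        flatten(n->a, head, breaks, nbreaks);
        code[skip].target = ncode;
        break;
    }
    case N_WHILE_LT: {                        /* while var < k do body */
        int my_breaks[8], my_n = 0;
        int top = emit(IR_JGE, n->var, n->k, -1);
        flatten(n->a, top, my_breaks, &my_n);
        emit(IR_JMP, 0, 0, top);
        code[top].target = ncode;             /* loop exit */
        for (int i = 0; i < my_n; i++)
            code[my_breaks[i]].target = ncode;
        break;
    }
    case N_BREAK:
        assert(breaks != NULL && *nbreaks < 8);
        breaks[(*nbreaks)++] = emit(IR_JMP, 0, 0, -1);  /* patched later */
        break;
    case N_CONTINUE:
        assert(head >= 0);
        emit(IR_JMP, 0, 0, head);
        break;
    }
}

/* Interpret the linear IR; every control transfer is just a pc update. */
static void run(long *vars) {
    int pc = 0;
    while (pc < ncode) {
        const Insn *in = &code[pc++];
        switch (in->op) {
        case IR_ADD: vars[in->var] += in->k; break;
        case IR_JMP: pc = in->target; break;
        case IR_JNE: if (vars[in->var] != in->k) pc = in->target; break;
        case IR_JGE: if (vars[in->var] >= in->k) pc = in->target; break;
        }
    }
}

/* while v0 < 10: v0 += 1; if v0 == 3 continue; if v0 == 6 break; v1 += 1 */
void run_demo(long *vars) {
    Node brk   = { N_BREAK, 0, 0, NULL, NULL };
    Node cont  = { N_CONTINUE, 0, 0, NULL, NULL };
    Node inc_i = { N_ADD, 0, 1, NULL, NULL };
    Node inc_c = { N_ADD, 1, 1, NULL, NULL };
    Node if3   = { N_IF_EQ, 0, 3, &cont, NULL };
    Node if6   = { N_IF_EQ, 0, 6, &brk, NULL };
    Node s3    = { N_SEQ, 0, 0, &if6, &inc_c };
    Node s2    = { N_SEQ, 0, 0, &if3, &s3 };
    Node s1    = { N_SEQ, 0, 0, &inc_i, &s2 };
    Node loop  = { N_WHILE_LT, 0, 10, &s1, NULL };

    ncode = 0;
    flatten(&loop, -1, NULL, NULL);
    run(vars);
}
```

Starting from vars = {0, 0}, the demo loop skips the count at v0 == 3 and 
exits at v0 == 6, so it should end with v0 == 6 and v1 == 4. The 
interpreter loop itself has no notion of loops or blocks at all, which is 
the appeal compared with walking the AST directly.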

Well, and giving up on 'break' and 'continue' wouldn't be great for 
usability. Then again, maybe there could be a "break/continue" flag, 
========== REMAINDER OF ARTICLE TRUNCATED ==========