Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Misc (semi OT): Well, distractions...
Date: Thu, 24 Apr 2025 22:17:52 -0500
Organization: A noiseless patient Spider
Lines: 233
Message-ID: <vuev2e$33r49$1@dont-email.me>
References: <vue615$2b0tt$1@dont-email.me>
<3e78dbf087fceab8acc676da77f7f36f@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 25 Apr 2025 05:20:47 +0200 (CEST)
User-Agent: Mozilla Thunderbird
Content-Language: en-US
In-Reply-To: <3e78dbf087fceab8acc676da77f7f36f@www.novabbs.org>
On 4/24/2025 6:00 PM, MitchAlsup1 wrote:
> On Thu, 24 Apr 2025 20:10:29 +0000, BGB wrote:
>
> ----------
>>
>>
>> But, a recent line of fiddling has gone in an odd direction.
>> I felt a need for a basic script language for some tasks;
>
> I have been doing something similar--except I wrote my script
> translator in eXcel.
>
Errm?...
> I use it to read *.h files and spit out *.c files that translate
> type mathfunction(type arguments) into a series of lines Brian's
> compiler interprets as transcendental instructions in My 66000
> ISA.
>
> So, code contains the prototype::
>
> extern type_r recognized_name( type_a1 name );
>
> and my scripter punts out::
>
> type_r recognized_spelling( type_a name )
> {
> register type_a R1 __asm__("R1") = name;
> __asm__("instruction_spelling\t%0,%1" : "=r" (R1) : "r" (R1) );
> return R1;
> }
>
> So when user codes (in visibility of math.h)
>
> y = sinpi( x );
>
> compiler spits out:
>
> SINPI Ry,Rx
OK.
In my case, the script interpreter is written in C.
Code needed to get the core interpreter working: Around 1000 lines;
Code needed after adding more stuff, around 1500 lines.
Though, not counting the ~ 600 lines needed mostly for the dynamic
type-system and similar.
A vaguely similar design was implemented inside the TestKern shell, but
it was written to make use of BGBCC extensions. For this case, I needed
to write something that would also work in MSVC and GCC. The core design
of the dialect was still similar, though (and I chose BASIC as a base
partly because I already knew I could get something fairly usable with
comparatively little code).
Language sort of looks like:
//comment, contents entirely ignored by parser
rem stuff //also comment, but subject to token rules
x=a+b //basic assignment
let x=a+b //also assignment, creates vars in global scope
temp y=a+b //similar to let, but dynamically scoped
x=a*b+c*d //compound expressions, with normal-ish precedence.
x=12345 //integer literal, decimal
x=0x1234 //hexadecimal
x="string" //string, uses C style escapes
dim a(128) //creates a global array
if x<10 goto label
label:
goto label //goto
gosub label //subroutine call to label
return //return from most recent gosub
end //script terminates
print stuff //print stuff to console
x=arr(i) //load from array
arr(i)=x //store to array
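The goto/gosub/return trio above needs little more than a program counter and a return stack. A minimal sketch in C, assuming labels have already been resolved to statement indexes; `enum op`, `struct stmt`, `run_program`, and the toy accumulator standing in for real expressions are all hypothetical, not taken from the actual interpreter:

```c
#include <stddef.h>

/* Hypothetical statement kinds for a toy line-oriented BASIC. */
enum op { OP_ADD, OP_GOTO, OP_GOSUB, OP_RETURN, OP_END };

struct stmt {
    enum op op;
    int arg;   /* OP_ADD: amount added to acc; jumps: target statement index */
};

#define GOSUB_DEPTH 64

/* Runs prog from index 0 and returns the final accumulator value,
   or -1 on gosub stack overflow or a return without a gosub. */
int run_program(const struct stmt *prog, size_t n)
{
    int ret_stack[GOSUB_DEPTH];
    int sp = 0;
    int acc = 0;
    size_t pc = 0;

    while (pc < n) {
        const struct stmt *s = &prog[pc];
        switch (s->op) {
        case OP_ADD:
            acc += s->arg;
            pc++;
            break;
        case OP_GOTO:
            pc = (size_t)s->arg;
            break;
        case OP_GOSUB:
            if (sp == GOSUB_DEPTH)
                return -1;
            ret_stack[sp++] = (int)(pc + 1);   /* resume after the gosub */
            pc = (size_t)s->arg;
            break;
        case OP_RETURN:
            if (sp == 0)
                return -1;
            pc = (size_t)ret_stack[--sp];      /* most recent gosub */
            break;
        case OP_END:
            return acc;
        }
    }
    return acc;
}
```

For example, a program `gosub 3; gosub 3; end; add 5; return` runs the subroutine twice and ends with 10 in the accumulator.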
Atypical stuff:
  Dynamically typed;
    Traditional BASIC used suffixes to encode type,
      with no suffix typically meaning a default numeric type;
    QBasic and Visual Basic use static types.
  Dynamically scoped;
    Like Emacs Lisp.
    A callee can see variables in the caller;
    Variables can be created that do not affect the caller.
  ...
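One common way to get both properties is a tagged value plus a deep-binding stack: lookup scans from the newest binding downward (so a callee sees its caller's variables), and a gosub records the stack height so its return can discard any `temp` bindings the callee made. A minimal sketch in C under those assumptions; `struct value`, `env_push`, `env_lookup`, `env_mark`, and `env_release` are hypothetical names, and `let`-style globals are assumed to live in a separate table not shown here:

```c
#include <string.h>

/* Hypothetical dynamically typed value: a type tag plus a payload. */
enum vtype { V_NIL, V_INT, V_STR };

struct value {
    enum vtype type;
    union { long i; const char *s; } u;
};

/* Deep-binding environment: newest binding on top. */
struct binding { const char *name; struct value val; };

#define MAX_BINDINGS 256
static struct binding env[MAX_BINDINGS];
static int env_top = 0;

/* 'temp'-style binding: shadows any outer binding of the same name. */
int env_push(const char *name, struct value v)
{
    if (env_top == MAX_BINDINGS)
        return -1;
    env[env_top].name = name;
    env[env_top].val = v;
    env_top++;
    return 0;
}

/* Innermost (most recent) binding for name, or NULL if unbound.
   A plain assignment can write through the returned pointer, which
   is how a callee ends up modifying its caller's variable. */
struct value *env_lookup(const char *name)
{
    for (int i = env_top - 1; i >= 0; i--)
        if (strcmp(env[i].name, name) == 0)
            return &env[i].val;
    return NULL;
}

/* gosub saves the height; the matching return restores it. */
int env_mark(void) { return env_top; }
void env_release(int mark) { env_top = mark; }
```

The cost of this design is that every variable access is a linear scan by name; it trades speed for very little code, which fits the light-duty goal.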
Atypical syntax:
x = gosub label a=3, b=4 //gosub with return values and parameters.
return expr //return with expression
v=(vec 1,2,3) //vector type
m=(vec (vec 1,0,0),(vec 0,1,0),(vec 0,0,1)) //poor man's matrix
...
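Precedence (highest first):
Literal values;
Unary operators (+, -, !, ~)
*, /, %
+, -
&, |, ^
<<, >>
== (=), !=, <, >, <=, >=
&&, ||
No assignment or comma operators; assignment is a statement.
Precedence rules here differ from C; for example, &, |, and ^ bind more
tightly than the shifts and comparisons, so "x&1==1" means "(x&1)==1".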
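A table like this maps directly onto precedence climbing. The sketch below is a hedged illustration in C, not the interpreter's actual parser: it evaluates integer-only expressions, uses a fixed operator table rather than the run-based tokenizer described next, and `eval_expr`, `binops`, and the "division by zero yields 0" rule are all assumptions:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct binop { const char *text; int prec; };

/* Higher prec binds tighter, following the table above. Two-character
   operators come first so they match before their one-char prefixes. */
static const struct binop binops[] = {
    {"<<", 2}, {">>", 2}, {"<=", 1}, {">=", 1}, {"==", 1}, {"!=", 1},
    {"&&", 0}, {"||", 0},
    {"*", 5}, {"/", 5}, {"%", 5}, {"+", 4}, {"-", 4},
    {"&", 3}, {"|", 3}, {"^", 3}, {"<", 1}, {">", 1}, {"=", 1},
};

static void skip_ws(const char **p) { while (isspace((unsigned char)**p)) (*p)++; }

static const struct binop *match_op(const char *s)
{
    for (size_t i = 0; i < sizeof binops / sizeof binops[0]; i++)
        if (strncmp(s, binops[i].text, strlen(binops[i].text)) == 0)
            return &binops[i];
    return NULL;
}

static long apply(const char *op, long a, long b)
{
    switch (op[0]) {
    case '*': return a * b;
    case '/': return b ? a / b : 0;   /* sketch only: x/0 yields 0 */
    case '%': return b ? a % b : 0;
    case '+': return a + b;
    case '-': return a - b;
    case '^': return a ^ b;
    case '&': return op[1] == '&' ? (a && b) : (a & b);
    case '|': return op[1] == '|' ? (a || b) : (a | b);
    case '<': return op[1] == '<' ? a << b : op[1] == '=' ? a <= b : a < b;
    case '>': return op[1] == '>' ? a >> b : op[1] == '=' ? a >= b : a > b;
    case '=': return a == b;          /* "=" and "==" both compare */
    case '!': return a != b;
    }
    return 0;
}

static long parse_expr(const char **p, int min_prec);

/* Literals (decimal or 0x hex), parentheses, and unary + - ! ~. */
static long parse_unary(const char **p)
{
    skip_ws(p);
    char c = **p;
    if (c == '+' || c == '-' || c == '!' || c == '~') {
        (*p)++;
        long v = parse_unary(p);
        return c == '-' ? -v : c == '!' ? !v : c == '~' ? ~v : v;
    }
    if (c == '(') {
        (*p)++;
        long v = parse_expr(p, 0);
        skip_ws(p);
        if (**p == ')')
            (*p)++;
        return v;
    }
    char *end;
    int base = ((*p)[0] == '0' && ((*p)[1] == 'x' || (*p)[1] == 'X')) ? 16 : 10;
    long v = strtol(*p, &end, base);
    *p = end;
    return v;
}

/* Precedence climbing; every binary level is left-associative. */
static long parse_expr(const char **p, int min_prec)
{
    long lhs = parse_unary(p);
    for (;;) {
        skip_ws(p);
        const struct binop *op = match_op(*p);
        if (op == NULL || op->prec < min_prec)
            return lhs;
        *p += strlen(op->text);
        long rhs = parse_expr(p, op->prec + 1);
        lhs = apply(op->text, lhs, rhs);
    }
}

long eval_expr(const char *src) { return parse_expr(&src, 0); }
```

Under these rules `eval_expr("6&3==2")` gives 1, because it groups as `(6&3)==2`; C would group it as `6&(3==2)` and give 0.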
Unlike a C style tokenizer, any combination of operator symbols will be
parsed as a single operator, regardless of whether or not such an
operator exists (this shaves a big chunk of code off the tokenizer logic).
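The tokenizer saving comes from not needing a list of valid operators at all: once the first operator character is seen, the scanner simply consumes the whole run. A minimal sketch in C; `next_token`, `struct token`, and the exact operator-character set are assumptions (parentheses and commas are kept out so `a(i)` and `vec 1,2,3` still split):

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum tok_kind { TK_EOF, TK_NAME, TK_NUMBER, TK_STRING, TK_OPER, TK_PUNCT };

struct token {
    enum tok_kind kind;
    const char *start;   /* points into the source text, not copied */
    size_t len;
};

/* Assumed operator-character set. */
static int is_op_char(char c)
{
    return c != '\0' && strchr("+-*/%&|^!~<>=", c) != NULL;
}

/* Scans one token starting at *pp and advances *pp past it. */
struct token next_token(const char **pp)
{
    const char *p = *pp;
    struct token t;

    while (isspace((unsigned char)*p))
        p++;
    t.start = p;
    if (*p == '\0') {
        t.kind = TK_EOF;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        t.kind = TK_NAME;
    } else if (isdigit((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))   /* also covers 0x1234 */
            p++;
        t.kind = TK_NUMBER;
    } else if (*p == '"') {
        p++;
        while (*p != '\0' && *p != '"') {
            if (*p == '\\' && p[1] != '\0')
                p++;                         /* skip the escaped char */
            p++;
        }
        if (*p == '"')
            p++;
        t.kind = TK_STRING;
    } else if (is_op_char(*p)) {
        /* The whole run is one token, real operator or not;
           deciding whether it means anything is left to the parser. */
        while (is_op_char(*p))
            p++;
        t.kind = TK_OPER;
    } else {
        p++;                                 /* ( ) , : and the like */
        t.kind = TK_PUNCT;
    }
    t.len = (size_t)(p - t.start);
    *pp = p;
    return t;
}
```

One visible side effect in this sketch: `x=-1` produces the single operator `=-`, so it has to be written `x= -1`. A `//` comment would also scan as one operator run, so comments are assumed to be stripped before this point.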
For now, the language lacks any ability to define proper functions
in-language, so the only functions that exist are built-in.
For the first time in a very long time, this interpreter has an "eval"
command in the console. Though, one needs to use parentheses to eval an
expression, since (unlike JS or similar) statements and expressions are
distinct, and a bare expression may not appear in a statement context.
My first major script language (JS based) had an eval. However, with the
design of my later BS2 language, eval was no longer viable.
There is a split between design choices that make sense for a
light-duty script language and those for one meant for "serious work"
(more features, better performance, etc). Sometimes, one climbs the
ladder of making the language better for implementation tasks, while
neglecting the things that make it useful for light-duty scripting
(trying to make a language that does both, but that ultimately does
neither particularly well).
So, say, the fate that befell my original BGBScript language was that
the VM became increasingly heavyweight (more code, more complex, ...)
and less well suited for implementation tasks (as it tried to take on
work that might otherwise have been left to C). BGBScript2 had
essentially turned into a Java-like language: not as good at
implementing stuff as "just write everything in C", yet no longer great
for scripting either (namely, Java-style code structure is not
particularly amenable to interactive use of "eval"; nor is a
statically typed language particularly amenable to "hot patching" live
code, etc...).
Like, when a scripting VM expands to 300 or 500 kLOC, using it for
scripting a project is no longer as attractive an option. A partial
fork of this VM still survives, though; I now call it "BGBCC" and am
using it mostly as a C compiler for my custom ISA project.
Though, from what I can see, modern JavaScript seems headed down a
similar path.
A similar issue seems common in many long-lived script VM projects. They
get faster and more powerful, all while losing the properties that made
them useful for their original use cases.
Granted, the other option is to effectively "roll the clock backwards",
and revert a language to a simpler form.
Judging by past experience, I could probably do another JavaScript-style
VM in around 10k LOC or so, maybe less if the design priority is keeping
the code small. Besides the block structuring, there are "gotcha" things
like break/continue handling that one needs to deal with. Naive
AST-walking interpreters don't deal well with non-local control
transfers (like break/continue/goto).
So, say, if the minimum becomes:
Parse language to AST;
Flatten AST into some sort of linear IR;
Interpret linear IR.
Then this would set a lower limit on the size of the interpreter.
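In the flattened form, break and continue stop being special: break is a jump to just past the loop and continue is a jump back to its head, so the interpreter is a flat dispatch loop with no status flags. A hedged sketch in C, with the IR hand-flattened rather than produced by a real compiler pass; `enum ir_op`, `struct ir`, `run_ir`, and `example_loop` are hypothetical:

```c
#include <stddef.h>

enum ir_op { I_INC, I_JGE, I_JMP, I_HALT };

struct ir {
    enum ir_op op;
    int var;      /* I_INC, I_JGE: variable index */
    long imm;     /* I_INC: amount; I_JGE: threshold */
    int target;   /* I_JGE, I_JMP: instruction index */
};

/* Runs linear IR from index 0 until I_HALT. Every control transfer,
   including break and continue, is just an assignment to pc. */
void run_ir(const struct ir *code, long *vars)
{
    for (int pc = 0;;) {
        const struct ir *in = &code[pc];
        switch (in->op) {
        case I_INC:  vars[in->var] += in->imm; pc++; break;
        case I_JGE:  pc = vars[in->var] >= in->imm ? in->target : pc + 1; break;
        case I_JMP:  pc = in->target; break;
        case I_HALT: return;
        }
    }
}

/* The loop  { i+=1; if i>=6 break; if i>=3 continue; n+=1 }
   flattened by hand, with i in vars[0] and n in vars[1]. */
static const struct ir example_loop[] = {
    { I_INC,  0, 1, 0 },   /* 0: loop head: i += 1            */
    { I_JGE,  0, 6, 5 },   /* 1: break    -> jump to exit (5) */
    { I_JGE,  0, 3, 0 },   /* 2: continue -> jump to head (0) */
    { I_INC,  1, 1, 0 },   /* 3: n += 1                       */
    { I_JMP,  0, 0, 0 },   /* 4: end of body -> head          */
    { I_HALT, 0, 0, 0 },   /* 5: loop exit                    */
};
```

Running `example_loop` leaves i at 6 and n at 2, the same result a status-flag tree walker gives for that loop. The extra cost is the flattening pass itself, which has to track the current loop's head and exit targets while emitting code.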
Well, and giving up on 'break' and 'continue' wouldn't be great for
usability. Then again, maybe there could be a "break/continue" flag,
========== REMAINDER OF ARTICLE TRUNCATED ==========