Article <va88rq$ioap$1@dont-email.me>

Deutsch English Français Italiano
<va88rq$ioap$1@dont-email.me>

View for Bookmarking (what is this?)
Look up another Usenet article
Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Article on new mainframe use
Date: Thu, 22 Aug 2024 15:59:34 -0500
Organization: A noiseless patient Spider
Lines: 197
Message-ID: <va88rq$ioap$1@dont-email.me>
References: <v9iqko$h7vd$1@dont-email.me>
 <bb873f7f6a14f222f73abacd698e60eb@www.novabbs.org>
 <3f8sbj9chugcr6arbpck2t7nb0g87ff6ik@4ax.com>
 <f7fe11f84f9342f0a7e27d4a729aadad@www.novabbs.org>
 <li71t8Fs9jnU1@mid.individual.net> <v9mc57$15mm9$2@dont-email.me>
 <kmvubjdn7ub4bkgfhpj89c5vsl37vpp16d@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 22 Aug 2024 22:59:39 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="03f30db01f41f36bfc164740fb32e254";
	logging-data="614745"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19Hz+AGopBghO1thjL9fVAKhVm9pg4uA/g="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:mO7UhaXf92WJX4s7Be289JtwDow=
In-Reply-To: <kmvubjdn7ub4bkgfhpj89c5vsl37vpp16d@4ax.com>
Content-Language: en-US
Bytes: 9854

On 8/16/2024 12:43 PM, George Neuner wrote:
> On Fri, 16 Aug 2024 02:05:27 -0000 (UTC), Lawrence D'Oliveiro
> <ldo@nz.invalid> wrote:
> 
> 
>> The best way to interface to [relational] DBMS was to be able to generate SQL
>> strings on the fly; but this required some facility with manipulation of
>> dynamic, variable-length strings, which COBOL completely lacked. And so
>> special extensions were tacked on, just to cope with the generation of SQL
>> queries and templates.
> 
> You mean the *WORST* way.
> 
> Just about every SQL injection attack is made possible by programmers
> dynamically generating queries.  Most[1] attacks can be prevented
> simply by proper use of SQL parameters.
> 
> 
> There are only a few situations in which dynamic SQL actually is
> necessary - it is not possible to specify table or column names using
> parameters, so to reuse a query with a different table or column name
> does require generating new query text.
> 
> Some applications do have a need to do this - but in most cases the
> names to use will be known statically, will be predictable (e.g., date
> related), or, if necessary, can be discovered by querying the database
> schema - so they should not be provided by user input.
> 
> The only exception is to permit a user to create a new *custom* table
> type ... but there is little/no need for most applications to do this.
> Most applications that must create new tables at runtime know what
> names to use, and/or how to generate them, and do not need any input
> from a user to do so.
> 
> If creating custom table types with user specified names even is
> permitted by the application, it should be an operation reserved to
> privileged users [presumably who know what they are doing].
> 
> 
> ---
> [1] many RDBMS now directly support JSON and/or XML data, and it is
> possible via SQL parameters to inject false "path" information for
> working with these data types.  To guard against this the application
> itself has to be aware of the data layout.

Ironically, relational databases are one of those things that manages to 
be both underused and overused at the same time:

Underused: It is a sensible way of structuring data, but needing to 
interface with it though SQL strings is awkward and inefficient.


Overused: Probably 99% of the cases where a person resorts to using an 
RDBMS for something, it is massive overkill. Like, if an application 
expects someone to install MySQL or PostGres and then proceeds to use it 
for mostly trivial things, like application settings, that would be 
better served by INI/CFG files or the OS Registry, they are doing it wrong.

In most (non business) contexts, if someone is using an RDBMS for 
something, it is most often for holding application settings (well, 
except MineTest which made the absurd design choice of using the DBMS to 
store Minecraft-style terrain data; which is both absurd/overkill and a 
relatively inefficient way to store this type of data).


OTOH, seemingly people fail hard at coming up with sensible APIs to 
access database data.
Often it turns into one of:
Hey, compose SQL as strings and parse data from the response strings;
Awkwardly bolt the SQL query syntax onto the language, and then have a 
mechanism to parse the query responses (as some sort of weird outgrowth 
of the underlying language);

Or, do some weird OO thing where object methods exist for SQL keywords 
and each method returns an object that can be used to glue on the next 
keyword:
   res=db.select().from("FOOTAB").where().equals("NAME", "John").end();

Or, ...


I guess, possible alternative thoughts:
System where database is mapped to normal filesystem files, rather than 
some centralized DBMS;
Nominally, access is application local, exposed as a library facility;
Expressed in a way that "makes sense" for a language like C or similar, 
and likely abandoning most of the trappings of SQL syntax.

Possibly, it could make sense to also abandon the use of full static 
typing for the use of dynamically typed tagrefs, as ironically tagrefs 
could make the implementation simpler (and most of what "efficiency" 
would be lost by dynamic typing is likely to be dwarfed by the overhead 
of the system as a whole; and still more efficient than text parsing).

Though, the implementation could still enforce typing in some cases, 
such as if a column is defined as holding an integer, not allowing an 
"insert" to put a string or similar there.

The alternative being mostly to define it as a system for serializing C 
structs with the table layout mapped to the C struct, but then dealing 
with any table layout changes will be a pain (you would need to version 
the structs, and to automatically translate between them, which would be 
more of a pain than an implementation than just treating each row like a 
JavaScript style object and then letting the application deal with some 
of this).

Granted, one could still use C structs in the API but then have 
everything using dynamic types internally (and then try to dispel any 
complains that a dynamic-typed representation is somehow less efficient 
than trying to dynamically deal with a data layout resembling C 
structs... The moment the layout is not fixed at compile time, any 
claims of "efficiency" effectively go out the window).


But, maybe moot as it is likely to either still go unused, or used 
mostly for things like application settings.

....



I guess possible structure:
DB is organized into blocks, probably 1K to 4K, which will nominally 
reflect nodes in one or more B-Trees (along with data blocks);
Internally, each DB table will have its own B-Tree;
There will be an internal "table of tables" which would define the 
layout of each table and the root of each tables' tree, along with 
internal management metadata (such as a free blocks list).

Node structure (leaf nodes):
   Node Magic, Next/Prev/Up pointers, etc.
   A list of keys and offsets into the node, for each row/item;
     This will hold the primary key for the table.
   A data area for row data, and other data (like string contents).

For non-leaf nodes, likely only the index part would exist (except if 
the primary key is a string or similar), with the offsets instead giving 
node numbers.

Each row would consist of a set of key/value pairs, say:
   16-bit key, index into a symbol list
     All row and table names in the DB;
     May also encode the value's base type.
   32 or 64-bit value;

If one does: 4b type + 12b keyword, this should be sufficient as it is 
very unlikely there would be more than 4K unique table or column names.

Or, maybe:
   64-bit combined key/value, with 16-bit key+tag, and 48 bits value;
   Well, vs 80 or 96 bits per value;
Or, the key also allows for variable 32/64/128 bit payload:
   So, 48/80/144 bits per member.

Say, for example, 48 bits to hold an "Integer" member, etc.

Values that can't fit directly into the value field (such as strings) 
likely being expressed as size+offset fields to a data area stored at 
the end of the node.

The number of rows per node is variable, if an insert would result in a 
node over-filling, it is split.

There could be "data nodes" for items that are too large to be 
reasonably held directly in a node.

Say, for example, with a 1K node size, strings:
   <=8 or 16: Inline to Row
   <=64 bytes: Inline to Node
   65..511 bytes: Small Data Node (multiple items end-to-end per node);
   512+: Bulk data node (data is held within 1 or more nodes).
     Entire node designated to a single data item,
     or part of a larger item

========== REMAINDER OF ARTICLE TRUNCATED ==========