Article <vacasa$1bn7v$1@dont-email.me>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Article on new mainframe use
Date: Sat, 24 Aug 2024 04:58:31 -0500
Organization: A noiseless patient Spider
Lines: 122
Message-ID: <vacasa$1bn7v$1@dont-email.me>
References: <v9iqko$h7vd$1@dont-email.me>
 <bb873f7f6a14f222f73abacd698e60eb@www.novabbs.org>
 <3f8sbj9chugcr6arbpck2t7nb0g87ff6ik@4ax.com>
 <f7fe11f84f9342f0a7e27d4a729aadad@www.novabbs.org>
 <li71t8Fs9jnU1@mid.individual.net> <v9mc57$15mm9$2@dont-email.me>
 <kmvubjdn7ub4bkgfhpj89c5vsl37vpp16d@4ax.com> <va88rq$ioap$1@dont-email.me>
 <va8sji$otgt$5@dont-email.me> <va9dt6$r72m$1@dont-email.me>
 <vac17h$1ab6s$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 24 Aug 2024 11:58:35 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="8f77949c20783bea9f997058f07a6a39";
	logging-data="1432831"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19ff0mocaXr0sNvpeDxw/LC8i4J/FV3DXA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:9HTezzgOGGPtc3ETT2EStPdfx24=
In-Reply-To: <vac17h$1ab6s$1@dont-email.me>
Content-Language: en-US
Bytes: 6013

On 8/24/2024 2:13 AM, Lawrence D'Oliveiro wrote:
> On Fri, 23 Aug 2024 02:31:46 -0500, BGB wrote:
> 
>> On 8/22/2024 9:36 PM, Lawrence D'Oliveiro wrote:
>>
>>> On Thu, 22 Aug 2024 15:59:34 -0500, BGB wrote:
>>>
>>>> Underused: It is a sensible way of structuring data, but needing to
>>>> interface with it through SQL strings is awkward and inefficient.
>>>
>>> Actually, that works fine in any language with decent string handling.
>>>
>> String processing adds bulk and inefficiency.
>> Granted, maybe not enough to matter in the face of a typical database
>> query.
> 
> Remember what Knuth (or maybe it was Hoare) said: “Premature optimization
> is the root of all evil”. String processing has to be in the language for
> other reasons (think: composing messages and processing input when
> interacting with a human user), why not use it for this?
> 
> Also, it is quite common now to use text-based protocols for networks
> (e.g. messages in JSON format). That may sound inefficient, but it eases
> debugging so much, that has become a major consideration.
> 

I think that saying is also over-applied.


For example, in my recent poking around with fonts, I got around a 30x 
reduction in size by converting from UFO / GLIF to a custom format.

The GLIF fonts were typically several MB, while the converted fonts 
were ~80-200K.

Main source of difference:
   The GLIF files are basically raw XML.

If you take a line like:
       <point x="307" y="-110" type="curve" smooth="yes"/>
which is about 51 bytes of XML (not counting the indentation), and 
reduce it to 4 bytes, that alone is more than a 12x saving...



Premature optimization is more of a problem when applied to contexts 
where one is only likely to save maybe 5% or 10%.

Or to potentially misguided things like trying to write everything in 
assembler rather than in C, or to contexts where the optimization is 
unlikely to affect overall performance.


As far as I am concerned, invoking it is a misapplication in cases where 
there is likely to be a significant or drastic cost difference.



>> But, I was left thinking, some programs use SQLite, which exists as a
>> single giant C file. I guess it technically works, but has the downside
>> of adding something like IIRC around 900K or so to the size of the
>> binaries' ".text" section.
> 
> SQLite is so resource-light, it’s the world’s most popular DBMS. You
> almost certainly have a copy literally at your fingertips right now, in
> your mobile phone.


My cellphone also has 32x as much RAM and 46x the clock speed of the 
BJX2 core running on the FPGA boards I have...

A cellphone is not a good metric of "lightweight" by the factors I am 
considering...


I am not inclined to use it directly because (if added to TestKern) it 
would roughly triple the size of the kernel.

It also weighs in at several times the code size of the Doom engine, 
and even for a small database it may use multiple MB of RAM (vs, say, a 
database engine that operates in KB territory).

I don't really consider this to be particularly lightweight.



OTOH:
Decided to poke at it, and was able to implement a simplified "vaguely 
JPEG-like" image format in around 500 lines for the decoder (around 1/4 
the size of my past JPEG decoder).

Ended up going with:
   AdRice+STF for the entropy coder;
     This simplified the encoder by allowing a single-pass design.
   Block Haar Transform
     Vaguely similar to the Walsh-Hadamard Transform,
     but differs in the specifics.
   Colorspace: Y=(4G+3R+B)/8, U=B-Y, V=R-Y
   Fixed-layout macroblocks with 4:2:0 chroma subsampling.
   A coefficient-block encoding scheme superficially similar to JPEG's.
   ...

Interestingly, quality vs bpp (bits per pixel) doesn't seem to be that 
far off from JPEG.

It also does not need big intermediate arrays or buffers: apart from the 
input and output buffers, it needs less than 1K of working memory, most 
of which goes into the entropy coding and quantization tables.

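AdRice+STF is basically a dynamically adaptive Golomb-Rice coding 
scheme with a Swap-Towards-Front symbol transform. Frequently used 
symbols migrate toward the front of the table, where they get short Rice 
codes, which allows it to mimic something like Huffman coding.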

It is not usually as good as static Huffman (with a 13-bit symbol length 
limit) in terms of speed or compression, but has some other properties 
(smaller memory use; dynamically adaptive; small/simple code).

Also, unlike bitwise range coding or FGK / Vitter (Adaptive Huffman), it 
is not horridly slow...

But I am not aware of anyone else making much use of symbol-swapping 
combined with Rice coding, even though I have often had decently good 
results with it.