From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: why bits, Byte Addressability And Beyond
Date: Wed, 8 May 2024 13:01:02 -0500
Organization: A noiseless patient Spider
Message-ID: <v1gel0$3npf$1@dont-email.me>
References: <v0s17o$2okf4$2@dont-email.me> <v19f9u$2asct$1@dont-email.me>
 <v19goj$h9f$1@gal.iecc.com> <5r3i3j58je3e7q9j2lir1gd4ascsmumca2@4ax.com>
 <v1bgru$jo2$1@gal.iecc.com> <6d6fa399e0f5dd481125348fa56d8ef8@www.novabbs.org>
 <v1ci49$33ua6$1@dont-email.me> <20240507114742.00003e59@yahoo.com>
 <v1fqvb$3ushi$1@dont-email.me> <20240508153648.00005583@yahoo.com>
 <v1g83s$23im$1@dont-email.me> <v1gc0p$3306$1@dont-email.me>
User-Agent: Mozilla Thunderbird
In-Reply-To: <v1gc0p$3306$1@dont-email.me>

On 5/8/2024 12:16 PM, Terje Mathisen wrote:
> Stephen Fuld wrote:
>> Michael S wrote:
>>>
>>> On Wed, 8 May 2024 14:25:15 +0200
>>> Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>>>
>>>> Michael S wrote:
>>>>> On Tue, 7 May 2024 06:35:53 -0000 (UTC)
>>>>> "Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> wrote:
>>>>>> MitchAlsup1 wrote:
>>>>>>> John Levine wrote:
>>>>>>>> According to John Savard <quadibloc@servername.invalid>:
>>>>>>>>> On Mon, 6 May 2024 02:54:11 -0000 (UTC), John Levine
>>>>>>>>> <johnl@taugh.com> wrote:
>>>>>>>>>> Why do you think bit addressing will be
>>>>>>>>>> faster than shifting and masking? ...
>>>>>>>>> So just because a processor has a 64-bit bus to memory doesn't
>>>>>>>>> mean it has to implement fetching a single byte from memory by
>>>>>>>>> doing a shift and mask operation in a 64-bit register. Instead,
>>>>>>>>> each byte of the bus could have a direct wired path to the low
>>>>>>>>> 8-bits of the internal data bus feeding the registers.
>>>>>>>>
>>>>>>>> I was more thinking about storing bit fields, where you probably
>>>>>>>> have to fetch the whole word or cache line or whatever, shift the
>>>>>>>> new field into it, and then store it back. You already have to do
>>>>>>>> something like that for byte stores, but bit addressing makes it
>>>>>>>> 8 times as hairy.
>>>>>>>
>>>>>>> Which is no different than ECC, BTW...
>>>>>>>
>>>>>>> Could someone invent a bit field ISA that was as efficient as a
>>>>>>> byte accessible architecture:: probably.
>>>>>>>
>>>>>>> Could this bit accessible architecture outperform a byte ISA on
>>>>>>> typical codes:: doubtful. Two reasons:: 1) more delay in the
>>>>>>> LD/ST pipeline, 2) most programs use as little bit-fielding as
>>>>>>> possible (not as much as practical) !!!
>>>>>>
>>>>>> Some time ago, I proposed an additional instruction, a load
>>>>>> variant that allowed you to address bit fields. Would it be
>>>>>> slower than a "normal" byte oriented load? Almost certainly.
>>>>>> But would it be faster than doing all the shifts, masks, word
>>>>>> crossing calculations, etc. via extra instructions? Again,
>>>>>> almost certainly. So you keep the benefits of byte oriented
>>>>>> loads most of the time, but have "reasonable" access to bit
>>>>>> fields when you need them, faster than without the extra
>>>>>> instructions. Hopefully the best of both worlds.
>>>>>
>>>>> When you load a bit field from memory, there is a very high chance
>>>>> that you would want an adjacent bit field soon thereafter.
>>>>> Think about it.
>>>>
>>>> Which means that you would like to have a dedicated streaming
>>>> buffer cache for the EXTR operation?
>>>>
>>>> Terje
>>>
>>> That's not what I wanted to hint to Stephen.
>>> I wanted to hint that in the typical situation, i.e. when one 32-bit
>>> or 64-bit load serves several bit field extractions, his additional
>>> instruction would be slower rather than faster than existing practice.
>>
>> Perhaps. But if you aren't absolutely sure that the next field doesn't
>> cross a 64-bit boundary, then you have to test for that, and if it
>> does, add more instructions to handle it. If that happens, your
>> advantage is lost. Even the test and conditional jump/predication when
>> you don't cross the boundary makes it pretty close.
>>
>> And, as I mentioned in a previous post, I would expect higher-end
>> implementations to make use of some sort of stream buffer, as Terje
>> suggests.
>
> In typical codecs, tokens are mostly 2-3 to 8-10 bits long, so by
> having a 64-bit buffer which always contains at least 32 bits, you
> don't need to worry about any straddles, and for strings of shorter
> tokens, you don't even need to check if a reload/buffer fill-up is
> needed.

Yeah. For something like the LDBITS idea I had mentioned elsewhere, it
would probably work as, say:
  Take the higher-order bits of the position and use them as a byte
  offset;
  Perform a 64-bit load at a byte-aligned address;
  Do a final shift right (0-7 bits) as a post-step (using the low 3
  bits of the index).

Conceptually, this wouldn't be too far removed from something like the
LDTEX instruction.

Though, I had noticed recently that a lot of typos seem to escape my
notice on my end. This is possibly a downside of using a 9pt font on a
4K monitor (22 inch) with 100% UI zoom (*). One can fit more stuff on
screen, but it is potentially not the most easily readable experience.

*: Windows seemingly recommends 150%, but this largely defeats the
purpose of having a higher resolution.
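The LDBITS steps above can be modeled in plain C, roughly as follows (a
hypothetical sketch: the function name, the little-endian assumption,
and the explicit width operand are mine, not a definition of the actual
instruction):

```c
#include <stdint.h>
#include <string.h>

/* Software model of the sketched LDBITS sequence: use the high bits of
   the bit position as a byte offset, do a byte-aligned 64-bit load,
   then shift right by the low 3 bits of the position. Works for field
   widths up to 57 bits without straddling the loaded word. Assumes a
   little-endian buffer, and that 8 bytes past the offset are readable. */
static uint64_t ldbits(const uint8_t *buf, uint64_t bitpos, int width)
{
    uint64_t word;
    memcpy(&word, buf + (bitpos >> 3), 8);  /* byte-aligned 64-bit load */
    word >>= (bitpos & 7);                  /* post-step: 0-7 bit shift */
    return word & ((1ULL << width) - 1);    /* mask to the field width */
}
```

Note that the in-register work is just one shift and one mask,
regardless of where in the byte the field starts; the word-crossing
case the thread worries about only appears for fields wider than 57
bits.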
But, this does bring up the idle mystery of, say, what the "TempleOS"
UI would look like in a 4K format, if keeping a similar density of
"visual clutter". Or, maybe do all of the text with 6x8 pixel glyphs.

In other idle news: After a bit of fiddling, I have managed to get a
"hardware viable" encoder for my 256-color palette with "semi-passable"
image quality.

Color normalization stage: Figures out, for Y (max of R/G/B), how to
add shifted copies of the input value to the input to get an
approximation of the normalized value, with a bias added to make up the
difference (since "y+(y>>n)" may not add up to an exact target value;
normalizing to a target helps, but simply using a bias to normalize the
color desaturates the vector; can't use a real multiply as I don't want
to burn a bunch of DSP48s on this).

This is turned into 6-bit (RGB222) and used to key a 6-bit lookup
table, with some keys (where R=G=B in RGB222 space) keying into a
second lookup for the off-white or gray cases.

A possibility could be to have 3 levels:
  Top level: RGB222;
  Second level: MSBs of R/G/B are all 1;
  Third level: R=G=B in RGB222.
But the gain would be smaller, it seems.

Note that while a 9-bit lookup could avoid a lot of this, it would be
too expensive. For doing 4 pixels at a time, it currently seems to take
~230 LUTs, or ~57 LUTs per color.

While worse color fidelity isn't ideal, being able to run Doom at ~20
fps in a GUI window (vs ~8-12) is an improvement (where the
RGB555->indexed lookup was a bit of a bottleneck).

....

> Terje
>
>
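The two-level lookup structure described above can be sketched in C as
follows (purely illustrative: the table names, the gray-case key, and
the table contents are assumptions of mine, not the actual encoder;
this models only the lookup after the shift-add normalization stage):

```c
#include <stdint.h>

/* Primary 6-bit lookup keyed by RGB222, plus a secondary table that
   refines the R==G==B keys into gray/off-white shades (so the four
   "pure gray" RGB222 codes don't collapse to four output colors). */
static uint8_t rgb222_table[64];  /* 6-bit key -> palette index */
static uint8_t gray_table[16];    /* gray refinement, hypothetical 4-bit key */

static uint8_t palette_index(uint8_t r, uint8_t g, uint8_t b)
{
    int r2 = r >> 6, g2 = g >> 6, b2 = b >> 6;   /* RGB888 -> RGB222 */
    if (r2 == g2 && g2 == b2)                    /* gray-ish: second lookup */
        return gray_table[(r >> 4) & 15];        /* key on high 4 bits of R */
    return rgb222_table[(r2 << 4) | (g2 << 2) | b2];
}
```

In hardware, both tables would be small LUT-based ROMs, and the
quantization is just bit selection, which is consistent with the
~57-LUT-per-pixel-lane figure being dominated by the normalization
logic rather than the lookup itself.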