Article <v1bbcb$2oef7$1@dont-email.me>

Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: why bits, Byte Addressability And Beyond
Date: Mon, 6 May 2024 14:34:31 -0500
Organization: A noiseless patient Spider
Lines: 143
Message-ID: <v1bbcb$2oef7$1@dont-email.me>
References: <v0s17o$2okf4$2@dont-email.me>
 <2024May4.111127@mips.complang.tuwien.ac.at>
 <AnsZN.60734$gF_b.49289@fx17.iad> <v19f9u$2asct$1@dont-email.me>
 <v19goj$h9f$1@gal.iecc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 06 May 2024 21:34:36 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="3fa912b12de5568ef34432866578c01c";
	logging-data="2898407"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX195hs0NcPHB8bmDd8KZTDwPqXSK+gct2SQ="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:/r1cL4ehrmq1zXkzRRI8kw1FJxY=
In-Reply-To: <v19goj$h9f$1@gal.iecc.com>
Content-Language: en-US
Bytes: 5985

On 5/5/2024 9:54 PM, John Levine wrote:
> According to Lawrence D'Oliveiro  <ldo@nz.invalid>:
>> On Sat, 04 May 2024 15:21:04 GMT, Scott Lurndal wrote:
>>
>>> d) all modern major architectures have instructions for bitfield
>>> manipulation (insert, extract) obviating any need for general bit-level
>>> addressing.
>>
>> Even if those bottom three bits of the address must be zero in every other
>> instruction but these, I thought it would be convenient to have them, just
>> for these bitfield instructions. It would save passing around a separate
>> bit-offset field in arbitrary-bit-aligned pointers.
> 
> The only significant application for bit addressing that anyone has
> mentioned is data compression. It's not something that computers spend
> a great deal of time doing, and I see no reason to believe that bit
> addressing would make it much faster than the way it's done now with
> shifting and masking.
> 

In my case, reading a Huffman symbol could be, say:
   //R8=byte pos, R9=bit pos, R10=symbol lookup
   MOV.L  (R8, 0), R4
   SHLR.L R4, R9, R4
   AND    R4, 4095, R4
   MOV.W  (R10, R4), R5
   EXTU.B R5, R11       //result
   SHLD.L R5, -12, R6   //get bit adjustment
   ADD    R9, R6, R9
   SHLD.L R9, -3, R7
   ADD    R8, R7, R8
   AND    R9, 7, R9
Or, in C terms:
   t0=*(u32 *)cs;
   t1=(t0>>bitpos)&4095;
   t2=lookup[t1];
   result=t2&255;
   bitpos+=(t2>>12);
   cs+=bitpos>>3;
   bitpos&=7;
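For reference, here is a self-contained C sketch of the byte-position-plus-bit-position decode above. It assumes LSB-first bit packing (as in Deflate), a 4096-entry table whose entries hold the symbol in bits 0..7 and the code length in bits 12..15 (as the fragment implies), and an input buffer padded so the 4-byte load never runs off the end. The names bit_reader, huff_add and load_le32 are illustrative. load_le32 stands in for the *(u32 *) cast, which would otherwise depend on alignment, strict aliasing and host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian 32-bit load without the alignment/aliasing hazards of
 * *(u32 *)cs. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical reader state: byte position plus bit offset 0..7. */
typedef struct {
    const uint8_t *cs;
    unsigned bitpos;
} bit_reader;

/* Fill every slot whose low `len` bits equal `code` (code in stream order,
 * first bit in bit 0). Assumed entry layout: (len << 12) | sym. */
static void huff_add(uint16_t lookup[4096], unsigned code, unsigned len,
                     unsigned sym)
{
    assert(len >= 1 && len <= 12 && code < (1u << len) && sym < 256);
    for (unsigned i = code; i < 4096; i += 1u << len)
        lookup[i] = (uint16_t)((len << 12) | sym);
}

/* Decode one symbol, following the C fragment step by step. */
static unsigned huff_decode(bit_reader *br, const uint16_t lookup[4096])
{
    uint32_t t0 = load_le32(br->cs);
    uint32_t t1 = (t0 >> br->bitpos) & 4095;   /* next 12 bits */
    uint16_t t2 = lookup[t1];
    assert((t2 >> 12) != 0 && "empty table slot");
    br->bitpos += t2 >> 12;                    /* consume the code */
    br->cs += br->bitpos >> 3;                 /* advance whole bytes */
    br->bitpos &= 7;
    return t2 & 255;
}
```

With codes A=0, B=10 and C=11, the byte 0x1A decodes as A, B, C, A, leaving the reader 6 bits into the first byte.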

What if bitpos were linear, and we did not adjust cs?...
   SHLD.L  R9, -3, R4
   LEA.B   (R8, R4), R4
   MOV.L   (R4, 0), R4
   AND     R9, 7, R7
   SHLR.L  R4, R7, R4
   AND     R4, 4095, R4
   MOV.W   (R10, R4), R5
   EXTU.B  R5, R11
   SHLD.L  R5, -12, R6
   ADD     R9, R6, R9
Or, in C:
   t0=*(u32 *)(cs+(bitpos>>3));
   t1=(t0>>(bitpos&7))&4095;
   t2=lookup[t1];
   result=t2&255;
   bitpos+=(t2>>12);
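The linear-bitpos variant as a C sketch, under the same assumptions (LSB-first packing, the same assumed table layout, padded input); huff_decode_linear is an illustrative name. The only state is a single bit offset, at the cost of recomputing the byte address and the sub-byte shift on every call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit load (avoids the unaligned *(u32 *) cast). */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one symbol; *bitpos is a linear bit offset from buf, so there is
 * no separate byte pointer to update. */
static unsigned huff_decode_linear(const uint8_t *buf, size_t *bitpos,
                                   const uint16_t lookup[4096])
{
    uint32_t t0 = load_le32(buf + (*bitpos >> 3)); /* cs + (bitpos >> 3) */
    uint32_t t1 = (t0 >> (*bitpos & 7)) & 4095;
    uint16_t t2 = lookup[t1];
    assert((t2 >> 12) != 0 && "empty table slot");
    *bitpos += t2 >> 12;                           /* consume the code */
    return t2 & 255;
}
```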

Here I am assuming a 12-bit symbol length limit. With 15-bit symbols,
as in Deflate, a single lookup table would need 32768 entries, so the
performance would now be dominated by L1 misses with this approach. It
is generally faster (also on x86) to use a 2-stage lookup, with a
fallback case to deal with symbols longer than 8 bits.
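A sketch of the two-stage idea for Deflate-style codes of up to 15 bits. Assumptions: the first stage is a 256-entry table (FASTBITS = 8) holding (length << 12) | symbol for codes of up to 8 bits; the fallback for longer codes walks the canonical code one bit at a time, in the style of puff.c from the zlib distribution, rather than using second-level subtables as zlib's inflate does. Codes are assigned canonically per RFC 1951 and bit-reversed for the LSB-first stream. huff2_table, huff2_build and huff2_decode are illustrative names, and the input must be padded so the 4-byte load stays in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXBITS  15 /* Deflate's code-length limit */
#define FASTBITS 8  /* first-stage index width (assumed) */

typedef struct {
    uint16_t fast[1u << FASTBITS]; /* (len << 12) | sym, 0 = use fallback */
    uint16_t count[MAXBITS + 1];   /* number of codes of each length */
    uint16_t symbol[288];          /* symbols sorted by (length, value) */
} huff2_table;

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reverse the low n bits: Deflate sends codes MSB-first inside an
 * LSB-first bit stream, so table indices use the reversed code. */
static unsigned rev_bits(unsigned x, unsigned n)
{
    unsigned r = 0;
    while (n--) { r = (r << 1) | (x & 1); x >>= 1; }
    return r;
}

/* Build both stages from per-symbol code lengths (0 = unused symbol).
 * Assumes the lengths describe a valid prefix code. */
static void huff2_build(huff2_table *h, const uint8_t *lengths, unsigned nsym)
{
    unsigned next_code[MAXBITS + 1], offs[MAXBITS + 1];
    unsigned len, sym, i, code = 0;

    assert(nsym <= 288);
    for (len = 0; len <= MAXBITS; len++)
        h->count[len] = 0;
    for (sym = 0; sym < nsym; sym++) {
        assert(lengths[sym] <= MAXBITS);
        h->count[lengths[sym]]++;
    }
    /* First canonical code of each length (RFC 1951, 3.2.2) and the start
     * of each length's run in the sorted symbol list. */
    offs[1] = 0;
    for (len = 1; len <= MAXBITS; len++) {
        code = (code + (len > 1 ? h->count[len - 1] : 0)) << 1;
        next_code[len] = code;
        if (len < MAXBITS)
            offs[len + 1] = offs[len] + h->count[len];
    }
    for (i = 0; i < (1u << FASTBITS); i++)
        h->fast[i] = 0;
    for (sym = 0; sym < nsym; sym++) {
        len = lengths[sym];
        if (len == 0)
            continue;
        h->symbol[offs[len]++] = (uint16_t)sym;
        code = next_code[len]++;
        if (len > FASTBITS)
            continue; /* long code: fallback only */
        for (i = rev_bits(code, len); i < (1u << FASTBITS); i += 1u << len)
            h->fast[i] = (uint16_t)((len << 12) | sym);
    }
}

/* Fallback: walk the canonical code one bit at a time, as puff.c does.
 * Returns the symbol, or -1 if no code matches. */
static int huff2_decode_slow(const uint8_t *buf, size_t *bitpos,
                             const huff2_table *h)
{
    int code = 0, first = 0, index = 0;
    for (int len = 1; len <= MAXBITS; len++) {
        code |= (buf[*bitpos >> 3] >> (*bitpos & 7)) & 1; /* next bit */
        (*bitpos)++;
        int count = h->count[len];
        if (code - count < first)
            return h->symbol[index + (code - first)];
        index += count;
        first = (first + count) << 1;
        code <<= 1;
    }
    return -1;
}

static int huff2_decode(const uint8_t *buf, size_t *bitpos,
                        const huff2_table *h)
{
    uint32_t w = load_le32(buf + (*bitpos >> 3)) >> (*bitpos & 7);
    uint16_t e = h->fast[w & ((1u << FASTBITS) - 1)];
    if (e != 0) { /* stage 1: code of at most FASTBITS bits */
        *bitpos += e >> 12;
        return e & 0x0FFF;
    }
    return huff2_decode_slow(buf, bitpos, h); /* stage 2: longer code */
}
```

With code lengths {1, 2, ..., 9, 10, 10} for symbols 0..10, the bytes FF 01 decode as symbol 9 (10 bits, via the fallback) followed by symbol 0 (via the first-stage table).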

One could argue for a bit-load instruction, say:
   LDBITS  (R8, R9), R4
   AND     R4, 4095, R4
   MOV.W   (R10, R4), R5
   EXTU.B  R5, R11
   SHLD.L  R5, -12, R6
   ADD     R9, R6, R9

But it is harder to come up with much else that would give a meaningful
benefit here while still fitting within the existing encoding and
pipeline constraints.

Maybe LDBITS could use a RiDISP encoding but treat the immediate field 
as a mask width:
   LDBITS  (R8, R9, 12), R4
   Where, for the disp:
     0=no mask
     1..56=mask by this many bits
     57+=invalid
This would save the Jumbo+AND instruction, assuming an implementation
that allows a bitfield of at most 56 bits.
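A C model of what the proposed LDBITS might compute, under assumptions the post does not pin down: little-endian memory, an 8-byte load from the byte address base + (bitpos >> 3), a right shift by bitpos & 7, then the optional mask from the displacement field (0 = no mask, 1..56 = mask width, 57+ rejected). ldbits and load_le64 are hypothetical names, and the buffer must have 8 readable bytes at the load address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 64-bit load from any byte address. */
static uint64_t load_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Model of the hypothetical "LDBITS (Rb, Ri, width), Rn":
 *   base   = byte base register, bitpos = linear bit position,
 *   width  = 0: no mask, 1..56: keep that many low bits. */
static uint64_t ldbits(const uint8_t *base, size_t bitpos, unsigned width)
{
    assert(width <= 56 && "57+ would be an invalid encoding");
    uint64_t v = load_le64(base + (bitpos >> 3)) >> (bitpos & 7);
    if (width != 0)
        v &= (UINT64_C(1) << width) - 1;
    return v;
}
```

In the 12-bit decoder, the table index would then be ldbits(buf, bitpos, 12), folding the address computation, load, shift and AND into one operation.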


> If you do want to make compression faster, it'd make more sense to add
> instructions to do the compressing you care about, like DFLTCC in
> S/360 and zSeries that speed up gzip, rather than adding three bits to
> the other 99% of instructions that don't use bit fields.
> 
> If you think otherwise, what are the applications that will make all
> those address bits useful, and why do you think bit addressing will be
> faster than shifting and masking? There's still going to be memory
> underneath that's byte or word addressed so the shifting and masking
> is going to happen anyway.

I am not aware of much, in any case.

The two obvious use-cases for bits are:
   Things like Huffman decoding;
   Packed bitfields within words (usually shift+mask, and constant).

The latter would likely be best served with something like:
   BITEXT  Rm, Imm12, Rn
With, say, the 12 bits split into two 6-bit fields.

But within the 32-bit ops, there is no encoding space for this.
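A C model of the BITEXT idea. The post only says the 12 bits would form two 6-bit fields; here it is assumed that the low field is the starting bit (shift) and the high field is the width, with width 0 meaning no mask, mirroring the jj=0 convention used for the Op64 form. bitext and bitext_imm are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a hypothetical "BITEXT Rm, Imm12, Rn". Assumed field split:
 *   Imm12[5:0]  = lowest bit of the field (right-shift amount)
 *   Imm12[11:6] = field width, 0 = no mask */
static uint64_t bitext(uint64_t rm, unsigned imm12)
{
    assert(imm12 < 4096);
    unsigned shift = imm12 & 63;
    unsigned width = (imm12 >> 6) & 63;
    uint64_t v = rm >> shift;
    if (width != 0)
        v &= (UINT64_C(1) << width) - 1;
    return v;
}

/* Hypothetical helper: pack the two 6-bit fields into the immediate. */
static unsigned bitext_imm(unsigned lsb, unsigned width)
{
    assert(lsb < 64 && width < 64);
    return (width << 6) | lsb;
}
```

For a constant packed field, bitext(w, bitext_imm(20, 12)) gives the same result as the usual (w >> 20) & 0xFFF, in one operation.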

In theory, this could use a 64-bit Op64 encoding, probably just burning
an Imm17s field (though it would be Imm16 in this case) and treating it
as two 8-bit sub-fields.
   FFw0_0kjj_F2nm_9Gii

Where:
   ii: Behaves as in normal shift, -63 .. 63
   jj=0: No mask, jj=1..63: mask.

This would use an FF prefix rather than an FE:
   Because an Imm33s field would be a waste;
   Because the FE spot had already been claimed for SIMD/NNX ops...


Though, another possibility would be to tweak the behavior of SHLD.Q:
   MOV     Imm16u, Rt
   SHLD.Q  Rm, Rt, Rn

Where, say:
   Rt(7:0): Gives the shift, as before (MOD n).
   Rt(15:8): Gives a mask, only interpreted as a mask if (1..63).
   Both 0 and -1/255 would mean no mask (which is what a plain
   sign-extended shift count leaves in those bits).

Though, this would only give a slight benefit over a Shift+AND or
Shift+Shift pair, in that it could potentially save 1 cycle of
interlock penalty.


This latter change would likely be implicit in the former.
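A C model of the tweaked SHLD.Q. The post gives the byte layout of Rt; the rest is assumed here: the low byte is a signed shift (positive = left, negative = logical right, as with SHLD.L R5, -12 above) reduced mod 64, and the mask is applied after the shift. The Op64 ii/jj fields appear to carry the same two bytes in immediate form. shld_q and shld_ctl are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the proposed SHLD.Q Rm, Rt, Rn:
 *   Rt(7:0)  = signed shift, taken mod 64 (assumed)
 *   Rt(15:8) = mask width, applied only when 1..63; 0 and 255 = no mask */
static uint64_t shld_q(uint64_t rm, uint64_t rt)
{
    int sh = (int)(rt & 0xFF);
    if (sh >= 128)
        sh -= 256;                 /* sign-extend the low byte portably */
    unsigned m = (unsigned)((rt >> 8) & 0xFF);
    uint64_t v = sh >= 0 ? rm << (sh & 63) : rm >> ((-sh) & 63);
    if (m >= 1 && m <= 63)
        v &= (UINT64_C(1) << m) - 1;
    return v;
}

/* Hypothetical helper: the Imm16u constant "MOV Imm16u, Rt" would load. */
static uint16_t shld_ctl(int shift, unsigned mask)
{
    assert(shift >= -63 && shift <= 63 && mask <= 255);
    return (uint16_t)((mask << 8) | (uint8_t)shift);
}
```

For example, shld_q(x, shld_ctl(-12, 20)) matches the Shift+AND pair (x >> 12) & 0xFFFFF, while a sign-extended -8 in Rt leaves 0xFF in bits 15:8 and so acts as a plain shift.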

....