Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Stephen Fuld" <SFuld@alumni.cmu.edu.invalid>
Newsgroups: comp.arch
Subject: Re: why bits, Byte Addressability And Beyond
Date: Wed, 8 May 2024 16:09:32 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 92
Message-ID: <v1g83s$23im$1@dont-email.me>
References: <v0s17o$2okf4$2@dont-email.me> <v19f9u$2asct$1@dont-email.me> <v19goj$h9f$1@gal.iecc.com> <5r3i3j58je3e7q9j2lir1gd4ascsmumca2@4ax.com> <v1bgru$jo2$1@gal.iecc.com> <6d6fa399e0f5dd481125348fa56d8ef8@www.novabbs.org> <v1ci49$33ua6$1@dont-email.me> <20240507114742.00003e59@yahoo.com> <v1fqvb$3ushi$1@dont-email.me> <20240508153648.00005583@yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 08 May 2024 18:09:32 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a94c25946c95f02a4f0a94b290ee391c";
	logging-data="69206"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+0BjFvfGQnUPN4J+S/9NuJ0mO3yALNMZA="
User-Agent: XanaNews/1.21-f3fb89f (x86; Portable ISpell)
Cancel-Lock: sha1:q2Sf12df73jYW1pLfPvjKp1ZIV4=
Bytes: 4885

Michael S wrote:

> On Wed, 8 May 2024 14:25:15 +0200
> Terje Mathisen <terje.mathisen@tmsw.no> wrote:
> 
> > Michael S wrote:
> > > On Tue, 7 May 2024 06:35:53 -0000 (UTC)
> > > "Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> wrote:
> > >   
> > >> MitchAlsup1 wrote:
> > >>  
> > >>> John Levine wrote:
> > >>>      
> > >>>> According to John Savard  <quadibloc@servername.invalid>:  
> > >>>>> On Mon, 6 May 2024 02:54:11 -0000 (UTC), John Levine
> > >>>>> <johnl@taugh.com> wrote:
> > >>>>>      
> > >>>>>> Why do you think bit addressing will be
> > >>>>>> faster than shifting and masking? ...  
> > >>>      
> > >>>>> So just because a processor has a 64-bit bus to memory doesn't
> > >>>>> mean it has to implement fetching a single byte from memory by
> > >>>>> doing a shift and mask operation in a 64-bit register.
> > >>>>> Instead, each byte of the bus could have a direct wired path
> > >>>>> to the low 8-bits of the internal data bus feeding the
> > >>>>> registers.
> > >>>
> > >>>> I was more thinking about storing bit fields, where you probably
> > >>>> have to fetch the whole word or cache line or whatever, shift the
> > >>>> new field into it, and then store it back.  You already have to do
> > >>>> something like that for byte stores but bit addressing makes it 8
> > >>>> times as hairy.
> > > > > 
> > >>> Which is no different than ECC, BTW...
> > > > > 
> > >>> Could someone invent a bit field ISA that was as efficient as a
> > >>> byte accessible architecture:: probably.
> > > > > 
> > >>> Could this bit accessible architecture outperform a byte ISA on
> > >>> typical codes:: doubtful. Two reasons:: 1) more delay in the
> > >>> LD/ST pipeline, 2) most programs use as little bit-fielding as
> > >>> possible (not as much as practical) !!!
> > > > 
> > > > 
> > >> Some time ago, I proposed an additional instruction, a load
> > >> variant that allowed you to address bit fields.  Would it be
> > >> slower than a "normal" byte oriented load?  Almost certainly.
> > >> But would it be faster than doing all the shifts, masks, word
> > >> crossing calculations, etc. via extra instructions?  Again,
> > >> almost certainly.  So you keep the benefits of byte oriented
> > >> loads most of the time, but have "reasonable" access to bit
> > >> fields when you need them, faster than without the extra
> > >> instructions.  Hopefully the best of both worlds.
> > > > 
> > > > 
> > > > 
> > >>  
> > > 
> > > When you load bit field from memory, there is very high chance
> > > that you would want adjacent bit field soon thereafter.
> > > Think about it.  
> > 
> > Which means that you would like to have a dedicated streaming
> > buffer cache for the EXTR operation?
> > 
> > Terje
> > 
> > 
> 
> That's not what I wanted to hint to Stephen.
> I wanted to hint that in typical situation, i.e. when one 32-bit or
> 64-bit load serves several bit field extractions, his additional
> instruction would be slower rather than faster than existing practice.


Perhaps.  But if you aren't absolutely sure that the next field doesn't
cross a 64-bit boundary, then you have to test for that, and if it does,
add more instructions to handle it.  If that happens, your advantage is
lost.  Even when the field doesn't cross the boundary, the test plus the
conditional jump (or predication) makes it pretty close.
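To make that cost concrete, here is a minimal C sketch of the software sequence a bit-field load would replace.  It assumes fields of 1 to 64 bits stored LSB-first in an array of 64-bit words; the name `extract_field` and the bit numbering are illustrative choices, not part of any proposed ISA.  Note the boundary test and the second load on the crossing path.

```c
#include <stdint.h>

/* Illustrative only: extract a field of `width` bits (1..64) starting at
   bit `pos`, where bit 0 is the least significant bit of words[0].
   If the field crosses a word boundary, words[idx + 1] must exist. */
static uint64_t extract_field(const uint64_t *words, uint64_t pos,
                              unsigned width)
{
    uint64_t idx   = pos >> 6;                /* which 64-bit word      */
    unsigned shift = (unsigned)(pos & 63);    /* bit offset within word */
    uint64_t mask  = (width == 64) ? ~UINT64_C(0)
                                   : ((UINT64_C(1) << width) - 1);

    uint64_t field = words[idx] >> shift;

    /* The boundary test: does the field spill into the next word? */
    if (shift + width > 64) {
        /* Here shift >= 1, so (64 - shift) is 1..63 and the shift is defined. */
        field |= words[idx + 1] << (64 - shift);
    }
    return field & mask;
}
```

A single bit-field load would fold the shift, mask, test and possible second load into one instruction; whether that wins depends on how often consecutive fields can share one wide load instead.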

And, as I mentioned in a previous post, I would expect higher end
implementations to make use of some sort of stream buffer, as Terje
suggests.
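For comparison, the "one wide buffer feeds several extractions" pattern can be sketched in software as a bit reader that keeps up to 64 bits in a register and refills it only when it runs low.  This is a software analog, not a description of any hardware stream buffer; the `bit_reader` type and the `br_init`/`br_read` names are hypothetical, and bits are assumed to be consumed LSB-first from successive bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software bit reader: consecutive fields come out of a
   register-sized buffer instead of being re-fetched from memory. */
typedef struct {
    const uint8_t *p;    /* next byte to load            */
    const uint8_t *end;  /* one past the last input byte */
    uint64_t buf;        /* buffered bits, next in bit 0 */
    unsigned count;      /* number of valid bits in buf  */
} bit_reader;

static void br_init(bit_reader *br, const uint8_t *data, size_t len)
{
    br->p = data;
    br->end = data + len;
    br->buf = 0;
    br->count = 0;
}

/* Read `width` bits (1..57).  Bits past the end of the input read as 0. */
static uint64_t br_read(bit_reader *br, unsigned width)
{
    /* Refill a byte at a time while a whole byte still fits; afterwards
       count is 57..64, so any width up to 57 is available. */
    while (br->count <= 56) {
        uint64_t byte = (br->p < br->end) ? *br->p++ : 0;
        br->buf |= byte << br->count;
        br->count += 8;
    }
    uint64_t field = br->buf & ((UINT64_C(1) << width) - 1);
    br->buf >>= width;
    br->count -= width;
    return field;
}
```

Because the refill happens only every several fields, the boundary handling is amortized rather than tested per field.  A faster version would refill with one unaligned 64-bit load instead of a byte loop.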






-- 
 - Stephen Fuld 
(e-mail address disguised to prevent spam)