From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: why bits, Byte Addressability And Beyond
Date: Wed, 8 May 2024 13:01:02 -0500
Organization: A noiseless patient Spider
Message-ID: <v1gel0$3npf$1@dont-email.me>
References: <v0s17o$2okf4$2@dont-email.me> <v19f9u$2asct$1@dont-email.me>
 <v19goj$h9f$1@gal.iecc.com> <5r3i3j58je3e7q9j2lir1gd4ascsmumca2@4ax.com>
 <v1bgru$jo2$1@gal.iecc.com> <6d6fa399e0f5dd481125348fa56d8ef8@www.novabbs.org>
 <v1ci49$33ua6$1@dont-email.me> <20240507114742.00003e59@yahoo.com>
 <v1fqvb$3ushi$1@dont-email.me> <20240508153648.00005583@yahoo.com>
 <v1g83s$23im$1@dont-email.me> <v1gc0p$3306$1@dont-email.me>
User-Agent: Mozilla Thunderbird
In-Reply-To: <v1gc0p$3306$1@dont-email.me>

On 5/8/2024 12:16 PM, Terje Mathisen wrote:
> Stephen Fuld wrote:
>> Michael S wrote:
>>>
>>> On Wed, 8 May 2024 14:25:15 +0200
>>> Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>>>
>>>> Michael S wrote:
>>>>> On Tue, 7 May 2024 06:35:53 -0000 (UTC)
>>>>> "Stephen Fuld" <SFuld@alumni.cmu.edu.invalid> wrote:
>>>>>> MitchAlsup1 wrote:
>>>>>>> John Levine wrote:
>>>>>>>> According to John Savard <quadibloc@servername.invalid>:
>>>>>>>>> On Mon, 6 May 2024 02:54:11 -0000 (UTC), John Levine
>>>>>>>>> <johnl@taugh.com> wrote:
>>>>>>>>>> Why do you think bit addressing will be
>>>>>>>>>> faster than shifting and masking? ...
>>>>>>>>> So just because a processor has a 64-bit bus to memory doesn't
>>>>>>>>> mean it has to implement fetching a single byte from memory by
>>>>>>>>> doing a shift and mask operation in a 64-bit register. Instead,
>>>>>>>>> each byte of the bus could have a direct wired path to the low
>>>>>>>>> 8-bits of the internal data bus feeding the registers.
>>>>>>>>
>>>>>>>> I was more thinking about storing bit fields, where you probably
>>>>>>>> have to fetch the whole word or cache line or whatever, shift the
>>>>>>>> new field into it, and then store it back. You already have to do
>>>>>>>> something like that for byte stores, but bit addressing makes it
>>>>>>>> 8 times as hairy.
>>>>>>>
>>>>>>> Which is no different than ECC, BTW...
>>>>>>>
>>>>>>> Could someone invent a bit field ISA that was as efficient as a
>>>>>>> byte accessible architecture:: probably.
>>>>>>>
>>>>>>> Could this bit accessible architecture outperform a byte ISA on
>>>>>>> typical codes:: doubtful. Two reasons:: 1) more delay in the
>>>>>>> LD/ST pipeline, 2) most programs use as little bit-fielding as
>>>>>>> possible (not as much as practical) !!!
>>>>>>
>>>>>> Some time ago, I proposed an additional instruction, a load
>>>>>> variant that allowed you to address bit fields. Would it be
>>>>>> slower than a "normal" byte oriented load? Almost certainly.
>>>>>> But would it be faster than doing all the shifts, masks, word
>>>>>> crossing calculations, etc. via extra instructions? Again,
>>>>>> almost certainly. So you keep the benefits of byte oriented
>>>>>> loads most of the time, but have "reasonable" access to bit
>>>>>> fields when you need them, faster than without the extra
>>>>>> instructions. Hopefully the best of both worlds.
>>>>>
>>>>> When you load a bit field from memory, there is a very high chance
>>>>> that you would want an adjacent bit field soon thereafter.
>>>>> Think about it.
>>>>
>>>> Which means that you would like to have a dedicated streaming
>>>> buffer cache for the EXTR operation?
>>>>
>>>> Terje
>>>
>>> That's not what I wanted to hint to Stephen.
>>> I wanted to hint that in the typical situation, i.e. when one 32-bit
>>> or 64-bit load serves several bit field extractions, his additional
>>> instruction would be slower rather than faster than existing practice.
>>
>> Perhaps. But if you aren't absolutely sure that the next field doesn't
>> cross a 64-bit boundary, then you have to test for that, and if it
>> does, add more instructions to handle it. If that happens, your
>> advantage is lost. Even the test and conditional jump/predication when
>> you don't cross the boundary makes it pretty close.
>>
>> And, as I mentioned in a previous post, I would expect higher-end
>> implementations to make use of some sort of stream buffer, as Terje
>> suggests.
>
> In typical codecs, tokens are mostly 2-3 to 8-10 bits long, so by
> having a 64-bit buffer which always contains at least 32 bits, you
> don't need to worry about any straddles, and for strings of shorter
> tokens, you don't even need to check if a reload/buffer fill-up is
> needed.

Yeah. For something like the LDBITS idea I had mentioned elsewhere, it
would probably work as, say:
  Take the higher-order bits of the position and use them as a byte
  offset;
  Perform a 64-bit load at a byte-aligned address;
  Do a final shift right (0-7 bits) as a post-step (using the low 3
  bits of the index).

Conceptually, this wouldn't be too far removed from something like the
LDTEX instruction.

Though, I had noticed recently that a lot of typos seem to escape my
notice on my end. This is possibly a downside of using a 9pt font on a
4K monitor (22 inch) with 100% UI zoom (*). One can fit more stuff on
screen, but it is potentially not the most easily readable experience.

*: Windows seemingly recommends 150%, but this largely defeats the
purpose of having a higher resolution.
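The LDBITS steps above can be modeled in plain C, roughly as follows (a
hypothetical sketch: the function name, the little-endian assumption,
and the explicit width operand are mine, not a definition of the actual
instruction):

```c
#include <stdint.h>
#include <string.h>

/* Software model of the sketched LDBITS sequence: use the high bits of
   the bit position as a byte offset, do a byte-aligned 64-bit load,
   then shift right by the low 3 bits of the position. Works for field
   widths up to 57 bits without straddling the loaded word. Assumes a
   little-endian buffer, and that 8 bytes past the offset are readable. */
static uint64_t ldbits(const uint8_t *buf, uint64_t bitpos, int width)
{
    uint64_t word;
    memcpy(&word, buf + (bitpos >> 3), 8);  /* byte-aligned 64-bit load */
    word >>= (bitpos & 7);                  /* post-step: 0-7 bit shift */
    return word & ((1ULL << width) - 1);    /* mask to the field width */
}
```

Note that the in-register work is just one shift and one mask,
regardless of where in the byte the field starts; the word-crossing
case the thread worries about only appears for fields wider than 57
bits.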
But, this does bring up the idle mystery of, say, what the "TempleOS"
UI would look like in a 4K format, if keeping a similar density of
"visual clutter". Or, maybe do all of the text with 6x8 pixel glyphs.

In other idle news: After a bit of fiddling, I have managed to get a
"hardware viable" encoder for my 256-color palette with "semi-passable"
image quality.

Color normalization stage: Figures out, for Y (max of R/G/B), how to
add shifted copies of the input value to the input to get an
approximation of the normalized value, with a bias added to make up the
difference (since "y+(y>>n)" may not add up to an exact target value;
normalizing to a target helps, but simply using a bias to normalize the
color desaturates the vector; can't use a real multiply as I don't want
to burn a bunch of DSP48s on this).

This is turned into 6-bit (RGB222) and used to key a 6-bit lookup
table, with some keys (where R=G=B in RGB222 space) keying into a
second lookup for the off-white or gray cases.

A possibility could be to have 3 levels:
  Top level: RGB222;
  Second level: MSBs of R/G/B are all 1;
  Third level: R=G=B in RGB222.
But the gain would be smaller, it seems.

Note that while a 9-bit lookup could avoid a lot of this, it would be
too expensive. For doing 4 pixels at a time, it currently seems to take
~230 LUTs, or ~57 LUTs per color.

While worse color fidelity isn't ideal, being able to run Doom at ~20
fps in a GUI window (vs ~8-12) is an improvement (where the
RGB555->indexed lookup was a bit of a bottleneck).

....

> Terje
>
>
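The two-level lookup structure described above can be sketched in C as
follows (purely illustrative: the table names, the gray-case key, and
the table contents are assumptions of mine, not the actual encoder;
this models only the lookup after the shift-add normalization stage):

```c
#include <stdint.h>

/* Primary 6-bit lookup keyed by RGB222, plus a secondary table that
   refines the R==G==B keys into gray/off-white shades (so the four
   "pure gray" RGB222 codes don't collapse to four output colors). */
static uint8_t rgb222_table[64];  /* 6-bit key -> palette index */
static uint8_t gray_table[16];    /* gray refinement, hypothetical 4-bit key */

static uint8_t palette_index(uint8_t r, uint8_t g, uint8_t b)
{
    int r2 = r >> 6, g2 = g >> 6, b2 = b >> 6;   /* RGB888 -> RGB222 */
    if (r2 == g2 && g2 == b2)                    /* gray-ish: second lookup */
        return gray_table[(r >> 4) & 15];        /* key on high 4 bits of R */
    return rgb222_table[(r2 << 4) | (g2 << 2) | b2];
}
```

In hardware, both tables would be small LUT-based ROMs, and the
quantization is just bit selection, which is consistent with the
~57-LUT-per-pixel-lane figure being dominated by the normalization
logic rather than the lookup itself.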