From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Wed, 3 Apr 2024 19:27:59 -0500
Organization: A noiseless patient Spider
Message-ID: <uuks6s$7p08$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me> <uukduu$4o4p$1@dont-email.me> <e915303b53f3b4099ff254a4dcdfbe17@www.novabbs.org>
In-Reply-To: <e915303b53f3b4099ff254a4dcdfbe17@www.novabbs.org>

On 4/3/2024 4:53 PM, MitchAlsup1 wrote:
> BGB-Alt wrote:
>
>>
>> FWIW:
>> This doesn't seem too far off from what would be involved with dynamic
>> typing at the ISA level, but with many of same sorts of drawbacks...
>
>
>
>> Say, for example, top 2 bits of a register:
>>   00: Object Reference
>>     Next 2 bits:
>>       00: Pointer (with type-tag)
>>       01: ?
>>       1z: Bounded Array
>>   01: Fixnum (route to ALU)
>>   10: Flonum (route to FPU)
>>   11: Other types
>>     00: Smaller value types
>>       Say: int/uint, short/ushort, ...
>>     ...
>
> So, you either have 66-bit registers, or you have 62-bit FP numbers ?!?
> This solves nobody's problems; not even LISP.
>

Yeah, there is likely no way to make this worthwhile...

>> One issue:
>> Decoding based on register tags would mean needing to know the
>> register tag bits at the same time the instruction is being decoded.
>> In this case, one is likely to need two clock-cycles to fully decode
>> the opcode.
>
> Not good. But what if you don't know the tag until the register is
> delivered from a latent FU, do you stall DECODE, or do you launch and
> make the instruction queue element have to deal with all outcomes.
>

It is likely that the pipeline would need to stall until results are
available.

It is also likely that such a CPU would have a minimum effective latency
of 2 or 3 clock cycles for *every* instruction (and probably 4 or 5
cycles for memory load), in addition to requiring pipeline stalls.

>> ID1: Unpack instruction to figure out register fields, etc.
>> ID2: Fetch registers, specialize variable instructions based on tag bits.
>
>> For timing though, one ideally doesn't want to do anything with the
>> register values until the EX stages (since ID2 might already be tied
>> up with the comparably expensive register-forwarding logic), but
>> asking for 3 cycles for decode is a bit much.
>
>> Otherwise, if one does not know which FU should handle the operation
>> until EX1, this has its own issues.
>
> Real-friggen-ely
>

These issues could be a deal-breaker for such a CPU.

>> Or, possible, the FU's decide
>> whether to accept the operation:
>> ALU: Accepts operation if both are fixnum, FPU if both are Flonum.
>
> What if IMUL is performed in FMAC, IDIV in FDIV,... Int<->FP routing is
> based on calculation capability {Even CDC 6600 performed int × in the FP
> × unit (not in Thornton's book, but via conversation with 6600 logic
> designer at Asilomar some time ago. All they had to do to get FP × to
> perform int × was disable 1 gate.......)
>

Then you have a mess...

So, probably need to sort it out before EX in any case.
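
As a rough illustration of why the routing decision depends on register contents rather than on the opcode alone, here is a minimal C sketch of the 2-bit tag scheme quoted above. The names (`reg_tag`, `make_reg`, `route_add`, `UNIT_SLOW`) are hypothetical and not from the thread, and the layout assumes the tag sits in the top 2 bits of a 64-bit register:

```c
#include <stdint.h>

/* Assumed layout: tag in bits 63..62, payload in the low 62 bits. */
enum tag  { TAG_OBJREF = 0, TAG_FIXNUM = 1, TAG_FLONUM = 2, TAG_OTHER = 3 };
enum unit { UNIT_ALU, UNIT_FPU, UNIT_SLOW };

/* Extract the 2-bit tag from a register value. */
static enum tag reg_tag(uint64_t r) {
    return (enum tag)(r >> 62);
}

/* Pack a tag and a 62-bit payload into one register value. */
static uint64_t make_reg(enum tag t, uint64_t payload) {
    return ((uint64_t)t << 62) | (payload & ((UINT64_C(1) << 62) - 1));
}

/* Pick a function unit for a generic ADD. Note that this needs the
 * register *values*, so it cannot run before the register fetch. */
static enum unit route_add(uint64_t a, uint64_t b) {
    enum tag ta = reg_tag(a), tb = reg_tag(b);
    if (ta == TAG_FIXNUM && tb == TAG_FIXNUM) return UNIT_ALU;
    if (ta == TAG_FLONUM && tb == TAG_FLONUM) return UNIT_FPU;
    return UNIT_SLOW; /* mixed or non-numeric: needs some other path */
}
```

Here `route_add` on two fixnum-tagged values gives `UNIT_ALU`, two flonum-tagged values give `UNIT_FPU`, and anything mixed falls through to `UNIT_SLOW`. In hardware, the equivalent of `reg_tag` can only happen once the operands have been fetched or forwarded, which is where the extra decode latency comes from.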
>> But, a proper dynamic language allows mixing fixnum and flonum with
>> the result being implicitly converted to flonum, but from the FPU's
>> POV, this would effectively require two chained FADD operations (one
>> for the Fixnum to Flonum conversion, one for the FADD itself).
>
> That is a LANGUAGE problem not an ISA problem. SNOBOL allowed one to add
> a string to an integer and the string would be converted to int before.....
>

If you have dynamic types in hardware in this way, then effectively the
typesystem mechanics switch from being a language issue to a hardware
issue.

One may also end up with, say, a CPU that can run Scheme or JavaScript
or similar, but likely couldn't run C without significant hassles.

>> Many other cases could get hairy, but to have any real benefit, the
>> CPU would need to be able to deal with them. In cases where the
>> compiler deals with everything, the type-tags become mostly moot (or
>> potentially detrimental).
>
> You are arguing that the added complexity would somehow pay for itself.
> I can't see it paying for itself.
>

One either goes all in, or abandons the idea entirely. There isn't
really a middle option in this scenario (otherwise one just ends up with
something that is bad at everything).

I was not saying it could work but, in a way, pointing out the issues
that would likely make this unworkable.

Though, that said, there could be possible merit in a CPU core that
could run a language like ECMAScript at roughly C-like speeds, even if
it was basically unusable for pretty much anything else.

Though, for ECMAScript, one could also make a case for taking the
SpiderMonkey option and largely abandoning the use of an integer ALU
(instead running all of the integer math through the FPU, which could be
modified to support bitwise integer operations and similar as well).
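
To make the mixed fixnum/flonum case from the quoted text concrete, here is a small C sketch of a dynamically typed add. It is only an illustration under simplifying assumptions: the `value` struct, `mk_fix`, `mk_flo`, `to_flonum`, and `dyn_add` are hypothetical names, the tag is kept in a separate field instead of being packed into the register, and fixnum overflow simply wraps (a real dynamic language might promote to bignum or flonum instead):

```c
#include <stdint.h>

typedef enum { V_FIXNUM, V_FLONUM } vkind;

typedef struct {
    vkind kind;
    union { int64_t i; double f; } u;
} value;

static value mk_fix(int64_t i) { value v; v.kind = V_FIXNUM; v.u.i = i; return v; }
static value mk_flo(double f)  { value v; v.kind = V_FLONUM; v.u.f = f; return v; }

/* Step 1 of the mixed case: convert a fixnum operand to flonum. */
static double to_flonum(value v) {
    return v.kind == V_FIXNUM ? (double)v.u.i : v.u.f;
}

/* Generic add: integer path if both are fixnum, otherwise convert
 * and then do the floating-point add (two dependent steps). */
static value dyn_add(value a, value b) {
    if (a.kind == V_FIXNUM && b.kind == V_FIXNUM) {
        /* Add as unsigned to avoid signed-overflow UB; wraps mod 2^64. */
        return mk_fix((int64_t)((uint64_t)a.u.i + (uint64_t)b.u.i));
    }
    return mk_flo(to_flonum(a) + to_flonum(b));
}
```

In software the conversion and the add are just two dependent operations. The point in the quoted text is that a tag-dispatching FPU would have to chain the same two steps internally, so the mixed case costs roughly double the latency of a plain FADD.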
>> But, then, there is another issue:
>> C code expects C type semantics to be respected, say:
>>   Signed int overflow wraps at 32 bits (sign extending);
> maybe
>>   Unsigned int overflow wraps at 32 bits (zero extending);
> maybe

I am dealing with some code that has a bad habit of breaking if integer
overflows don't happen in the expected ways (say, the ROTT engine is
pretty bad about this one...).

When I first started working on my ROTT port, there was also a lot of
wackiness where the engine would go out of bounds, and then the behavior
would depend on what other things in memory it encountered when it did
so.

I have mostly managed to fix up all the out-of-bounds issues, but this
isn't enough to keep the demos from desyncing (a similar issue applies
with my Doom port).

Apparently, other engines like ZDoom and similar needed to do a bit of
"heavy lifting" to get the demos from all of the various WAD versions to
play without desync, as Doom was also dependent on the behavior of
out-of-bounds memory accesses. These needed to be turned into in-bounds
accesses (to larger memory objects), with the memory contents of the
out-of-bounds accesses being faked.

Of course, the other option is just to "fix" the out-of-bounds accesses,
and live with a port where the demo playback desyncs.

Meanwhile, Quake entirely avoided this issue:
The demo playback is based on recording the location and orientation of
the player and any enemies at every point in time and similar, rather
than on recording and replaying the original sequence of keyboard inputs
(and assuming that everything always happens exactly the same each
time).

Then again, these sorts of issues are not unique to these games. I have
watched more than a few speed-runs that use glitches either to leave the
playable parts of the map, or to corrupt memory through convoluted
sequences of actions in such a way as to achieve a desired effect (such
as triggering a warp to the end of the game).
Like, during normal gameplay, these games are seemingly just sorta
corrupting memory all over the place but, for the most part, no one

========== REMAINDER OF ARTICLE TRUNCATED ==========