From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Cost of handling misaligned access
Date: Tue, 4 Feb 2025 18:58:43 +0000
Message-ID: <12d9d216c9a094ef963217baa35793e9@www.novabbs.org>

On Tue, 4 Feb 2025 4:49:57 +0000, EricP wrote:

> MitchAlsup1 wrote:
>>
>> Basically, VAX taught us why we did not want to do "all that" in
>> a single instruction; while Intel 432 taught us why we did not want
>> bit aligned decoders (and a lot of other things).
>
> In case people are interested...
>
> [paywalled]
> The Instruction Decoding Unit for the VLSI 432 General Data Processor,
> 1981
> https://ieeexplore.ieee.org/abstract/document/1051633/
>
> The benchmarks in table 1(a) below tell it all:
> a 4 MHz 432 is 1/15 to 1/20 the speed of a 5 MHz VAX/780,
> 1/4 to 1/7 the speed of an 8 MHz 68000 or 5 MHz 8086
>
> A Performance Evaluation of The Intel iAPX 432, 1982
> https://dl.acm.org/doi/pdf/10.1145/641542.641545
>
> And the reasons are covered here:
>
> Performance Effects of Architectural Complexity in the Intel 432, 1988
> https://www.princeton.edu/~rblee/ELE572Papers/Fall04Readings/I432.pdf

From the link::

   The 432’s procedure calls are quite costly. A typical procedure call
   requires 16 read accesses to memory and 24 write accesses, and it
   consumes 982 machine cycles. In terms of machine cycles, this makes
   it about ten times as slow as a call on the MC68010 or VAX 11/780.

Almost 1000 cycles just to call a subroutine !!!

Lots of things the architects got wrong in there.....

>
> Bob Colwell, one of the authors of the third paper, later joined
> Intel as a senior architect and was involved in the development of the
> P6 core used in the Pentium Pro, Pentium II, and Pentium III
> microprocessors,
> and designs derived from it are used in the Pentium M, Core Duo and
> Core Solo, and Core 2.
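
For scale, the 982-cycle call quoted above can be turned into wall-clock time. This is only a rough sketch: the cycle and memory-access counts are the ones quoted from the 1988 paper, but the 4 MHz clock is an assumption borrowed from the benchmark quote (the paper may have measured at a different clock), and the function names are made up for illustration.

```python
# Figures quoted from the 1988 paper in the post above.
CYCLES_PER_CALL = 982
READS, WRITES = 16, 24

# Assumption: 4 MHz clock, taken from the 1982 benchmark quote,
# not from the paper that reports the 982-cycle figure.
CLOCK_HZ = 4_000_000


def call_time_us(cycles: int = CYCLES_PER_CALL, clock_hz: int = CLOCK_HZ) -> float:
    """Wall-clock time of one procedure call, in microseconds."""
    return cycles * 1e6 / clock_hz


def max_calls_per_second(cycles: int = CYCLES_PER_CALL, clock_hz: int = CLOCK_HZ) -> float:
    """Upper bound on calls per second if the CPU did nothing but call."""
    return clock_hz / cycles


if __name__ == "__main__":
    print(f"one call: {call_time_us():.1f} us")
    print(f"calls/s ceiling: {max_calls_per_second():.0f}")
    print(f"memory accesses per call: {READS + WRITES}")
    print(f"cycles per memory access: {CYCLES_PER_CALL / (READS + WRITES):.2f}")
```

Under that clock assumption a single call costs 245.5 microseconds, so a program could make only about four thousand calls per second even if it did nothing else.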