Path: ...!weretis.net!feeder9.news.weretis.net!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Making Lemonade (Floating-point format changes)
Date: Tue, 14 May 2024 15:19:34 +0000
Organization: Rocksolid Light
Message-ID: <ddfe16ae5b6b2fd1339602826246b849@www.novabbs.org>
References: <abe04jhkngt2uun1e7ict8vmf1fq8p7rnm@4ax.com> <memo.20240512203459.16164W@jgd.cix.co.uk> <v1rab7$2vt3u$1@dont-email.me> <20240513151647.0000403f@yahoo.com> <v1tre1$3leqn$1@dont-email.me> <9c79fb24a0cf92c5fac633a409712691@www.novabbs.org> <2024May14.073553@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org; logging-data="1085138"; mail-complaints-to="usenet@i2pn2.org"; posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$KUGRnMX1DPlK0bKoiD7WGukcgMDeNzpMk1XbYRWGi.oUhsFrUEVuu
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Bytes: 4058
Lines: 66

Anton Ertl wrote:

> mitchalsup@aol.com (MitchAlsup1) writes:
>>I recall that MIPS could emulate a TLB table walk in something like
>>19 cycles. That is:: a few cycles to get there, a hash table access,
>>a check, a TLB install, and a few cycles to get back.

> Which MIPS? R2000? R10000? Something else? Was this an inverted page
> table?

R3000, and it was a hash table ~1MB in size.

>>On an x86 this would be at least 200 cycles just getting there and back.

> Which x86? 8086? 80186? 80286? These (maybe the 8088 and V20, too)
> are the only implementations that deserve to be called x86. If you
> mean some IA-32 or AMD64 implementations, which ones?

> Anyway, let's see how this works for the U74 (a RISC-V implementation
> which apparently uses trapping for unaligned loads); here we have a
> 10M iteration loop with a payload that performs one load per
> iteration:

> [fedora-starfive:~/nfstmp/gforth-riscv:104544] perf stat -e instructions -e cycles gforth-fast -e ': foo 10000000 0 do @ loop ; 0 value x here aligned to x x x ! x foo drop bye'

> Performance counter stats for 'gforth-fast -e : foo 10000000 0 do @ loop ; 0 value x here aligned to x x x ! x foo drop bye':

>    223805151 instructions:u # 0.70 insn per cycle
>    318131306 cycles:u

>  0.352533487 seconds time elapsed

>  0.257061000 seconds user
>  0.064265000 seconds sys

> [fedora-starfive:~/nfstmp/gforth-riscv:104545] perf stat -e instructions -e cycles gforth-fast -e ': foo 10000000 0 do @ loop ; 0 value x here aligned 1+ to x x x ! x foo drop bye'

> Performance counter stats for 'gforth-fast -e : foo 10000000 0 do @ loop ; 0 value x here aligned 1+ to x x x ! x foo drop bye':

>   5329494415 instructions:u # 0.75 insn per cycle
>   7149481783 cycles:u

>  7.183239751 seconds time elapsed

>  7.082298000 seconds user
>  0.070121000 seconds sys

> So the unaligned access handling result in 511 additional instructions
> per load compared to an aligned access (so it obviously does the
> handling using some kind of trapping). Each unaligned access results
> in 683 additional cycles.

Yes, but notice that sys time hardly changes, so RISC-V is performing the
misaligned LD in user mode (2 context switches -- likely somewhat lightweight).

> So better use the unspecified MIPS, right? However, if the
> unspecified MIPS is an R2000, 19 cycles on a 12.5MHz R2000 cost
> 1.52us, whereas 683 cycles on a 1000MHz U74 cost 0.683us (and I have
> heard that in the Visionfive V2 the U74 runs at 1500MHz).

Given at least the same cache footprint, a 2GHz R3000 would still be in the
20-cycle range. {{That 19-cycle TLB reload depends on the handler and
its table having a footprint in the cache(s).}}

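The per-load figures and the cycle-to-time conversions in this exchange can be rechecked from the quoted perf stat counters. The following is only a sketch of that arithmetic: the counter values and clock rates are copied from the post, while the names (`ITERATIONS`, `extra_per_load`, `cycles_to_us`) are made up for illustration. The subtraction assumes the loop overhead is the same in both runs, so that the difference is attributable to the unaligned access handling.

```python
# Counter values copied from the two perf stat runs quoted in the post.
ITERATIONS = 10_000_000  # loop count in the gforth one-liner

aligned = {"instructions": 223_805_151, "cycles": 318_131_306}
unaligned = {"instructions": 5_329_494_415, "cycles": 7_149_481_783}


def extra_per_load(counter):
    """Additional events per load in the unaligned run vs. the aligned run."""
    return (unaligned[counter] - aligned[counter]) / ITERATIONS


def cycles_to_us(cycles, clock_mhz):
    """Convert a cycle count to microseconds at the given clock rate in MHz."""
    return cycles / clock_mhz


extra_insns = extra_per_load("instructions")
extra_cycles = extra_per_load("cycles")

print(f"extra instructions per unaligned load: {extra_insns:.1f}")
print(f"extra cycles per unaligned load:       {extra_cycles:.1f}")

# Clock rates as mentioned in the thread.
print(f"19 cycles  @ 12.5 MHz: {cycles_to_us(19, 12.5):.3f} us")
print(f"683 cycles @ 1000 MHz: {cycles_to_us(683, 1000):.3f} us")
print(f"683 cycles @ 1500 MHz: {cycles_to_us(683, 1500):.3f} us")
print(f"20 cycles  @ 2000 MHz: {cycles_to_us(20, 2000):.3f} us")
```

Working in MHz keeps the conversion to a single division, since cycles divided by MHz gives microseconds directly.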
> - anton
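
The refill sequence mentioned at the top of the post (a hash table access, a check, a TLB install) can be pictured with a toy model. This is not the R3000 handler or any real OS code: the page size, the slot count, the modulo hash, and all names (`SoftTlb`, `slot_index`, `map_page`, `refill`, `translate`) are assumptions chosen only to show the three steps.

```python
PAGE_SHIFT = 12          # assumed 4 KiB pages
TABLE_SLOTS = 1 << 16    # hypothetical slot count, not the ~1MB table from the post


def slot_index(vpn):
    """Stand-in hash: map a virtual page number to a table slot."""
    return vpn % TABLE_SLOTS


class SoftTlb:
    """Toy software-managed TLB backed by a hashed translation table."""

    def __init__(self):
        self.table = [None] * TABLE_SLOTS  # each slot holds (vpn, pfn) or None
        self.tlb = {}                      # installed translations: vpn -> pfn

    def map_page(self, vpn, pfn):
        """Record a translation in the hashed table (overwrites on collision)."""
        self.table[slot_index(vpn)] = (vpn, pfn)

    def refill(self, vaddr):
        """Fast-path miss handler: hash table access, check, TLB install."""
        vpn = vaddr >> PAGE_SHIFT
        entry = self.table[slot_index(vpn)]    # hash table access
        if entry is None or entry[0] != vpn:   # check the tag
            return False                       # a real system would take a slow path
        self.tlb[vpn] = entry[1]               # TLB install
        return True

    def translate(self, vaddr):
        """Translate an address, refilling the TLB on a miss."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.tlb and not self.refill(vaddr):
            raise LookupError("no translation for this page")
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        return (self.tlb[vpn] << PAGE_SHIFT) | offset


mmu = SoftTlb()
mmu.map_page(0x12345, 0x00ABC)
print(hex(mmu.translate(0x12345678)))
```

The model says nothing about cycle counts. It only shows why the fast path is short when the handler and the table slot are already resident: one indexed read, one compare, one install.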