From: Terje Mathisen <terje.mathisen@tmsw.no>
Newsgroups: comp.arch
Subject: Re: Making Lemonade (Floating-point format changes)
Date: Mon, 10 Jun 2024 08:07:28 +0200
Organization: A noiseless patient Spider
Message-ID: <v46571$7cro$1@dont-email.me>
References: <abe04jhkngt2uun1e7ict8vmf1fq8p7rnm@4ax.com> <memo.20240512203459.16164W@jgd.cix.co.uk> <v1rab7$2vt3u$1@dont-email.me> <20240513151647.0000403f@yahoo.com> <v1tre1$3leqn$1@dont-email.me> <9c79fb24a0cf92c5fac633a409712691@www.novabbs.org> <v45pdn$3n9j$1@dont-email.me>
In-Reply-To: <v45pdn$3n9j$1@dont-email.me>

Lawrence D'Oliveiro wrote:
> On Mon, 13 May 2024 21:16:48 +0000, MitchAlsup1 wrote:
>
>> Emulation is slow when trap overhead is large and not-slow when trap
>> overhead is small.
>
> I think it was a particular version of the old Mac OS, from around 1990
> or so, that implemented a really amazing hack. Some 32-bit machines had
> hardware floating-point, others didn’t. So developers of
> numerics-intensive apps had to build two versions of their code, one
> with the floating-point instructions, the other with calls to Apple’s
> SANE library.
>
> The hack involved running code built to use hardware floating-point
> instructions, on hardware that didn’t have them. The instructions were of
> course trapped and emulated. But more than that, the system would patch
> the instruction that caused the trap, turning it into a direct call into
> the emulation routine. So after the first execution, each such instruction
> would run much faster. Until the code got unloaded from RAM and the patch
> was lost, of course.

This only works when each FP instruction is at least as long as a
function call.

This particular approach was standard on PCs more or less from the very
beginning (i.e. 1981++):

You could build applications with direct 8087 instructions, with pure sw
emulation via CALL FDIV_emulation etc, or in a mode where each emitted
hw fp instruction was followed by enough NOPs to make the total length
at least 5 bytes: This way the missing-HW trap handler could patch them
into CALLs (possibly followed by one or more NOPs if the HW opcode was
very long) instead.

Since all those 8087 instructions were _very_ slow (30-300 clock
cycles?), executing an extra NOP or two made no discernible difference.

Terje

-- 
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
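
The NOP-padding scheme described in the post can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions: the opcode values, the 2-byte FP instruction, the 5-byte slot layout and the names `assemble` and `run` are all hypothetical stand-ins, not real x86/8087 encodings and not the code of any actual emulator. It models a machine without an FPU, where the first execution of each FP instruction traps and the handler overwrites the padded slot with a CALL, so later executions skip the trap.

```python
# Toy model of trap-and-patch floating-point emulation.
# All opcode values and the slot layout are hypothetical stand-ins,
# not real x86/8087 encodings.
FP_OP, NOP, CALL, HALT = 0xD8, 0x90, 0x9A, 0xF4
FP_LEN = 2   # assumed length of an unpadded FP instruction: opcode + function byte
SLOT = 5     # each FP instruction is padded with NOPs to the length of a CALL


def assemble(funcs):
    """Emit one NOP-padded FP instruction per function code, then HALT."""
    code = bytearray()
    for func in funcs:
        code += bytes([FP_OP, func] + [NOP] * (SLOT - FP_LEN))
    code.append(HALT)
    return code


def run(code, counts):
    """Run on a machine without an FPU, patching FP instructions on first use."""
    pc = 0
    while code[pc] != HALT:
        op = code[pc]
        if op == FP_OP:
            # No FPU present: the instruction traps. A real handler would
            # emulate the operation here; this model only counts the trap.
            counts["traps"] += 1
            func = code[pc + 1]
            # Overwrite the whole padded slot with a CALL whose 4-byte
            # operand identifies the emulation routine.
            code[pc:pc + SLOT] = bytes([CALL]) + func.to_bytes(4, "little")
            pc += SLOT
        elif op == CALL:
            # Already patched: direct call into the emulator, no trap.
            counts["calls"] += 1
            pc += SLOT
        else:
            raise ValueError(f"unexpected opcode {op:#x} at {pc}")
    return counts


if __name__ == "__main__":
    program = assemble([0, 1, 2])
    counts = {"traps": 0, "calls": 0}
    run(program, counts)   # first pass: every FP instruction traps and is patched
    run(program, counts)   # second pass: only direct calls
    print(counts)
```

The padding is what makes the in-place patch possible: because every slot is already as long as a CALL, the handler never has to move the code that follows it.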