Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Compiler use of instructions (was: Oops)
Date: Sun, 12 May 2024 06:36:33 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 171
Message-ID: <2024May12.083633@mips.complang.tuwien.ac.at>
References: <ofeq3j9ni63e7tmccf2qbkb9t0naui44ei@4ax.com> <memo.20240510001905.16164S@jgd.cix.co.uk> <2024May11.194851@mips.complang.tuwien.ac.at> <46067b1e52e8998acdd607199d3dbf30@www.novabbs.org>
Injection-Date: Sun, 12 May 2024 10:39:08 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="fa63f79be1b668e15e3900e1b4d19fc8"; logging-data="2773410"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/1zOCmcRBX2DDB9SPcqfyx"
Cancel-Lock: sha1:Sp4rTW5vOx5vrwe4yPwNkIzB9EE=
X-newsreader: xrn 10.11
Bytes: 9177

mitchalsup@aol.com (MitchAlsup1) writes:
>Anton Ertl wrote:
>> 1) The compiler writers found it too hard to use the complex
>> instructions or addressing modes. For some kinds of instructions that
>> is the case (e.g., for the AES instructions in Intel and AMD CPUs), but
>> at least these days such instructions are there for use in libraries
>> written in assembly language/with intrinsics.
>
>The 801 was correct on this::
>
>The compiler must be developed at the same time as ISA, if ISA has it
>and the compiler cannot use it then why is it there {yes there are
>certain privileged instructions lacking this property}

In the case of the AMD64+ AES instructions, they are there to support
efficient AES libraries that do not have the cache side channel for
which Daniel Bernstein presented an exploit in:

@Unpublished{bernstein05,
  author =       {Bernstein, Daniel J.},
  title =        {Cache-timing attacks on {AES}},
  note =         {},
  year =         {2005},
  url =          {https://cr.yp.to/antiforgery/cachetiming-20050414.pdf},
  OPTannote =    {}
}

So they are there because AES consumes significant amounts of CPU time
in some application scenarios, and because providing these
instructions helps security.

In instruction sets like the S/360, VAX and 8086, which were designed
when significant amounts of software were still written in assembly
language, there are some instructions designed for use by
assembly-language programmers that compilers tend not to use, or not
use well (e.g., LODS on the 8086, IA-32 and AMD64). But such
instructions have not been added in recent decades, since
assembly-language programming has become a tiny niche.

>Conversely if
>compiler could almost use an instruction but does not, then adjust
>the instruction specification so the compiler can !!

Do you have something specific in mind?

>> 2) Some instructions are slower than a sequence of simpler
>> instructions, so compilers will avoid them even if they would
>> otherwise use them.
>
>VAX CALL instructions did more work than what was required, it did
>the work it was specified to perform as rapidly as the HW could perform
>the specified task. It took 10 years to figure out that the CALL/RET
>overhead was excessive and wasteful.
Interestingly, when I look into
<https://inst.eecs.berkeley.edu/~n252/paper/RISC-patterson.pdf>, point
2) is mentioned on page 27 as "Irrational Implementations", and the
examples given are the S/370 load-multiple instruction for fewer than
4 registers and the VAX INDEX instruction. On page 30 they mention
that PUSHL R0 is slower than MOVL R0, -(SP) on the VAX 11/780.

In <https://dl.acm.org/doi/pdf/10.1145/2465.214917> from 1985 (less
than 8 years after the VAX was introduced) the authors already report
that Michael L. Powell of DEC found the following for his experimental
Modula-2 compiler:

"By replacing the CALL instruction with a sequence of simple
instructions that do only what is necessary, Powell was able to
improve performance by 20 percent."

I found <https://dl.acm.org/doi/pdf/10.1145/502874.502905> about this
compiler, published in June 1984; it says:

|On most processors, the procedure calling sequence defines a standard
|interface between languages. As such, it is often more general than
|required by a particular programming language. The compiler can detect
|procedures that are called only by Modula-2 routines in the current
|compilation, and replace the more expensive procedure call mechanism
|with a simpler, faster one.

Given how often the VAX call is mentioned, I would expect to find some
paper with a more elaborate analysis (and I dimly remember reading
one), but I came up empty in short web searches.

Anyway, let's look at the VAX CALLS instruction
<http://odl.sysworks.biz/disk$vaxdocmar002/opsys/vmsos721/4515/4515pro_020.html>
<https://people.computing.clemson.edu/~mark/subroutines/vax.html>.
What it pushes on the stack:

argument count
registers specified in a mask that is at the start of the callee
old pc
old fp
old ap (argument pointer)
mask|psw
condition handler (initially 0)

Of these:

* Pushing the argument count is not done in modern calling conventions.
  Instead, for languages like C with variable numbers of arguments, the
  caller is responsible for removing the arguments from the stack (if
  there are any on the stack), and in C the callee has to know how many
  arguments there are (e.g., from the format string for printf()). I
  guess VAX CALLS pushes the argument count in order to have a common
  base for these things.

* The registers specified in the mask would be the callee-saved
  registers that the callee is using (and apparently in the usual VAX
  calling convention all registers are considered callee-saved, to
  minimize the code size of the call and function-entry code). Modern
  calling conventions also have caller-saved registers, which make leaf
  calls cheaper (up to a point). The mask feature could also be used
  for such calling conventions, but for many leaf functions the mask
  would then be empty.

* Old PC is also stored by the simpler JSB instruction. RISCs store
  the old PC in a register and leave the saving to a separate store
  instruction (which is unnecessary for a leaf function).

* Old FP is stored in frame-pointer-based calling conventions, but a
  frame pointer tends to be optional and is only used for functions
  where the stack grows by a variable amount (e.g., alloca()) or where
  it is needed for introspection (debugging and such).

* Old AP: an argument pointer seems unnecessary given that the
  compiler knows how big the data saved by the CALLS instruction is.
  I have also never seen it in a calling convention other than that of
  the VAX. Maybe they added it so that a backtrace can easily access
  the arguments without having to know anything about the stack
  frames.

* mask|psw: Given the use of the mask for saving the registers on
  call, the return instruction also needs the mask, either as an
  immediate argument or on the stack; the latter is better for stack
  unwinding (throwing exceptions, debugging, and such).
  Modern calling conventions treat flags (if present) as caller-saved,
  but given that they had space left, saving the psw seems like a good
  use of the space.

* condition handler: no idea what that was good for.

It seems to me that a lot of this caters to having a good software
ecosystem where lots of languages can call each other and tools like
debuggers can easily learn a lot about the program. As the RISCs have
demonstrated, this has a cost in performance, but OTOH, have we seen
comparisons of the size and capabilities of backtrace-generating code
and the like on VAX and RISCs?

Of course, if the performance cost of these features is so high that
compiler writers prefer to avoid the full-featured call instruction,
the instruction misses the target, too.

>> That has been reported by both the IBM 801
>> project about some S/370 instructions and by the Berkeley RISC project
>> about the VAX. I don't remember any reports about addressing modes
>> with that problem.
>
>The problem with address modes is their serial decode, not with the ability
>to craft any operand the instruction needs. The second problem with VAX-like
>addressing modes is that it is overly expressive, all operands can be
>constants, whereas a good compiler will never need more than 1 constant
>per instruction (because otherwise some constant arithmetic could be
>performed at compile (or link) time.)

It's not just the VAX addressing modes. Consider how frequently
compilers used the addressing modes that the 68020 had but the 68000
did not; I don't think they were used often.

As for decoding, AFAIK parallel decoding is a solved problem (at a
cost, but still).

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>