Article <2024May12.083633@mips.complang.tuwien.ac.at>

Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Compiler use of instructions (was: Oops)
Date: Sun, 12 May 2024 06:36:33 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 171
Message-ID: <2024May12.083633@mips.complang.tuwien.ac.at>
References: <ofeq3j9ni63e7tmccf2qbkb9t0naui44ei@4ax.com> <memo.20240510001905.16164S@jgd.cix.co.uk> <2024May11.194851@mips.complang.tuwien.ac.at> <46067b1e52e8998acdd607199d3dbf30@www.novabbs.org>
Injection-Date: Sun, 12 May 2024 10:39:08 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="fa63f79be1b668e15e3900e1b4d19fc8";
	logging-data="2773410"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/1zOCmcRBX2DDB9SPcqfyx"
Cancel-Lock: sha1:Sp4rTW5vOx5vrwe4yPwNkIzB9EE=
X-newsreader: xrn 10.11
Bytes: 9177

mitchalsup@aol.com (MitchAlsup1) writes:
>Anton Ertl wrote:
>> 1) The compiler writers found it too hard to use the complex
>> instructions or addressing modes.  For some kinds of instructions that
>> is the case (e.g., for the AES instructions in Intel and AMD CPUs), but
>> at least these days such instructions are there for use in libraries
>> written in assembly language/with intrinsics.
>
>The 801 was correct on this::
>
>The compiler must be developed at the same time as the ISA; if the ISA has
>it and the compiler cannot use it, then why is it there? {yes, there are
>certain privileged instructions lacking this property}

In the case of the AES instructions in AMD64 CPUs, they are there to
support efficient AES libraries that do not have the cache side channel
for which Daniel Bernstein presented an exploit in:

@Unpublished{bernstein05,
  author =       {Bernstein, Daniel J.},
  title =        {Cache-timing attacks on {AES}},
  note =         {},
  year =         {2005},
  url =          {https://cr.yp.to/antiforgery/cachetiming-20050414.pdf}
}

So they are there because AES consumes significant amounts of CPU in
some application scenarios, and because providing these instructions
is helpful for security.
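To make the side channel concrete: a table-based software AES indexes its lookup tables with bytes that depend on the key. The sketch below is a simplified model of the kind of leak that cache-timing attacks exploit, not a reproduction of Bernstein's measurement method. It assumes a 1 KiB table (256 four-byte entries) and 64-byte cache lines, and the function names are hypothetical. With those assumptions, the first-round index p ^ k selects one of 16 cache lines, so anyone who knows the plaintext byte and can tell which line was touched learns the upper four bits of the key byte.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative assumptions: a 256-entry table of 4-byte words (1 KiB)
   and 64-byte cache lines, so 16 table entries share one line. */
enum { ENTRY_BYTES = 4, LINE_BYTES = 64, ENTRIES_PER_LINE = LINE_BYTES / ENTRY_BYTES };

/* In table-based AES, a first-round lookup index is plaintext byte XOR key byte. */
static unsigned first_round_index(uint8_t p, uint8_t k)
{
    return (unsigned)(p ^ k);
}

/* Which cache line of the table that lookup touches. */
static unsigned cache_line_touched(uint8_t p, uint8_t k)
{
    return first_round_index(p, k) / ENTRIES_PER_LINE;
}

/* An observer who knows p and sees which line was touched learns the
   upper four bits of k, because line == (p >> 4) ^ (k >> 4). */
static uint8_t recover_key_high_nibble(uint8_t p, unsigned line)
{
    assert(line < 256 / ENTRIES_PER_LINE);
    return (uint8_t)((line ^ (unsigned)(p >> 4)) << 4);
}
```

For example, with p = 0x3A and k = 0x4F the lookup touches line 7, and 7 ^ 3 gives the key nibble 0x4. The AES instructions avoid this because the S-box work happens inside the instruction rather than through data-dependent loads from a table in memory.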

In instruction sets like the S/360, VAX and 8086 that were designed
when significant amounts of software were still written in assembly
language, there are some instructions that are designed for use by
assembly language programmers that compilers tend not to use or not
use well (e.g., LODS on the 8086, IA-32 and AMD64).  But these kinds
of instructions have not been added in recent decades, since
assembly-language programming has become a tiny niche.
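For readers who have not met LODS: LODSB loads the byte at DS:SI into AL and then increments SI (or decrements it if the direction flag DF is set); LODSW does the same with AX and a step of 2. The sketch below models LODSB on a hypothetical, heavily simplified register file (flat memory instead of DS:SI segmentation) and uses it in the kind of hand-written scanning loop it was made for. One plausible reason compilers make little use of it is visible here: the destination (AL), the pointer (SI) and the step direction (DF) are all fixed, which constrains register allocation, whereas a plain load plus a separate pointer increment can use any registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified 8086 state: flat memory instead of DS:SI segmentation. */
struct cpu8086 {
    uint8_t        al;
    uint16_t       si;
    int            df;        /* direction flag: 0 = increment, 1 = decrement */
    const uint8_t *mem;       /* stands in for the DS segment */
    size_t         mem_size;
};

/* LODSB: AL <- [SI]; SI <- SI + 1 (DF clear) or SI - 1 (DF set). */
static void lodsb(struct cpu8086 *c)
{
    assert(c->si < c->mem_size);
    c->al = c->mem[c->si];
    c->si = c->df ? (uint16_t)(c->si - 1) : (uint16_t)(c->si + 1);
}

/* Hand-written strlen-style loop (LODSB / TEST AL,AL / JNZ).
   Returns the number of bytes before the terminating 0 byte. */
static size_t count_until_zero(struct cpu8086 *c)
{
    assert(c->df == 0);                    /* forward scan only in this sketch */
    uint16_t start = c->si;
    do {
        lodsb(c);
    } while (c->al != 0);
    return (size_t)(c->si - start - 1);    /* SI has moved past the 0 byte */
}
```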

>Conversely is 
>compiler could almost use an instruction but does not, then adjust 
>the instruction specification so the compiler can !!

Do you have something specific in mind?

>> 2) Some instructions are slower than a sequence of simpler
>> instructions, so compilers will avoid them even if they would
>> otherwise use them.
>
>VAX CALL instructions did more work than was required; they did
>the work they were specified to perform as rapidly as the HW could perform
>the specified task. It took 10 years to figure out that the CALL/RET
>overhead was excessive and wasteful.

Interestingly, in
<https://inst.eecs.berkeley.edu/~n252/paper/RISC-patterson.pdf>,
point 2) is mentioned on page 27 as "Irrational Implementations", and
the examples given are the S/370 load-multiple instruction for fewer
than 4 registers and the VAX INDEX instruction.  On page 30 they
mention that PUSHL R0 is slower than MOVL R0, -(SP) on the VAX 11/780.
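The load-multiple case is easy to picture: LM R1,R3,D2(B2) loads the general registers R1 through R3 (wrapping from 15 to 0) from consecutive words in storage, which for two or three registers is exactly what the same number of plain L (load) instructions do. The sketch below models both on a hypothetical simplified machine state (storage indexed by word rather than by byte, base plus displacement already resolved) to show that the architectural work is identical, so any speed difference was purely a property of the implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified S/370 state: 16 general registers and
   storage viewed as an array of 32-bit words (word index, not byte address). */
struct s370 {
    uint32_t        gpr[16];
    const uint32_t *storage;
};

/* L R,addr: load one register from the word at word index addr. */
static void load(struct s370 *m, unsigned r, unsigned addr)
{
    m->gpr[r] = m->storage[addr];
}

/* LM R1,R3,addr: load R1, R1+1, ..., R3 (register numbers modulo 16)
   from consecutive words starting at addr. */
static void load_multiple(struct s370 *m, unsigned r1, unsigned r3, unsigned addr)
{
    assert(r1 < 16 && r3 < 16);
    for (unsigned r = r1;; r = (r + 1) % 16) {
        load(m, r, addr++);
        if (r == r3)
            break;
    }
}
```

So LM R2,R4 leaves the registers in the same state as L R2; L R3; L R4 on the following words, and LM R14,R1 loads R14, R15, R0 and R1 in that order.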

In <https://dl.acm.org/doi/pdf/10.1145/2465.214917> from 1985 (<8
years after the VAX was introduced) the authors already report that
Michael L. Powell of DEC found the following for his experimental
Modula-2 compiler: "By replacing the CALL instruction with a sequence
of simple instructions that do only what is necessary, Powell was able
to improve performance by 20 percent."  I found
<https://dl.acm.org/doi/pdf/10.1145/502874.502905> about this
compiler, published in June 1984; it says:

|On most processors, the procedure calling sequence defines a standard
|interface between languages. As such, it is often more general than
|required by a particular programming language. The compiler can detect
|procedures that are called only by Modula-2 routines in the current
|compilation, and replace the more expensive procedure call mechanism
|with a simpler, faster one.

Given how often the VAX call is mentioned, I would expect to find some
paper with a more elaborate analysis (and I dimly remember reading
one), but I came up empty in a few quick web searches.  Anyway, let's
look at the VAX CALLS instruction
<http://odl.sysworks.biz/disk$vaxdocmar002/opsys/vmsos721/4515/4515pro_020.html>
<https://people.computing.clemson.edu/~mark/subroutines/vax.html>.

What it pushes on the stack:

argument count
registers specified in a mask that is at the start of the callee
old pc
old fp
old ap (argument pointer)
mask|psw
condition handler (initially 0)
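A rough way to see the cost is to count the longwords CALLS writes to the stack, compared with JSB, which pushes only the return PC. The sketch below assumes the frame contents listed above, ignores the 0-3 bytes CALLS may add to longword-align the stack, and excludes the arguments themselves (the caller pushes those in either case). To keep the comparison like-for-like, the JSB count includes saving the same registers separately. The function names are hypothetical, and the sketch treats only entry-mask bits 0-11 (R0-R11) as register-save bits.

```c
#include <assert.h>
#include <stdint.h>

/* Entry-mask bits 0..11 select R0..R11 for saving; the remaining bits are
   not register-save bits, so this sketch rejects them. */
#define SAVE_MASK_BITS 0x0FFFu

static unsigned saved_reg_count(uint16_t mask)
{
    assert((mask & ~SAVE_MASK_BITS) == 0);
    unsigned n = 0;
    for (uint16_t m = mask; m != 0; m &= (uint16_t)(m - 1))
        n++;                                  /* clear lowest set bit each time */
    return n;
}

/* Longwords written by CALLS: argument count, saved registers,
   then PC, FP, AP, mask|PSW and the condition handler slot. */
static unsigned calls_longwords(uint16_t entry_mask)
{
    return 1 /* argument count */ + saved_reg_count(entry_mask)
         + 5 /* PC, FP, AP, mask|PSW, condition handler */;
}

/* JSB pushes only the return PC; saving registers is left to the callee
   and is counted here for a like-for-like comparison. */
static unsigned jsb_longwords(uint16_t regs_to_save)
{
    return 1 /* PC */ + saved_reg_count(regs_to_save);
}
```

Whatever the mask, CALLS writes five more longwords than JSB plus explicit saves; for a leaf function that saves nothing, that is six stack writes instead of one.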

Of these:

* Pushing the argument count is not done in modern calling
  conventions.  Instead, for languages like C with variable numbers
  of arguments, the caller is responsible for removing the arguments
  from the stack (if there are any on the stack), and in C the callee
  has to know how many arguments there are in some other way (e.g.,
  from the format string for printf()).  I guess VAX CALLS pushes the
  argument count in order to have a common base for these things.

* The registers specified in the mask would be the callee-saved
  registers that the callee is using (apparently, in the usual VAX
  calling convention all registers are considered to be callee-saved,
  to minimize the code size of the call and function entry code).
  Modern calling conventions also have caller-saved registers, which
  make calls to leaf functions cheaper (up to a point).  The mask
  feature could also be used for such calling conventions, but for
  many leaf functions the mask would then be empty.

* Old PC is also stored by the simpler JSB instruction.  RISCs store
  the old PC in a register and leave the saving to a separate store
  instruction (which is unnecessary for a leaf function).

* Old FP is stored in frame-pointer-based calling conventions, but a
  frame pointer tends to be optional and is only used for functions
  where the stack grows by a variable amount (e.g., alloca()) or is
  needed for introspective purposes (debugging and such).

* Old AP: an argument pointer seems unnecessary given that the
  compiler knows how big the data saved by the CALLS instruction is.
  I have also never seen it in a calling convention other than that of
  the VAX.  Maybe they added this so that the backtrace can easily
  access the arguments without having to know anything about the stack
  frames.

* mask|psw: Given the use of the mask for saving the registers on
  call, the return instruction also needs the mask, either as an
  immediate argument or on the stack; the latter is better for stack
  unwinding (throwing exceptions, debugging, and such).  Modern
  calling conventions treat flags (if present) as caller-saved, but
  given that they had space left, saving the psw seems like a good use
  of the space.

* condition handler: no idea what that was good for.
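The printf() point in the bullet on the argument count can be illustrated with a minimal variadic function: nothing in the call itself tells the callee how many arguments it received, so it derives the count from its fixed argument. Here that is a hypothetical mini-format in which each 'd' stands for one int argument; the caller, which knows how much it passed, takes care of removing the arguments, so no count has to be stored in a frame.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical mini-format: each 'd' in fmt consumes one int argument.
   Like printf(), the callee learns the argument count only from fmt. */
static int sum_ints(const char *fmt, ...)
{
    va_list ap;
    int sum = 0;

    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        assert(*p == 'd');              /* only 'd' is defined in this sketch */
        sum += va_arg(ap, int);
    }
    va_end(ap);
    return sum;
}
```

For example, sum_ints("ddd", 1, 2, 3) yields 6. If the format and the actual arguments disagree, the behavior is undefined; nothing in the calling convention checks it, which is the flip side of not passing a count.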
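The bullets on the register mask and on mask|psw can be sketched together. Under an all-callee-saved convention, the entry mask is simply the set of registers the callee uses; under a convention with caller-saved registers, only the used callee-saved registers go into the mask, which is empty for a leaf function that sticks to scratch registers. The split used here (R0-R5 caller-saved, R6-R11 callee-saved) is a hypothetical choice for illustration, not a documented VAX convention. The save and restore helpers show why the return side needs the same mask: without it, it cannot know which registers to reload, which is why the mask is kept in the frame.

```c
#include <assert.h>
#include <stdint.h>

#define R(n) (1u << (n))                 /* mask bit for register Rn, n = 0..11 */
#define ALL_GPRS 0x0FFFu                 /* R0..R11 */
/* Hypothetical split for illustration: R0-R5 caller-saved, R6-R11 callee-saved. */
#define CALLEE_SAVED_MIXED (R(6) | R(7) | R(8) | R(9) | R(10) | R(11))

/* Entry mask when every register is treated as callee-saved. */
static uint16_t mask_all_callee_saved(uint16_t regs_used)
{
    return (uint16_t)(regs_used & ALL_GPRS);
}

/* Entry mask under the hypothetical mixed convention. */
static uint16_t mask_mixed(uint16_t regs_used)
{
    return (uint16_t)(regs_used & CALLEE_SAVED_MIXED);
}

/* Save the registers selected by mask, lowest-numbered first, as the
   call does; returns the number of longwords written. */
static unsigned save_regs(const uint32_t regs[12], uint16_t mask, uint32_t saved[12])
{
    assert((mask & ~ALL_GPRS) == 0);
    unsigned n = 0;
    for (unsigned r = 0; r < 12; r++)
        if (mask & R(r))
            saved[n++] = regs[r];
    return n;
}

/* Restore with the same mask, as the return must; this is why the mask
   has to be available at return time. */
static void restore_regs(uint32_t regs[12], uint16_t mask, const uint32_t saved[12])
{
    unsigned n = 0;
    for (unsigned r = 0; r < 12; r++)
        if (mask & R(r))
            regs[r] = saved[n++];
}
```

A leaf function that uses only R0 and R1 gets mask 0x0003 under the all-callee-saved scheme but an empty mask under the mixed one.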

It seems to me that a lot of this caters to having a good software
ecosystem where lots of languages can call each other and stuff like
debuggers easily know lots of things about the program.  As the RISCs
have demonstrated, this has a cost in performance, but OTOH, have we
seen comparisons of the size and capabilities of backtrace-generating
code and the like on VAX and RISCs?

Of course, if the performance cost of these features is so high that
compiler writers prefer to avoid the full-featured call instruction,
the instruction misses the target, too.

>>                      That has been reported by both the IBM 801
>> project about some S/370 instructions and by the Berkeley RISC project
>> about the VAX.  I don't remember any reports about addressing modes
>> with that problem.
>
>The problem with address modes is their serial decode, not with the ability
>to craft any operand the instruction needs. The second problem with VAX-like
>addressing modes is that they are overly expressive: all operands can be
>constants, whereas a good compiler will never need more than 1 constant
>per instruction (because otherwise some constant arithmetic could be
>performed at compile (or link) time.)
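The argument about constants can be made concrete with a tiny constant folder: if both source operands of a three-operand instruction are known constants, the compiler computes the result itself and emits a single move of that constant. An encoding that allows every operand to be a constant therefore buys nothing for compiled code. The instruction representation below is hypothetical and covers only ADD and MUL.

```c
#include <assert.h>
#include <stdint.h>

enum opkind { OP_REG, OP_IMM };
enum opcode { ADD, MUL, MOV };

struct operand { enum opkind kind; int32_t value; };   /* register number or immediate */
struct insn    { enum opcode op; int dst; struct operand src1, src2; };

/* Rewrite "op dst, #a, #b" into "MOV dst, #result". Returns 1 if folded.
   Arithmetic is done in uint32_t so that wraparound is well defined. */
static int fold_constants(struct insn *i)
{
    if (i->op == MOV || i->src1.kind != OP_IMM || i->src2.kind != OP_IMM)
        return 0;
    assert(i->op == ADD || i->op == MUL);
    uint32_t a = (uint32_t)i->src1.value, b = (uint32_t)i->src2.value;
    uint32_t r = (i->op == ADD) ? a + b : a * b;
    i->op = MOV;
    i->src1 = (struct operand){ OP_IMM, (int32_t)r };
    i->src2 = (struct operand){ OP_REG, 0 };            /* unused by MOV */
    return 1;
}
```

After folding, the only instructions that still carry a constant are those with at most one constant operand, which is the case an encoding actually needs to support.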

It's not just the VAX addressing modes.  Consider how frequently the
addressing modes that the 68020 had but the 68000 did not were used by
compilers; I don't think they were used often.  As for decoding, AFAIK
parallel decoding is a solved problem (at a cost, but still).
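As an example of what the 68020 added: its memory-indirect modes, such as ([bd,An],Xn*scale,od) (postindexed) and ([bd,An,Xn*scale],od) (preindexed), fetch a pointer from memory as part of computing the effective address. The sketch below computes both on a hypothetical flat, big-endian memory model, with the index register treated as a full long. On a 68000, a compiler would instead emit a separate load of the pointer followed by an ordinary addressing mode, which is also what compilers for load/store architectures do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat, big-endian memory for illustration. */
struct mem { const uint8_t *bytes; size_t size; };

static uint32_t read_long(const struct mem *m, uint32_t addr)
{
    assert((size_t)addr + 4 <= m->size);
    return ((uint32_t)m->bytes[addr] << 24) | ((uint32_t)m->bytes[addr + 1] << 16)
         | ((uint32_t)m->bytes[addr + 2] << 8) | (uint32_t)m->bytes[addr + 3];
}

static void check_scale(unsigned scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
}

/* ([bd,An],Xn*scale,od): fetch the pointer first, then add index and outer displacement. */
static uint32_t ea_postindexed(const struct mem *m, uint32_t an, int32_t bd,
                               uint32_t xn, unsigned scale, int32_t od)
{
    check_scale(scale);
    uint32_t ptr = read_long(m, an + (uint32_t)bd);
    return ptr + xn * scale + (uint32_t)od;
}

/* ([bd,An,Xn*scale],od): index first, then fetch the pointer and add od. */
static uint32_t ea_preindexed(const struct mem *m, uint32_t an, int32_t bd,
                              uint32_t xn, unsigned scale, int32_t od)
{
    check_scale(scale);
    uint32_t ptr = read_long(m, an + (uint32_t)bd + xn * scale);
    return ptr + (uint32_t)od;
}
```

Each of these modes does a memory access plus several additions inside the address calculation, which is exactly the work a compiler can equally well express with two simpler instructions.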

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>