From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Short Vectors Versus Long Vectors
Date: Tue, 23 Apr 2024 18:36:49 -0500
Message-ID: <v09gmk$1te49$1@dont-email.me>
In-Reply-To: <d46723273a62c22283387893da40e7e4@www.novabbs.org>

On 4/23/2024 5:39 PM, MitchAlsup1 wrote:
> BGB wrote:
>
>> On 4/23/2024 1:22 AM, Anton Ertl wrote:
>>> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>>>> On Tue, 23 Apr 2024 02:14:32 +0000, MitchAlsup1 wrote:
>>>>
> <big snip>
>
>> As can be noted, SIMD is easy to implement.
>
> ADD/SUB is, MUL and DIV and SHIFTs and CMPs are not; especially when
> MUL does 2n = n × n and DIV does 2n / n -> n (quotient) + n (remainder)
>

MUL: Have a few instructions, giving the Low, High-Signed, and
High-Unsigned results.

DIV: Didn't bother with this. Typically faked using multiply-by-reciprocal
and taking the high result. Something like MOD would need to be faked,
but SIMD modulo doesn't really tend to be a thing IME.

Division by a non-constant scalar/vector value will need a runtime call.
SHIFT: Mostly faked using ALU shifts and masking.

CMPxx: There are dedicated instructions for this.


>> Main obvious drawback is the potential for combinatorial explosions of
>> instructions. One needs to keep a fairly careful watch over this.
>
>> Like, if one is faced with an NxN or NxM grid of possibilities, naive
>> strategy is to be like "I will define an instruction for every
>> possibility in the grid.", but this is bad. More reasonable to devise
>> a minimal set of instructions that will allow the operation to be done
>> within a reasonable number of instructions.
>
>> But, then again, I can also note that I axed things like packed-byte
>> operations and saturating arithmetic, which are pretty much de-facto
>> in packed-integer SIMD.
>
> MANY SIMD algorithms need saturating arithmetic because they cannot do
> b + b -> h and avoid the overflow. And they cannot do B + b -> h because
> that would consume vast amounts of encoding space.
>

There are ways to fake it.
Though, granted, most end up costing extra instructions and 1 bit of
dynamic range.

Though, the main case where one can't spare any dynamic range is
typically packed byte, which I had skipped (in favor of faking
packed-byte scenarios using packed word).

But, could add, say:
  PSHAR.W Rm, Rn  //Packed Shift right 1 bit, arithmetic
  PSHLR.W Rm, Rn  //Packed Shift right 1 bit, logical
  PSHAL.W Rm, Rn  //Packed Shift left 1 bit, arithmetic saturate
  PSHLL.W Rm, Rn  //Packed Shift left 1 bit, logical saturate

In a naive case, one could fake a virtual PADDSS.W instruction as, say:
  PSHAR.W R4, R16        //R16 = R4 >> 1, per lane
  PSHAR.W R5, R17        //R17 = R5 >> 1, per lane
  PADD.W  R16, R17, R18  //R18 = R16 + R17 (halved values cannot overflow)
  PSHAL.W R18, R2        //R2 = R18 << 1, saturating

These could more efficiently address both saturation and 1-bit shifts
(the most common case).

Shift left 1-bit (without saturation) can generally be handled with:
  PADD.W R4, R4, R2      //R2 = R4 + R4 = R4 << 1
Or similar.

>> Likewise, a lot of the gaps are filled in with specialized converter
>> and helper ops. Even here, some conversion chains will require
>> multiple instructions.
>
>> Well, and if there is no practical difference between a scalar and
>> SIMD version of an instruction, may well just use the SIMD version for
>> scalar.
>
>> ....
>
>
>>> - anton