Article <v09gmk$1te49$1@dont-email.me>

Path: ...!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Short Vectors Versus Long Vectors
Date: Tue, 23 Apr 2024 18:36:49 -0500
Organization: A noiseless patient Spider
Lines: 99
Message-ID: <v09gmk$1te49$1@dont-email.me>
References: <v06vdb$17r2v$1@dont-email.me>
 <5451dcac941e1f569397a5cc7818f68f@www.novabbs.org>
 <v078td$1df76$4@dont-email.me> <2024Apr23.082238@mips.complang.tuwien.ac.at>
 <v098so$1rp16$1@dont-email.me>
 <d46723273a62c22283387893da40e7e4@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 24 Apr 2024 01:36:53 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="9b128de1d320d65ee4a49c78ee8f6778";
	logging-data="2013321"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19zMhC1KKeS+RUm1LHgVfzDIO+QWy+Xx3k="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:W0jlgXWGw56TtcjJwe26bdTbG1k=
Content-Language: en-US
In-Reply-To: <d46723273a62c22283387893da40e7e4@www.novabbs.org>
Bytes: 4393

On 4/23/2024 5:39 PM, MitchAlsup1 wrote:
> BGB wrote:
> 
>> On 4/23/2024 1:22 AM, Anton Ertl wrote:
>>> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>>>> On Tue, 23 Apr 2024 02:14:32 +0000, MitchAlsup1 wrote:
>>>> <big snip>
> 
> 
>> As can be noted, SIMD is easy to implement.
> 
> ADD/SUB is, MUL and DIV and SHIFTs and CMPs are not; especially when
> MUL does 2n = n × n and DIV does 2n / n -> n (quotient) + n (remainder)
> 

MUL:
I have a few instructions for this, giving the low, high-signed, and 
high-unsigned halves of the product.
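
As a rough illustration of what "low / high-signed / high-unsigned" means 
per lane, here is a minimal Python model on 16-bit lanes. The function 
names are hypothetical (they are not the actual mnemonics), and the lane 
width is an assumption for the sake of the example.

```python
# Model of the three multiply results on packed 16-bit lanes.
# Function names are hypothetical, not actual instruction mnemonics.

LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1

def to_signed(x):
    """Interpret a 16-bit lane value as two's-complement signed."""
    x &= LANE_MASK
    return x - (1 << LANE_BITS) if x & (1 << (LANE_BITS - 1)) else x

def pmul_low(a, b):
    """Low 16 bits of each product (identical for signed and unsigned)."""
    return [(x * y) & LANE_MASK for x, y in zip(a, b)]

def pmul_high_unsigned(a, b):
    """High 16 bits of each 32-bit unsigned product."""
    return [((x & LANE_MASK) * (y & LANE_MASK)) >> LANE_BITS
            for x, y in zip(a, b)]

def pmul_high_signed(a, b):
    """High 16 bits of each 32-bit signed product, as a raw lane value."""
    return [((to_signed(x) * to_signed(y)) >> LANE_BITS) & LANE_MASK
            for x, y in zip(a, b)]
```

The low half is the same either way; only the high half depends on 
signedness, which is why two high variants are needed.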


DIV:
I didn't bother with this.
It is typically faked by multiplying by a reciprocal and taking the high 
result.

Something like MOD would need to be faked, but SIMD modulo doesn't 
really tend to be a thing IME.

Division by a non-constant scalar/vector value will need a runtime call.
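
A minimal sketch of the multiply-by-reciprocal trick, modeled in Python on 
unsigned 16-bit lanes. The names and the choice of a 16-bit fixed-point 
reciprocal (rounded up) are assumptions for illustration, not a 
description of any particular implementation.

```python
# Divide-by-constant via reciprocal multiply, keeping the high half.
# Names and the 16-bit reciprocal format are illustrative assumptions.

LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1

def reciprocal(d):
    """Fixed-point reciprocal ceil(2^16 / d), precomputed for a constant d."""
    assert 2 <= d <= LANE_MASK
    return ((1 << LANE_BITS) + d - 1) // d

def pdiv_const(lanes, d):
    """Approximate unsigned n // d per lane as the high half of n * (1/d)."""
    r = reciprocal(d)
    return [((n & LANE_MASK) * r) >> LANE_BITS for n in lanes]
```

With a reciprocal this narrow the result is not exact over the full input 
range: pdiv_const([32768], 3) gives 10923 where the true quotient is 
10922. Smaller inputs are fine, so this either needs a restricted range, 
a wider reciprocal, or a correction step.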


SHIFT:
Mostly faked using ALU shifts and masking.
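
Roughly, the shift-and-mask approach looks like the following Python 
model: shift the whole 64-bit register, then mask off the bits that 
crossed a lane boundary. Four 16-bit lanes and the function names are 
assumptions made for the example.

```python
# Packed logical shifts built from a plain 64-bit shift plus a mask.
# Four 16-bit lanes per register; names are hypothetical.

REG_MASK = (1 << 64) - 1

def lane_mask(m16):
    """Replicate a 16-bit mask into all four lanes of a 64-bit value."""
    return (m16 & 0xFFFF) * 0x0001_0001_0001_0001

def pshl_w(x, n):
    """Packed logical shift left by n (0..15) on four 16-bit lanes."""
    keep = (0xFFFF << n) & 0xFFFF      # bits that stayed inside the lane
    return ((x << n) & REG_MASK) & lane_mask(keep)

def pshr_w(x, n):
    """Packed logical shift right by n (0..15) on four 16-bit lanes."""
    keep = 0xFFFF >> n                 # bits that stayed inside the lane
    return (x >> n) & lane_mask(keep)
```

An arithmetic right shift would additionally need the sign bits filled 
back in, which is not shown here.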

CMPxx:
There are dedicated instructions for this.


>> Main obvious drawback is the potential for combinatorial explosions of 
>> instructions. One needs to keep a fairly careful watch over this.
> 
>> Like, if one is faced with an NxN or NxM grid of possibilities, naive 
>> strategy is to be like "I will define an instruction for every 
>> possibility in the grid.", but this is bad. More reasonable to devise 
>> a minimal set of instructions that will allow the operation to be done 
>> within a reasonable number of instructions.
> 
>> But, then again, I can also note that I axed things like packed-byte 
>> operations and saturating arithmetic, which are pretty much the de facto 
>> standard in packed-integer SIMD.
> 
> MANY SIMD algorithms need saturating arithmetic because they cannot do
> b + b -> h and avoid the overflow. And they cannot do B + b -> h because
> that would consume vast amounts of encoding space.
> 

There are ways to fake it.

Though, granted, most end up needing extra instructions and giving up 1 
bit of dynamic range.

Though, the main case where one can't spare any dynamic range is 
typically packed byte, which I had skipped (in favor of faking 
packed-byte scenarios using packed word).


But, could add, say:
   PSHAR.W  Rm, Rn  //Packed Shift right 1 bit, arithmetic
   PSHLR.W  Rm, Rn  //Packed Shift right 1 bit, logical
   PSHAL.W  Rm, Rn  //Packed Shift left 1 bit, arithmetic saturate
   PSHLL.W  Rm, Rn  //Packed Shift left 1 bit, logical saturate

In a naive case, one could fake a virtual PADDSS.W instruction as, say:
   PSHAR.W  R4, R16
   PSHAR.W  R5, R17
   PADD.W   R16, R17, R18
   PSHAL.W  R18, R2
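
To make the behavior of that sequence concrete, here is a small Python 
model of it on signed 16-bit lanes. The semantics assumed for PSHAR.W, 
PADD.W, and PSHAL.W are my reading of the comments above (the shift ops 
are only proposed, not existing instructions), so treat this as a sketch.

```python
# Model of the 4-instruction PADDSS.W emulation on signed 16-bit lanes.
# Instruction semantics are assumed from the comments in the listing.

INT16_MIN, INT16_MAX = -32768, 32767

def wrap16(x):
    """Wrap an integer to a signed 16-bit lane value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def pshar_w(v):
    """Packed arithmetic shift right by 1 (Python >> floors for negatives)."""
    return [x >> 1 for x in v]

def padd_w(a, b):
    """Packed wrapping (modular) add."""
    return [wrap16(x + y) for x, y in zip(a, b)]

def pshal_w(v):
    """Packed arithmetic shift left by 1 with signed saturation."""
    return [max(INT16_MIN, min(INT16_MAX, x * 2)) for x in v]

def paddss_w_emulated(a, b):
    """Halve both inputs, add, then double with saturation."""
    return pshal_w(padd_w(pshar_w(a), pshar_w(b)))
```

For example, paddss_w_emulated([30000, -30000, 100, 3], [10000, -10000, 
200, 4]) gives [32767, -32768, 300, 6]. The first two lanes saturate as 
intended, and the last lane shows the cost: 3 + 4 comes out as 6 because 
the low bit of each input is dropped by the initial shift.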


These could more efficiently address both saturation and the 1-bit shift 
(the most common shift case).

Shift left 1-bit (without saturation) can generally be handled with:
   PADD.W R4, R4, R2
Or similar.


>> Likewise, a lot of the gaps are filled in with specialized converter 
>> and helper ops. Even here, some conversion chains will require 
>> multiple instructions.
> 
>> Well, and if there is no practical difference between a scalar and 
>> SIMD version of an instruction, may well just use the SIMD version for 
>> scalar.
> 
>> ....
> 
> 
>>> - anton