Path: ...!weretis.net!feeder9.news.weretis.net!news.nk.ca!rocksolid2!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Tonights Tradeoff
Date: Sun, 8 Sep 2024 18:06:48 +0000
Organization: Rocksolid Light
Message-ID: <a37e9bd652d7674493750ccc04674759@www.novabbs.org>
References: <vbgdms$152jq$1@dont-email.me> <17537125c53e616e22f772e5bcd61943@www.novabbs.org> <vbj5af$1puhu$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="1325097"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Spam-Checker-Version: SpamAssassin 4.0.0
X-Rslight-Site: $2y$10$d4tNArLoItQYgAlGTQMPPe18J3oeSSCUXRMGAqbX7eKcDVaa.9cjq
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
Bytes: 4605
Lines: 72

On Sun, 8 Sep 2024 3:22:55 +0000, Robert Finch wrote:

> On 2024-09-07 10:41 a.m., MitchAlsup1 wrote:
>> On Sat, 7 Sep 2024 2:27:40 +0000, Robert Finch wrote:
>>
>>> Making the scalar register file a subset of the vector register file.
>>> And renaming only vector elements.
>>>
>>> There are eight elements in a vector register and each element is
>>> 128-bits wide. (Corresponding to the size of a GPR). Vector register
>>> file elements are subject to register renaming to allow the full power
>>> of the OoO machine to be used to process vectors. The issue is that with
>>> both the vector and scalar registers present for renaming there are a
>>> lot of registers to rename. It is desirable to keep the number of
>>> renamed registers (including vector elements) <= 256 total. So, the 64
>>> scalar registers are aliased with the first eight vector registers.
>>> Leaving only 24 truly available vector registers. Hm. There are 1024
>>> physical registers, so maybe going up to about 300 renamable registers
>>> would not hurt.
>>
>> Why do you think a vector register file is the way to go ??
>
> I think vector use is somewhat dubious, but they have some uses. In many
> cases data can be processed just fine without vector registers. In the
> current project vector instructions use the scalar functional units to
> compute, making them no faster than scalar calcs. But vectors have a lot
> of code density where parallel computation on multiple data items using
> a single instruction is desirable. I do not know why people use vector
> registers in general, but they are present in some modern architectures.
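
The rename budget quoted at the top of the thread can be checked with a
few lines of arithmetic. This is only a sketch: the total of 32 vector
registers is inferred from "eight aliased plus 24 left over" and is not
stated outright, and the variable names are made up for illustration.

```python
# Rename-budget arithmetic for the register file described in the quote.
ELEMS_PER_VREG = 8   # elements per vector register (from the quote)
SCALAR_REGS = 64     # scalar registers (from the quote)
ALIASED_VREGS = 8    # vector registers that overlay the scalar registers
FREE_VREGS = 24      # vector registers left over (from the quote)

# Assumption: the total vector register count is the sum of the two groups.
total_vregs = ALIASED_VREGS + FREE_VREGS

# The 64 scalars fit exactly in the elements of the first eight vector regs.
assert ALIASED_VREGS * ELEMS_PER_VREG == SCALAR_REGS

vector_elements = total_vregs * ELEMS_PER_VREG

# With aliasing, the scalars cost no extra rename entries.
aliased_total = vector_elements
# Without aliasing, the scalars are renamed in addition to every element.
separate_total = vector_elements + SCALAR_REGS

print("renamed registers with aliasing:   ", aliased_total)
print("renamed registers without aliasing:", separate_total)
```

Under these assumptions the aliased scheme lands exactly on the 256
limit, and the unaliased scheme is the "about 300" case.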

There is no doubt that much code can utilize vector arrangements, and
that a processor should be very efficient in performing these
workloads.

The problem I see is that CRAY-like vectors vectorize instructions
instead of vectorizing loops. Any kind of flow control within the
loop becomes tedious at best.
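
To make the flow-control point concrete, here is a rough Python sketch
of what a conditional inside a loop turns into when each instruction is
vectorized separately. It does not model any particular ISA; the
function names are made up, and the 64-element strip length is borrowed
from CRAY-1's vector register length.

```python
VLEN = 64  # elements per vector register on CRAY-1

def scalar_loop(a, b):
    """The loop as the programmer wrote it: one branch per element."""
    out = []
    for x, y in zip(a, b):
        if x > 0:
            out.append(x + y)
        else:
            out.append(y)
    return out

def strip_mined(a, b, vlen=VLEN):
    """Same loop, one 'vector instruction' per line, vlen elements a strip."""
    out = []
    for start in range(0, len(a), vlen):
        va = a[start:start + vlen]                # vector load
        vb = b[start:start + vlen]                # vector load
        mask = [x > 0 for x in va]                # vector compare -> mask
        vsum = [x + y for x, y in zip(va, vb)]    # vector add, all lanes
        # merge under mask: the branch has become data, not control flow
        out.extend(s if m else y for s, m, y in zip(vsum, mask, vb))
    return out

a = list(range(-100, 100))
b = [2] * len(a)
assert scalar_loop(a, b) == strip_mined(a, b)
```

Every additional branch in the loop body needs another mask and another
merge, and the add is performed on lanes whose result is thrown away.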

On the other hand, the Virtual Vector Method vectorizes loops and
can be implemented such that it performs as well as CRAY-like
vector machines without the overhead of a vector register file.
In actuality there are only 6 bits of HW flip-flops governing
VVM, compared to 4 KBytes of vector register file for CRAY-1.
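
For scale, the 4 KByte figure follows from CRAY-1's vector register
file of 8 registers, each holding 64 elements of 64 bits. A quick
check, with the 6-bit VVM figure taken from the text above rather than
derived:

```python
# CRAY-1 vector register file: 8 registers x 64 elements x 64 bits.
CRAY1_VREGS = 8
CRAY1_ELEMS = 64
CRAY1_ELEM_BITS = 64

cray1_bits = CRAY1_VREGS * CRAY1_ELEMS * CRAY1_ELEM_BITS
cray1_bytes = cray1_bits // 8

VVM_STATE_BITS = 6  # figure quoted in the post, not derived here

print("CRAY-1 vector file:", cray1_bytes, "bytes")
print("state ratio       :", cray1_bits // VVM_STATE_BITS, "to 1 (rounded down)")
```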

> Qupls vector registers are 512 bits wide (8 64-bit elements). Bigfoot’s
> vector registers are 1024 bits wide (8 128-bit elements).

When properly abstracted, an implementation can dedicate as many or as
few HW flip-flops to staging buffers for vector workloads as suits it.
A GBOoO core may use the equivalent of that 4KB CRAY-1 file, while a
little low-power core may use only 3 cache lines. Both run the same
ASM code, and both are efficient in their own sense of "efficient".

So, instead of having ~500 vector instructions and ~1000 SIMD
instructions, one has 2 instructions and a medium-scale state
machine.

> One use I am considering is the graphics transform function for doing
> rotates and translates of pixels. It uses a 3x4 matrix. ATM this is done
> with specially dedicated registers, but the matrix could be fit into a
> vector register and the transform function applied with it. Another use
> is neural net instructions.
>
>
> I added a fixed length vector type to the compiler to make it easier to
> experiment with vectors.
>
> The processor handles vector instructions by replicating them one to
> eight times depending on the vector length. It then fixes up the
> register spec fields with incrementing register numbers for each
> instruction. They get fed into the remainder of the CPU as a series of
> scalar instructions.
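
The replication scheme described in the quote above can be sketched in
a few lines of Python. This is only an illustration: the Instr fields,
the three-operand format, and the register numbers are hypothetical,
and it assumes the elements of a vector register occupy consecutive
register numbers, so that "incrementing" means adding the element index
to each register field.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Instr:
    op: str   # operation mnemonic
    rd: int   # destination register number
    ra: int   # first source register number
    rb: int   # second source register number

def expand(vinstr, vlen):
    """Replicate one vector instruction into vlen scalar instructions,
    bumping every register field by the element index."""
    if not 1 <= vlen <= 8:
        raise ValueError("vector length must be between 1 and 8")
    return [
        replace(vinstr, rd=vinstr.rd + i, ra=vinstr.ra + i, rb=vinstr.rb + i)
        for i in range(vlen)
    ]

# Hypothetical vector add with a vector length of 3.
for scalar in expand(Instr("add", rd=64, ra=72, rb=80), 3):
    print(scalar)
```

Each expanded instruction is independent of the others, which is what
lets the rest of the machine rename and schedule them as ordinary
scalar operations.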