Article <f2d99c60ba76af28c8b63b9628fb56fa@www.novabbs.org>

Path: ...!news.misty.com!weretis.net!feeder9.news.weretis.net!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Tonights Tradeoff
Date: Wed, 11 Sep 2024 21:27:21 +0000
Organization: Rocksolid Light
Message-ID: <f2d99c60ba76af28c8b63b9628fb56fa@www.novabbs.org>
References: <vbgdms$152jq$1@dont-email.me> <17537125c53e616e22f772e5bcd61943@www.novabbs.org> <vbj5af$1puhu$1@dont-email.me> <a37e9bd652d7674493750ccc04674759@www.novabbs.org> <vbog6d$2p2rc$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="1708536"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
X-Rslight-Site: $2y$10$bdHY.2.JTkQdFBl9IJQOLu/MIIpZ/oDUe8C04ej3u.TXQfCr2wafa
X-Spam-Checker-Version: SpamAssassin 4.0.0
Bytes: 5758
Lines: 94

On Tue, 10 Sep 2024 3:59:05 +0000, Robert Finch wrote:

> On 2024-09-08 2:06 p.m., MitchAlsup1 wrote:
>> On Sun, 8 Sep 2024 3:22:55 +0000, Robert Finch wrote:
>>
>>> On 2024-09-07 10:41 a.m., MitchAlsup1 wrote:
>>>> On Sat, 7 Sep 2024 2:27:40 +0000, Robert Finch wrote:
>>>>
>>>>> Making the scalar register file a subset of the vector register file,
>>>>> and renaming only vector elements.
>>>>>
>>>>> There are eight elements in a vector register and each element is
>>>>> 128 bits wide (corresponding to the size of a GPR). Vector register
>>>>> file elements are subject to register renaming to allow the full power
>>>>> of the OoO machine to be used to process vectors. The issue is that
>>>>> with both the vector and scalar registers present for renaming there
>>>>> are a lot of registers to rename. It is desirable to keep the number
>>>>> of renamed registers (including vector elements) <= 256 total. So, the
>>>>> 64 scalar registers are aliased with the first eight vector registers,
>>>>> leaving only 24 truly available vector registers. Hm. There are 1024
>>>>> physical registers, so maybe going up to about 300 renamable registers
>>>>> would not hurt.
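A minimal sketch of the register-count arithmetic in the paragraph above. The total of 32 architectural vector registers is an assumption inferred from "the first eight vector registers" plus "24 truly available"; the function name `rename_budget` is hypothetical.

```python
ELEMENTS_PER_VREG = 8   # eight 128-bit elements per vector register
SCALAR_REGS = 64        # architectural scalar registers
VECTOR_REGS = 32        # assumption: 8 aliased + 24 "truly available"
PHYSICAL_REGS = 1024    # physical register file size from the post


def rename_budget(vector_regs, elems_per_vreg, scalar_regs):
    """Return (renamable names, vector regs consumed by scalar aliasing, free vector regs)."""
    names = vector_regs * elems_per_vreg      # every element is a renamable name
    aliased = scalar_regs // elems_per_vreg   # 64 scalars fill 8 vector registers
    return names, aliased, vector_regs - aliased


names, aliased, free = rename_budget(VECTOR_REGS, ELEMENTS_PER_VREG, SCALAR_REGS)
unaliased_names = names + SCALAR_REGS         # if scalars kept their own names instead
spare_physical = PHYSICAL_REGS - names        # physical regs beyond the architectural names

print(names, aliased, free)   # 256 8 24
print(unaliased_names)        # 320
print(spare_physical)         # 768
```

Without aliasing the count is 320, which is roughly the "about 300" floated above.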
>>>>
>>>> Why do you think a vector register file is the way to go ??
>>>
>>> I think vector use is somewhat dubious, but vectors do have some uses.
>>> In many cases data can be processed just fine without vector
>>> registers. In the current project, vector instructions use the scalar
>>> functional units to compute, making them no faster than scalar calcs.
>>> But vectors give a lot of code density where parallel computation on
>>> multiple data items using a single instruction is desirable. I do not
>>> know why people use vector registers in general, but they are present
>>> in some modern architectures.
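A toy count, not a model of any real pipeline, of why a vector instruction that runs on the scalar functional units is denser but not faster: the ALU work is unchanged and only the number of instructions fetched and decoded shrinks. Loop overhead is ignored, and the `Counts` class and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    instructions: int = 0  # instructions fetched and decoded
    alu_ops: int = 0       # operations executed on the scalar ALUs


def add_scalar(a, b, counts):
    """Separate scalar adds: one instruction per element."""
    out = []
    for x, y in zip(a, b):
        counts.instructions += 1
        counts.alu_ops += 1
        out.append(x + y)
    return out


def add_vector_on_scalar_units(a, b, counts):
    """One vector add, cracked into one scalar-ALU operation per element."""
    counts.instructions += 1
    out = []
    for x, y in zip(a, b):
        counts.alu_ops += 1
        out.append(x + y)
    return out


a, b = list(range(8)), [10] * 8
s, v = Counts(), Counts()
assert add_scalar(a, b, s) == add_vector_on_scalar_units(a, b, v)
print(s)  # Counts(instructions=8, alu_ops=8)
print(v)  # Counts(instructions=1, alu_ops=8)
```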
>>
>> There is no doubt that much code can utilize vector arrangements, and
>> that a processor should be very efficient in performing these work
>> loads.
>>
>> The problem I see is that CRAY-like vectors vectorize instructions
>> instead of vectorizing loops. Any kind of flow control within the
>> loop becomes tedious at best.
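To make "tedious" concrete, here is a hypothetical loop with an if/else in its body, written two ways. The CRAY-like version turns the branch into a mask, computes both arms for every element, and merges; the loop-style version keeps the branch per iteration. The loop and function names are illustrative only.

```python
# Source loop:  for each i:  c[i] = a[i] * b[i] if a[i] > 0 else -a[i]


def loop_style(a, b):
    """Loop-vectorizing view: each iteration keeps its own control flow."""
    c = []
    for x, y in zip(a, b):
        if x > 0:
            c.append(x * y)
        else:
            c.append(-x)
    return c


def instruction_style(a, b):
    """CRAY-like view: the branch becomes a mask, both arms run on all lanes."""
    mask = [x > 0 for x in a]                 # vector compare -> mask register
    then_arm = [x * y for x, y in zip(a, b)]  # vector multiply, every lane
    else_arm = [-x for x in a]                # vector negate, every lane
    return [t if m else e for m, t, e in zip(mask, then_arm, else_arm)]  # merge


a, b = [3, -2, 0, 5], [2, 7, 4, -1]
print(loop_style(a, b))         # [6, 2, 0, -5]
print(instruction_style(a, b))  # [6, 2, 0, -5]
```

Each additional nested condition adds another mask and merge to the instruction-style form, while the loop-style form only gains another branch.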
>>
>> On the other hand, the Virtual Vector Method vectorizes loops and
>> can be implemented such that it performs as well as CRAY-like
>> vector machines without the overhead of a vector register file.
>> In actuality there are only 6 bits of HW flip-flops governing
>> VVM, compared to 4 KBytes for the CRAY-1.
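The 4 KByte figure follows from the CRAY-1's eight vector registers of 64 elements, each 64 bits wide. A quick check of that size and of the ratio against the 6 bits quoted for VVM:

```python
CRAY1_VREGS = 8        # V0..V7
CRAY1_ELEMENTS = 64    # elements per vector register
CRAY1_ELEM_BITS = 64   # bits per element
VVM_STATE_BITS = 6     # figure given in the post

cray1_bits = CRAY1_VREGS * CRAY1_ELEMENTS * CRAY1_ELEM_BITS
cray1_bytes = cray1_bits // 8

print(cray1_bytes)                    # 4096
print(cray1_bits // VVM_STATE_BITS)   # 5461
```

That is roughly 5,000 times fewer bits of dedicated state.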
>>
>>> Qupls’ vector registers are 512 bits wide (eight 64-bit elements).
>>> Bigfoot’s vector registers are 1024 bits wide (eight 128-bit elements).
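The widths follow from element count times element width. For Bigfoot, assuming 32 architectural vector registers (inferred from the 8 aliased + 24 free split described earlier), the whole vector register file works out to 4 KB, the same size as the CRAY-1 file Mitch cites. The helper name `vreg_bits` is hypothetical.

```python
def vreg_bits(elements, element_bits):
    """Width of one vector register in bits."""
    return elements * element_bits


QUPLS_VREG_BITS = vreg_bits(8, 64)      # 512
BIGFOOT_VREG_BITS = vreg_bits(8, 128)   # 1024
BIGFOOT_VREGS = 32                      # assumption, see lead-in

bigfoot_file_bytes = BIGFOOT_VREGS * BIGFOOT_VREG_BITS // 8
print(QUPLS_VREG_BITS, BIGFOOT_VREG_BITS, bigfoot_file_bytes)  # 512 1024 4096
```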
>>
>> When properly abstracted, one can dedicate as many or as few HW
>> flip-flops as staging buffers for vector workloads to suit
>> the implementation at hand. A GBOoO core may use the equivalent of
>> the CRAY-1's 4 KB file, while a little low-power core uses just 3
>> cache lines. Both run the same ASM code, and both are efficient in
>> their own sense of "efficient".
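A behavioral sketch of "same code, different implementation": the loop body stays fixed while a hypothetical `lanes` parameter stands in for how many iterations an implementation's staging buffers let it have in flight. The results are identical; only the number of passes changes. Timing, buffers, and memory are not modeled.

```python
def run_loop(body, n, lanes):
    """Run iterations 0..n-1 in groups of `lanes`; return (results, groups)."""
    results = [None] * n
    groups = 0
    for start in range(0, n, lanes):
        for i in range(start, min(start + lanes, n)):
            results[i] = body(i)   # iterations are independent in this example
        groups += 1
    return results, groups


def body(i):
    return 3 * i + 1   # stand-in for a loop body compiled once


small_core, small_groups = run_loop(body, 10, lanes=1)
big_core, big_groups = run_loop(body, 10, lanes=8)
assert small_core == big_core
print(small_groups, big_groups)  # 10 2
```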
>>
>> So, instead of having ~500 vector instructions and ~1000 SIMD
>> instructions, one has 2 instructions and a medium-scale state
>> machine.
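A behavioral sketch of the "2 instructions and a state machine" idea, assuming one instruction marks the top of the loop and the other performs increment, compare, and branch. The mnemonics `VEC` and `LOOP` are assumed here and the operand format is invented for illustration; real hardware would overlap iterations across lanes rather than interpret them one at a time.

```python
def execute(program, regs, mem):
    """Interpret a tiny program; returns the number of loop-body passes."""
    pc, loop_top, passes = 0, None, 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "VEC":               # enter loop mode, remember the loop top
            loop_top = pc + 1
        elif op == "LOOP":            # increment, compare, then branch back or fall out
            reg, step, limit = args
            regs[reg] += step
            passes += 1
            if regs[reg] < limit:
                pc = loop_top
                continue
            loop_top = None           # loop done: back to plain scalar mode
        else:                         # ordinary instruction: ("OP", fn)
            args[0](regs, mem)
        pc += 1
    return passes


def add_elements(regs, mem):
    i = regs["i"]
    mem["c"][i] = mem["a"][i] + mem["b"][i]


mem = {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": [0] * 4}
regs = {"i": 0}
program = [("VEC",), ("OP", add_elements), ("LOOP", "i", 1, 4)]
passes = execute(program, regs, mem)
print(mem["c"], passes)  # [11, 22, 33, 44] 4
```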
>>
>
>
> Still trying to grasp the virtual vector method. Been wondering if it
> can be implemented using renamed registers.

Think of VVM as a set (8) of staging flip-flops taking data (line) from
L1 and feeding it into 4-wide ALUs, then into another set (4) of
flip-flops which deliver data back to L1, with wide muxes to get the LD
data aligned with the SLU and the ALU result aligned back to L1.
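A sketch of the alignment job the wide muxes do: loads stage whole cache lines, and the mux picks the (possibly line-straddling) elements each ALU lane needs. Eight elements per line is an assumption (a 64-byte line of 8-byte elements); 4 lanes matches the 4-wide ALUs above. Memory is a flat Python list, and both function names are hypothetical.

```python
LINE_ELEMS = 8   # assumption: 64-byte line holding eight 8-byte elements
LANES = 4        # 4-wide ALUs, as described above


def stage_lines(memory, first_line, count):
    """Copy `count` whole cache lines from 'L1' into staging buffers."""
    return [memory[(first_line + k) * LINE_ELEMS:(first_line + k + 1) * LINE_ELEMS]
            for k in range(count)]


def align_for_alu(staged, offset):
    """Wide mux: select LANES consecutive elements starting at `offset`
    within the staged lines, straddling a line boundary if necessary."""
    flat = [x for line in staged for x in line]
    return flat[offset:offset + LANES]


memory = list(range(100, 132))        # 32 elements = 4 lines
staged = stage_lines(memory, 0, 2)    # lines 0 and 1
print(align_for_alu(staged, 6))       # [106, 107, 108, 109]
```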

Then support this infrastructure with a reservation-station-like queue
which can advance 1, 2, or 4 iterations per clock.
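If the queue advances up to w iterations per clock, a loop of n iterations needs about ceil(n / w) clocks once it is running. A rough steady-state sketch that ignores startup latency and memory stalls; `loop_clocks` is a hypothetical name.

```python
import math


def loop_clocks(iterations, per_clock):
    """Clocks to advance through `iterations` at a fixed width (steady state only)."""
    if per_clock not in (1, 2, 4):
        raise ValueError("width must be 1, 2, or 4")
    return math.ceil(iterations / per_clock)


for w in (1, 2, 4):
    print(w, loop_clocks(1000, w))   # 1000, 500, 250 clocks respectively
```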

The registers named in the asm are mapped onto the staging flip-flops
{like renaming}, and the whole thing is optimized for multi-lane
execution with 6 bits of total overhead.

> Qupls has RISC-V-style vector/SIMD registers. For Q+ every instruction
> can be a vector instruction, as there are bits in the instruction
> indicating which registers are vector registers. All the scalar
> instructions become vector instructions. This cuts down on some of the
> bloat in the ISA. There are only a handful of vector-specific
> instructions (about eight, I think). The drawback is that instructions
> are 48 bits wide. However, the code bloat is less than 50%, as some
> instructions perform dual operations. Branches can increment or
> decrement and loop. Bigfoot uses a postfix word to indicate the vector
> form of an instruction. Bigfoot’s code density is a lot better, since
> it is variable length, but I suspect it will not run as fast. Bigfoot
> and Q+ share a lot of the same code; I am trying to make the guts of
> the cores generic.
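A hypothetical sketch of how per-operand vector bits let one ADD serve as both a scalar and a vector instruction. The 3-bit flag layout, the register counts, and the choice to broadcast a scalar source across all elements are assumptions for illustration, not the actual Q+ encoding; only the eight 64-bit elements per register come from the description of Qupls above.

```python
ELEMS = 8                  # Qupls: eight 64-bit elements per vector register
MASK64 = (1 << 64) - 1


def decode_vflags(bits3):
    """Hypothetical 3-bit field: (dest, src1, src2) is-vector flags."""
    return bool(bits3 & 0b100), bool(bits3 & 0b010), bool(bits3 & 0b001)


def execute_add(bits3, rd, ra, rb, sregs, vregs):
    vd, va, vb = decode_vflags(bits3)
    if not vd:
        if va or vb:
            raise ValueError("vector source with scalar destination not modeled")
        sregs[rd] = (sregs[ra] + sregs[rb]) & MASK64
        return

    def operand(is_vec, r, i):
        return vregs[r][i] if is_vec else sregs[r]   # scalar source is broadcast

    for i in range(ELEMS):
        vregs[rd][i] = (operand(va, ra, i) + operand(vb, rb, i)) & MASK64


sregs = [0] * 64
vregs = [[0] * ELEMS for _ in range(32)]
sregs[3], sregs[4], sregs[2] = 5, 7, 100
vregs[1] = list(range(ELEMS))

execute_add(0b000, 5, 3, 4, sregs, vregs)   # scalar add
execute_add(0b110, 0, 1, 2, sregs, vregs)   # vector = vector + broadcast scalar
print(sregs[5], vregs[0])  # 12 [100, 101, 102, 103, 104, 105, 106, 107]
```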

Too bad...