Article <vedg1s$43mp$1@dont-email.me>

Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Robert Finch <robfi680@gmail.com>
Newsgroups: comp.arch
Subject: Re: Tonights Tradeoff - Carry and Overflow
Date: Sat, 12 Oct 2024 05:38:01 -0400
Organization: A noiseless patient Spider
Lines: 107
Message-ID: <vedg1s$43mp$1@dont-email.me>
References: <vbgdms$152jq$1@dont-email.me> <vbog6d$2p2rc$1@dont-email.me>
 <f2d99c60ba76af28c8b63b9628fb56fa@www.novabbs.org>
 <vc61e6$21skv$1@dont-email.me> <vc8gl4$2m5tp$1@dont-email.me>
 <vcv5uj$3arh6$1@dont-email.me>
 <37067f65c5982e4d03825b997b23c128@www.novabbs.org>
 <vd352q$3s1e$1@dont-email.me>
 <5f8ee3d3b2321ffa7e6c570882686b57@www.novabbs.org>
 <vd6a5e$o0aj$2@dont-email.me> <vdnpg4$3c9e$2@dont-email.me>
 <2024Oct4.081931@mips.complang.tuwien.ac.at> <vdp343$9d38$1@dont-email.me>
 <2024Oct5.114309@mips.complang.tuwien.ac.at> <ve5mpq$2jt5k$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 12 Oct 2024 11:38:04 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="08e3a93de117abaa6a6a445b83662458";
	logging-data="134873"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+Aic68P6B8Bk9DJgGgLPRXhPm+EYVf6sk="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:eDCOuRJ/Z+wzQiWMR0vZ+WxHL3M=
In-Reply-To: <ve5mpq$2jt5k$1@dont-email.me>
Content-Language: en-US
Bytes: 6856

On 2024-10-09 6:44 a.m., Robert Finch wrote:
> On 2024-10-05 5:43 a.m., Anton Ertl wrote:
>> Robert Finch <robfi680@gmail.com> writes:
>>> On 2024-10-04 2:19 a.m., Anton Ertl wrote:
>>>> 4) Keep the flags results along with GPRs: have carry and overflow as
>>>> bit 64 and 65, N is bit 63, and Z tells something about bits 0-63.
>>>> The advantage is that you do not have to track the flags separately
>>>> (and, in case of AMD64, track each of C, O, and NZP separately), but
>>>> instead can use the RAT that is already there for the GPRs.  You can
>>>> find a preliminary paper on that on
>>>> <https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf>.
>> ...
>>> One solution, not mentioned in your article, is to support arithmetic
>>> with two bits fewer than the number of bits a register can hold, so
>>> that the carry and overflow can be stored. On a 64-bit machine, have all
>>> operations use only 62 bits. It would solve the issue of how to load or
>>> store the carry and overflow bits associated with a register.
>>
>> Yes, that's a solution, but the question is how well existing software
>> would react to having no int64_t (and equivalent types, such as long
>> long), but instead an int62_t (or maybe int63_t, if the 64th bit is
>> used for both signed and unsigned overflow, by having separate signed
>> and unsigned addition etc.).  I expect that such an architecture would
>> have low acceptance.  By contrast, in my paper I suggest an addition
>> to existing 64-bit architectures that has fewer of the disadvantages
>> of the widely-used condition-code-register approach, but still has a
>> few of them.
>>
>>> Sometimes
>>> arithmetic is performed with fewer bits, as for pointer representation.
>>> I wonder if pointer masking could somehow be involved. It may be useful
>>> to have a bit indicating the presence of a pointer. Also thinking of how
>>> to track a binary point position for fixed point arithmetic. Perhaps
>>> using the whole upper byte of a register for status/control bits 
>>> would work.
>>
>> There are some extensions for AMD64 in that direction.
>>
>>> It may be possible with Q+ to support a second destination register
>>> which is in a subset of the GPRs. For example, one of eight registers
>>> could be specified to hold the carry/overflow status. That effectively
>>> ties up a second ALU though as an extra write port is needed for the
>>> instruction.
>>
>> Needing only one write port is an advantage of my approach.
>>
>> - anton
> 
> Been thinking some about carry and overflow and what to do about
> register spills and reloads during expression processing. My thought was
> that on a machine with 256 registers, simply allocate a ridiculous
> number of registers for expression processing, for example 25 or even
> 50. Then if the expression is too complex, have the compiler spit out an
> error message telling the programmer to simplify the expression.
> Reminiscent of the ‘expression too complex’ error in BASIC. So, there
> are no spills or reloads during expression processing. I think the
> storextra / loadextra registers used during context switching would work
> okay. But in Q+ there are 256 regs, which require eight storextra /
> loadextra registers. I think the storextra / loadextra registers could
> be hidden in the context save and restore hardware, not even requiring
> access via CSRs or whatever. I suppose context loads and stores could be
> done in blocks of 32 registers. An issue is that the loadextra needs to
> be done before the registers are loaded, so the extra word full of
> carry/overflow bits would need to be fetched in a non-sequential
> fashion. Assuming, for instance, that saving the register values is
> followed by a save of the CO word, it is then positioned wrong for a
> sequential load. It may be better to have the wrong position for a
> store, so loads can proceed sequentially.
> It strikes me that there is no really good solution, only perhaps an
> engineered one. Toyed with the idea of having 16 separate flags
> registers, but I do not like that as much as the storextra /
> loadextra approach.
> 
> Another thought is to store additional info such as a CRC check of the 
> register file on context save and restore.
> 
> *****
> 
> Finally wrote the state machine to walk the ROB backwards and restore
> register mappings for a checkpoint restore. Cannot get Q+ to do more
> than light up one LED in simulation. Register values are not
> propagating properly.
> 
> 
Mulled over carry and overflow in arithmetic operations. Looked at
widening the datapath to 66 bits to hold the carry and overflow bits.
I think that may increase the size of the design by over 3% just to
support carry and overflow. For now, an instruction, ADDGC, was added to
generate the carry bit as a result. A 256-bit add looks like:

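; 256-bit add, least significant word first
; A = r1,r2,r3,r4
; B = r5,r6,r7,r8
; S = r9,r10,r11,r12
; r13 carries between words; r0 is taken to read as zero

	add r9,r1,r5,r0		; S0 = A0 + B0
	addgc r13,r1,r5,r0	; carry out of word 0
	add r10,r2,r6,r13	; S1 = A1 + B1 + carry
	addgc r13,r2,r6,r13	; carry out of word 1
	add r11,r3,r7,r13	; S2 = A2 + B2 + carry
	addgc r13,r3,r7,r13	; carry out of word 2
	add r12,r4,r8,r13	; S3 = A3 + B3 + carry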

Not a very elegant solution, but it is simple. I think it requires
minimal hardware. Three-input ADD is already present, and ADDGC just
routes the carry bit to the output.
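
For anyone wanting to check the sequence on a host machine, here is a
minimal C model of it. It is only a sketch: the names add3, addgc,
add256 and add256_selfcheck are made up for illustration, the third
operand of ADDGC is assumed to be 0 or 1, and the carry out of the top
word (which the sequence above does not compute) is returned as an
extra.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the three-input ADD: rd = ra + rb + rc, truncated to 64 bits. */
static uint64_t add3(uint64_t a, uint64_t b, uint64_t c)
{
    return a + b + c;
}

/* Model of ADDGC: the carry out of bit 63 of ra + rb + rc.
 * Assumption: rc is a carry-in (0 or 1), so the result is 0 or 1. */
static uint64_t addgc(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t s1 = a + b;   /* wraps modulo 2^64 */
    uint64_t s2 = s1 + c;  /* wraps modulo 2^64 */
    /* At most one of the two partial additions can wrap when c <= 1. */
    return (uint64_t)(s1 < a) | (uint64_t)(s2 < s1);
}

/* s = a + b over four 64-bit words, least significant word first.
 * Returns the carry out of the top word. */
static uint64_t add256(uint64_t s[4], const uint64_t a[4], const uint64_t b[4])
{
    uint64_t carry = 0;  /* plays the role of r0 first, then r13 */
    for (int i = 0; i < 4; i++) {
        uint64_t sum = add3(a[i], b[i], carry);  /* add   */
        carry = addgc(a[i], b[i], carry);        /* addgc */
        s[i] = sum;
    }
    return carry;
}

/* Small check: (2^128 - 1) + 1 must ripple through two words. */
static void add256_selfcheck(void)
{
    const uint64_t a[4] = { UINT64_MAX, UINT64_MAX, 0, 0 };
    const uint64_t b[4] = { 1, 0, 0, 0 };
    uint64_t s[4];
    uint64_t cout = add256(s, a, b);
    assert(s[0] == 0 && s[1] == 0 && s[2] == 1 && s[3] == 0);
    assert(cout == 0);
}
```

The loop issues one add and one addgc per word, so it does one more
addgc than the seven-instruction sequence, which skips the carry out of
the last word.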