Article <uv46rg$e4nb$1@dont-email.me>

Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Tue, 9 Apr 2024 15:01:50 -0500
Organization: A noiseless patient Spider
Lines: 104
Message-ID: <uv46rg$e4nb$1@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
 <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com>
 <15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org>
 <lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com> <uuv1ir$30htt$1@dont-email.me>
 <d71c59a1e0342d0d01f8ce7c0f449f9b@www.novabbs.org>
 <uv02dn$3b6ik$1@dont-email.me> <uv415n$ck2j$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Tue, 09 Apr 2024 20:01:53 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="f0369543e96ead342fc87227dc1b0b19";
	logging-data="463595"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/YnfunL7AsMWsJHlJiAs5W7aZRnMTtWvY="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:KFwau5CvQmOcF8DgeSZZJ31HX4w=
In-Reply-To: <uv415n$ck2j$1@dont-email.me>
Content-Language: en-US
Bytes: 5638

On 4/9/2024 1:24 PM, Thomas Koenig wrote:
> I wrote:
> 
>> MitchAlsup1 <mitchalsup@aol.com> schrieb:
>>> Thomas Koenig wrote:
>>>
>>>> John Savard <quadibloc@servername.invalid> schrieb:
>>>
>>>>> Thus, instead of having mode bits, one _could_ do the following:
>>>>>
>>>>> Usually, have 28 bit instructions that are shorter because there's
>>>>> only one opcode for each floating and integer operation. The first
>>>>> four bits in a block give the lengths of data to be used.
>>>>>
>>>>> But have one value for the first four bits in a block that indicates
>>>>> 36-bit instructions instead, which do include type information, so
>>>>> that very occasional instructions for rarely-used types can be mixed
>>>>> in which don't fill a whole block.
>>>>>
>>>>> While that's a theoretical possibility, I don't view it as being
>>>>> worthwhile in practice.
>>>
>>>> I played around a bit with another scheme:  Encoding things into
>>>> 128-bit blocks, with either 21-bit or 42-bit or longer instructions
>>>> (or a block header with six bits, and 20 or 40 bits for each
>>>> instruction).
>>>
>>> Not having seen said encoding scheme:: I suspect you used the Rd=Rs1
>>> destructive operand model for the 21-bit encodings. Yes :: no ??
>>
>> It was not very well developed, I gave it up when I saw there wasn't
>> much to gain.
> 
> Maybe one more thing: In order to justify the more complex encoding,
> I was going for 64 registers, and that didn't work out too well
> (missing bits).
> 
> Having learned about M-Core in the meantime, pure 32-register,
> 21-bit instruction ISA might actually work better.

For 32-bit instructions at least, 64 GPRs can work out OK.

Though, the gain of 64 over 32 seems to be fairly small for most 
"typical" code, mostly bringing a benefit if one is spending a lot of 
CPU time in functions that have large numbers of local variables all 
being used at the same time.

Seemingly:
16/32/48 bit instructions, with 32 GPRs, seem likely optimal for code
density;
32/64/96 bit instructions, with 64 GPRs, seem likely optimal for
performance.

Where, 16 GPRs isn't really enough (lots of register spills), and 128
GPRs is wasteful: making effective use of them would likely need lots
of monster functions with 250+ local variables (*), which probably
isn't going to happen.

*: Where, it appears to be most efficient (for non-leaf functions) if
the number of local variables is roughly twice the number of CPU
registers. With more local variables than this, the spill/fill rate
goes up significantly; with fewer, the registers aren't utilized as
effectively.

Well, except in "tiny leaf" functions, where the criterion is instead
that the number of local variables be less than the number of scratch
registers. However, for many/most small leaf functions, the total number
of variables isn't all that large either.

Where, function categories:
   Tiny Leaf:
     Everything fits in scratch registers, no stack frame, no calls.
   Leaf:
     No function calls (either explicit or implicit);
     Will have a stack frame.
   Non-Leaf:
     May call functions, has a stack frame.
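
As a rough sketch of the three categories above (the names FuncInfo and
classify, and the scratch-register count of 8, are made up here for
illustration and are not taken from any actual compiler):

```python
from dataclasses import dataclass


@dataclass
class FuncInfo:
    """Hypothetical per-function summary a compiler might collect."""
    num_locals: int    # local variables, including temporaries
    makes_calls: bool  # any call, whether explicit or implicit


def classify(fn: FuncInfo, num_scratch: int) -> str:
    """Sort a function into one of the three categories listed above."""
    if fn.makes_calls:
        return "non-leaf"
    # Tiny leaf: no calls, and fewer locals than scratch registers,
    # so no stack frame is needed.
    if fn.num_locals < num_scratch:
        return "tiny-leaf"
    # No calls, but too many locals for scratch registers alone.
    return "leaf"


# Assumed scratch-register count, for illustration only.
print(classify(FuncInfo(num_locals=4, makes_calls=False), num_scratch=8))
print(classify(FuncInfo(num_locals=12, makes_calls=False), num_scratch=8))
print(classify(FuncInfo(num_locals=4, makes_calls=True), num_scratch=8))
```

The tiny-leaf test uses a strict "less than", following the wording
earlier in the post.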

There is a "static assign everything" case in my compiler, where all of
the variables are statically assigned to registers (for the scope of the
function). This case typically requires that everything fit into
callee-save registers, so (like the "tiny leaf" category) it requires
that the number of local variables not exceed the number of available
registers.

On a 32 register machine, if there are 14 available callee-save 
registers, the limit is 14 variables. On a 64 register machine, this 
limit might be 30 instead. This seems to have good coverage.
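
A minimal sketch of that check, reading "the limit is 14 variables" as
"at most 14" (the function name fits_static_assign and the example count
of 20 locals are assumptions for illustration):

```python
def fits_static_assign(num_locals: int, callee_save_regs: int) -> bool:
    """True if every local variable can get its own callee-save register."""
    return num_locals <= callee_save_regs


# Limits quoted above: 14 callee-save registers on a 32-register machine,
# possibly 30 on a 64-register machine.
for regs in (14, 30):
    print(regs, fits_static_assign(20, regs))
```

So a function with 20 locals would miss the static case on the
32-register machine, but would qualify on the 64-register one.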

In the non-static case, the top N variables might be static-assigned,
and the remaining variables dynamically assigned. Though, it appears
this is more an artifact of my naive register allocator, and might not
be as effective a strategy with an "actually clever" register
allocator (like those in GCC or LLVM), where purely dynamic allocation
may be better (they are able to carry dynamic assignments across basic
block boundaries, rather than needing to spill/fill everything whenever
a branch or label is encountered).
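
A sketch of the "top N static, rest dynamic" split might look like the
following. Ranking by use count is an assumption made for illustration
(nothing above says how the top N are chosen), and the name
split_assignment and the example variables are likewise hypothetical:

```python
def split_assignment(
    use_counts: dict[str, int], callee_save_regs: int
) -> tuple[list[str], list[str]]:
    """Split variables into (static, dynamic) lists.

    The most-used variables get a callee-save register for the whole
    function; the rest are left to dynamic allocation.
    """
    # Sort by descending use count, breaking ties by name for stability.
    ranked = sorted(use_counts, key=lambda v: (-use_counts[v], v))
    return ranked[:callee_save_regs], ranked[callee_save_regs:]


# Illustrative use counts for four locals, with only 2 registers free.
uses = {"i": 40, "p": 25, "tmp": 3, "n": 12}
static, dynamic = split_assignment(uses, 2)
print(static)
print(dynamic)
```

When everything fits, the dynamic list comes back empty, which
corresponds to the "static assign everything" case.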

....