Article <uv7l00$1fc2u$2@dont-email.me>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: "Mini" tags to reduce the number of op codes
Date: Wed, 10 Apr 2024 22:21:33 -0500
Organization: A noiseless patient Spider
Lines: 137
Message-ID: <uv7l00$1fc2u$2@dont-email.me>
References: <uuk100$inj$1@dont-email.me>
 <6mqu0j1jf5uabmm6r2cb2tqn6ng90mruvd@4ax.com>
 <15d1f26c4545f1dbae450b28e96e79bd@www.novabbs.org>
 <lf441jt9i2lv7olvnm9t7bml2ib19eh552@4ax.com> <uuv1ir$30htt$1@dont-email.me>
 <d71c59a1e0342d0d01f8ce7c0f449f9b@www.novabbs.org>
 <uv02dn$3b6ik$1@dont-email.me> <uv415n$ck2j$1@dont-email.me>
 <uv46rg$e4nb$1@dont-email.me>
 <a81256dbd4f121a9345b151b1280162f@www.novabbs.org>
 <uv4ghh$gfsv$1@dont-email.me>
 <8e61b7c856aff15374ab3cc55956be9d@www.novabbs.org>
 <uv7h9k$1ek3q$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 11 Apr 2024 05:21:36 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="059e35bc5e274e101eeeb06f16103042";
	logging-data="1552478"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX19y7LfSOloJZ6oHCUo6Vbjn5bhFepfZDUw="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:4JT0sjweHF+HrsfwzcOmN1+QaVs=
In-Reply-To: <uv7h9k$1ek3q$1@dont-email.me>
Content-Language: en-US
Bytes: 7148

On 4/10/2024 9:18 PM, Paul A. Clayton wrote:
> On 4/9/24 8:28 PM, MitchAlsup1 wrote:
>> BGB-Alt wrote:
> [snip]
>>> Things like memcpy/memmove/memset/etc, are function calls in cases 
>>> when not directly transformed into register load/store sequences.
>>
>> My 66000 does not convert them into LD-ST sequences, MM is a single 
>> instruction.
> 
> I wonder if it would be useful to have an immediate count form of
> memory move. Copying fixed-size structures would be able to use an
> immediate. Aside from not having to load an immediate for such
> cases, there might be microarchitectural benefits to using a
> constant. Since fixed-sized copies would likely be limited to
> smaller regions (with the possible exception of 8 MiB page copies)
> and the overhead of loading a constant for large sizes would be
> tiny, only providing a 16-bit immediate form might be reasonable.
> 

As noted, in my case, the Ld/St sequences and memcpy slides mostly
apply to constant-size cases.
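A "memcpy slide", as described in the quoted text, is a run of fixed-size copy steps that code branches into partway, so that only the remaining steps execute. In C this can be approximated with a switch whose cases fall through, similar to Duff's device. This is a minimal sketch and not BGB's actual compiler output: the name copy_slide_u64, the 8-byte step, the 8-word limit, and the requirement that both pointers be 8-byte aligned are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy slide: copies nwords 64-bit words (0..8).
   The switch selects the entry point; each case falls through to the next,
   so entering at case N performs exactly N word copies.
   Precondition (assumed): nwords <= 8, dst/src 8-byte aligned, no overlap. */
static void copy_slide_u64(uint64_t *dst, const uint64_t *src, size_t nwords)
{
    switch (nwords) {
    case 8: dst[7] = src[7]; /* fall through */
    case 7: dst[6] = src[6]; /* fall through */
    case 6: dst[5] = src[5]; /* fall through */
    case 5: dst[4] = src[4]; /* fall through */
    case 4: dst[3] = src[3]; /* fall through */
    case 3: dst[2] = src[2]; /* fall through */
    case 2: dst[1] = src[1]; /* fall through */
    case 1: dst[0] = src[0]; /* fall through */
    case 0: break;
    }
}
```

For a constant size, a compiler can compute the entry point at compile time and branch directly into the slide, avoiding both a loop and a call to the general memcpy.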

If the copy size is variable, the compiler simply calls "memcpy()",
which then has to work out at runtime which loop to use, so one pays
the overhead of that runtime dispatch.
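The runtime overhead mentioned above is mostly the size-based branching a general memcpy does before it can start copying. The following is a minimal sketch of that kind of dispatch, not the implementation used in BGB's C library; the function name sketch_memcpy and the 64-byte and 8-byte thresholds are illustrative assumptions. Like memcpy, it assumes the regions do not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size-dispatching copy. The thresholds are illustrative.
   memcpy with a constant size of 8 is used for unaligned-safe word
   access; compilers typically lower it to a single load or store. */
void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n >= 64) {
        /* Large: 32-byte unrolled blocks. */
        while (n >= 32) {
            uint64_t a, b, c, e;
            memcpy(&a, s, 8);      memcpy(&b, s + 8, 8);
            memcpy(&c, s + 16, 8); memcpy(&e, s + 24, 8);
            memcpy(d, &a, 8);      memcpy(d + 8, &b, 8);
            memcpy(d + 16, &c, 8); memcpy(d + 24, &e, 8);
            d += 32; s += 32; n -= 32;
        }
    }
    if (n >= 8) {
        /* Medium size, or the remainder of a large copy: 8 bytes at a time. */
        while (n >= 8) {
            uint64_t w;
            memcpy(&w, s, 8);
            memcpy(d, &w, 8);
            d += 8; s += 8; n -= 8;
        }
    }
    /* Small size, or the final tail: byte loop. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

With a constant size, all of these branches are resolved at compile time, which is why the constant case can be turned directly into Ld/St sequences or a slide.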


>>> Did end up with an intermediate "memcpy slide", which can handle 
>>> medium size memcpy and memset style operations by branching into a 
>>> slide.
>>
>> MMs and MSs that do not cross page boundaries are ATOMIC. The entire
>> system sees only the before or only the after state and nothing in between.
> 
> I still feel that this atomicity should somehow be included with
> ESM just because they feel related, but the benefit seems likely
> to be extremely small. How often would software want to copy
> multiple regions atomically or combine region copying with
> ordinary ESM atomicity?? There *might* be some use for an atomic
> region copy and an updating of a separate data structure (moving a
> structure and updating one or a very few pointers??). For
> structures three cache lines in size where only one region
> occupies four cache lines, ordinary ESM could be used.
> 
> My feeling based on "relatedness" is not a strong basis for such
> an architectural design choice.
> 
> (Simple page masking would allow false conflicts when smaller
> memory moves are used. If there is a separate pair of range
> registers that is checked for coherence of memory moves, this
> issue would only apply for multiple memory moves _and_ all eight
> of the buffer entries could be used for smaller accesses.)
> 

This all seems a bit complicated to me.

But, as noted, I went with a weak memory coherence model, leaving most
of this stuff for software to sort out.

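Under the quoted My 66000 rule that MM and MS operations are atomic only when they do not cross a page boundary, deciding whether a given move qualifies reduces to a page-crossing test on the source and destination ranges. The sketch below shows that test; it is an illustration only. The function names are hypothetical, the page size is passed in as a parameter because it is not stated here (it is assumed to be a power of two), and wraparound at the top of the address space is ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [addr, addr + len) lies within a single page.
   page_size must be a power of two (assumed). len == 0 never crosses. */
static bool within_one_page(uintptr_t addr, size_t len, uintptr_t page_size)
{
    if (len == 0)
        return true;
    uintptr_t last = addr + (len - 1);
    uintptr_t mask = ~(page_size - 1);
    return (addr & mask) == (last & mask);
}

/* Hypothetical check: a move is treated as atomic only if neither the
   source nor the destination range crosses a page boundary. */
static bool mm_would_be_atomic(uintptr_t dst, uintptr_t src, size_t len,
                               uintptr_t page_size)
{
    return within_one_page(dst, len, page_size) &&
           within_one_page(src, len, page_size);
}
```

In a weak-coherence design like the one described above, software that needs such a guarantee would have to provide it through its own synchronization instead.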

> [snip]
>>> As noted, on a 32 GPR machine, most leaf functions can fit entirely 
>>> in scratch registers. 
>>
>> Which is why one can blow GPRs for SP, FP, GOT, TLS, ... without 
>> getting totally screwed.
> 
> I wonder how many instructions would have to have access to such a
> set of "special registers" and if a larger number of extra
> registers would be useful. (One of the issues — in my opinion —
> with PowerPC's link register and count register was that they
> could not be directly loaded from or stored to memory [or loaded
> with a constant from the instruction stream]. For counted loops,
> loading the count register from the instruction stream would
> presumably have allowed early branch determination even for deep
> pipelines and small loop counts.) SP, FP, GOT, and TLS hold
> "stable values", which might facilitate some microarchitectural
> optimizations compared to more frequently modified register names.
> 
> (I am intrigued by the possibility of small contexts for some 
> multithreaded workloads, similar to how some GPUs allow variable context 
> sizes.)

In my case, yeah, there are two semi-separate register spaces here:
   GPRs: R0..R63
     R0, R1, and R15 are Special
       R0/DLR: Hard-coded register for some instructions;
         Assembler may stomp without warning for pseudo-instructions.
       R1/DHR:
         Was originally intended similar to DLR;
         Now mostly used as an auxiliary link register.
       R15/SP:
         Stack Pointer.
   CRs: C0..C63
     Various special purpose registers;
     Most are privileged only.
     LR, GBR, etc., are in CR space.


Though, internally, GPRs and CRs both exist within a combined register 
space in the CPU:
   00..3F: Mostly GPR space
   40..7F: CR and SPR space.
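The combined layout above means a 7-bit internal index can tell GPRs and CRs apart by a single bit (0x40). The following sketch shows that mapping, along with the special GPR numbers listed earlier. It is only an illustration of the described layout; the helper names are hypothetical, and how the actual CPU encodes or decodes these indices (including where LR and GBR sit within CR space) is not described here.

```c
#include <stdint.h>

/* Special GPRs as described in the list above (names are illustrative). */
enum {
    REG_DLR = 0,   /* R0: hard-coded for some instructions */
    REG_DHR = 1,   /* R1: mostly an auxiliary link register now */
    REG_SP  = 15   /* R15: stack pointer */
};

/* Map R0..R63 into 0x00..0x3F and C0..C63 into 0x40..0x7F. */
static inline uint8_t gpr_index(unsigned n) { return (uint8_t)(n & 0x3F); }
static inline uint8_t cr_index(unsigned n)  { return (uint8_t)(0x40 | (n & 0x3F)); }

/* Bit 6 of the combined index distinguishes CR/SPR space from GPR space. */
static inline int is_cr_index(uint8_t idx)  { return (idx & 0x40) != 0; }
```

A check like is_cr_index is the kind of test that would let only certain register ports accept CR-space indices.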

Generally, though, CRs may only be accessed through certain register ports.


By default, the only way to save/restore CRs is by shuffling them
through GPRs. There is an optional MOV.C instruction for this, but it is
generally not enabled, as it isn't clear that it saves enough to be
worth the added LUT cost.

There is also a subset version in which MOV.C exists but can only really
be used with LR, GBR, and similar registers. This version exists mainly
because RISC-V Mode needs to be able to save/restore these registers
(they live in the GPR space in RISC-V).


If I did a new ISA, the register assignment scheme would most likely
differ, say:
   R0: ZR / PC
   R1: LR / TP (TBR)
   R2: SP
   R3: GP (GBR)
Here, the interpretation of R0 and R1 would depend on context (ZR and LR
for most instructions, PC and TP when used as a Ld/St base address).
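The context-dependent reading of R0 and R1 in this proposed scheme can be modeled as two operand-read paths: one for ordinary operands and one for load/store base addresses. The sketch below is a hypothetical software model of that idea, not a description of any existing decoder. The struct layout, function names, and the choice to store LR in gpr[1] are assumptions; which PC value (current or next instruction) a PC-relative base would use is not specified in the post.

```c
#include <stdint.h>

/* Hypothetical register state for the proposed scheme. */
typedef struct {
    uint64_t gpr[64]; /* gpr[1] holds LR; gpr[2] = SP; gpr[3] = GP; gpr[0] unused */
    uint64_t pc;      /* which PC value is used is left unspecified */
    uint64_t tp;      /* thread pointer (TBR) */
} cpu_state;

/* Register n read as an ordinary operand: R0 is the zero register (ZR),
   R1 is LR, and the rest read normally. */
static uint64_t read_operand(const cpu_state *c, unsigned n)
{
    n &= 63;
    if (n == 0)
        return 0;
    return c->gpr[n];
}

/* Register n read as a load/store base address: R0 selects PC and
   R1 selects TP, giving PC-relative and TP-relative addressing. */
static uint64_t read_base(const cpu_state *c, unsigned n)
{
    n &= 63;
    if (n == 0)
        return c->pc;
    if (n == 1)
        return c->tp;
    return c->gpr[n];
}
```

The attraction of this overloading is that a zero base is rarely useful and LR is rarely used as a base address, so those two encodings can be reused for PC-relative and TP-relative accesses without spending extra register numbers.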


Though, some ideas involved otherwise keeping a register space layout
similar to my existing ABI, mostly because significant ABI changes would
not be easy to make in my compiler as it stands.