Path: ...!news.mixmin.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Misc: BGBCC targeting RV64G, initial results...
Date: Wed, 9 Oct 2024 00:40:02 -0500
Organization: A noiseless patient Spider
Lines: 105
Message-ID: <ve54vi$2i0bh$1@dont-email.me>
References: <vd5uvd$mdgn$1@dont-email.me> <vd69n0$o0aj$1@dont-email.me>
<vd6tf8$r27h$1@dont-email.me>
<1b8c005f36fd5a86532103a8fb6a9ad6@www.novabbs.org>
<vd7gk6$tquh$1@dont-email.me>
<abf735f7cab1885028cc85bf34130fe9@www.novabbs.org>
<vd80r8$148fc$1@dont-email.me>
<58bd95eee31b53933be111d0d941203a@www.novabbs.org>
<vdd1s0$22tpk$1@dont-email.me> <vdgh8i$2m458$1@dont-email.me>
<vdlk51$3lm0a$1@dont-email.me>
<dd19cb13c16cec5913df46da8083c867@www.novabbs.org>
<ve3vru$29pnk$1@dont-email.me>
<b9e523bdf11b3422c719ce2d8ad4c9d4@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 09 Oct 2024 07:40:02 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="5a0c6d63c9b45fdf5260c4b6d0071564";
logging-data="2687345"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+OvyhEAnV9gTQ+8vrkD9+57LHT5UzAJDs="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:4ODit6Pml+nkCa974KMjLDnBGFU=
In-Reply-To: <b9e523bdf11b3422c719ce2d8ad4c9d4@www.novabbs.org>
Content-Language: en-US
Bytes: 5226
On 10/8/2024 3:48 PM, MitchAlsup1 wrote:
> On Tue, 8 Oct 2024 19:06:34 +0000, BGB wrote:
>
>> On 10/5/2024 6:10 PM, MitchAlsup1 wrote:
>>> On Thu, 3 Oct 2024 8:20:46 +0000, BGB wrote:
>>>
>>>>> How well does JTT work with large tables? What if there are several
>>>>> hundred table entries?
>>>
>>> Tables can have 2^16-1 (65534) case entries.
>>>
>>
>> There is no hard-limit in my case, but BGBCC had generally broken up
>> tables larger than 256.
>>
>> IIRC, this was because larger tables were more likely to have "voids"
>> which could lead to a large number of branches to default while still
>> being over a 75% density threshold; splitting the table apart was
>> more likely to expose these voids (and make the binary smaller).
>
> Yes, one would expect voids in tables would cause the compiler to
> break the switch table into more dense chunks.
>
At present, there isn't anything to detect voids directly, but the
density percentage is easy to compute:
   (100*count)/(max-min)
...
But, as the count increases, the probability of a large void escaping
notice also increases.
Say, at a span of 256 with the 75% threshold, one could potentially
have a hidden void of 64 labels (separating two regions at 100%
density) without triggering a split.
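
As a rough sketch (hypothetical code, not BGBCC's actual logic; the
names here are made up), the check might look something like:

  /* Hypothetical sketch of the density check described above;
     not BGBCC's actual code. */
  int switch_density_percent(int *case_vals, int count)
  {
      int i, min, max, span;
      if (count < 2)
          return 100;
      min = max = case_vals[0];
      for (i = 1; i < count; i++) {
          if (case_vals[i] < min) min = case_vals[i];
          if (case_vals[i] > max) max = case_vals[i];
      }
      span = max - min;   /* same (100*count)/(max-min) formula */
      if (span == 0)
          return 100;
      return (100 * count) / span;
  }

  /* Example policy: only use a single jump table if dense enough and
     not too big (256 and 75%, per the earlier posts). */
  int use_jump_table(int *case_vals, int count)
  {
      return (count <= 256) &&
             (switch_density_percent(case_vals, count) >= 75);
  }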
At present, it always splits the table in half (in terms of case-label
count). Possibly, it could instead probe each candidate split point and
check whether there is a point where the density of each sub-table
comes out noticeably higher than with an even split.
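
A quick sketch of what such a probe could look like, assuming the case
labels have already been sorted (again hypothetical, not the actual
BGBCC logic):

  /* Hypothetical: given sorted case labels, pick the split index that
     maximizes the smaller of the two sub-table densities, rather than
     always splitting at count/2. */
  int find_best_split(int *vals, int count)
  {
      int i, best_i, best_min_d;
      if (count < 4)
          return count / 2;   /* too small to bother probing */
      best_i = count / 2;
      best_min_d = -1;
      for (i = 1; i < count; i++) {
          int span_lo = vals[i - 1] - vals[0];
          int span_hi = vals[count - 1] - vals[i];
          int d_lo = (span_lo > 0) ? (100 * i) / span_lo : 100;
          int d_hi = (span_hi > 0) ?
                     (100 * (count - i)) / span_hi : 100;
          int min_d = (d_lo < d_hi) ? d_lo : d_hi;
          if (min_d > best_min_d) {
              best_min_d = min_d;
              best_i = i;   /* split into [0..i-1] and [i..count-1] */
          }
      }
      return best_i;
  }

Splitting at the largest gap between adjacent labels would be a cheaper
approximation of roughly the same idea.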
>> I guess a possible tweak could be, say, if the density is over 93% or
>> so, it will allow a table-jump regardless of size.
>
> If the void is greater than the overhead to break the table, then
> the table should be broken.
>
To some extent, everything here is heuristics.
I guess it is also possible to make the required density (to avoid a
split) scale with the table size.
Say (table size, minimum density):
   128, 75%
   256, 80%
   512, 85%
  1024, 90%
  2048, 95%
Would likely need to fiddle a bit with this though.
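
As a sketch, that size-scaled threshold could be a simple staircase
function (numbers taken from the list above; the behavior past 2048 is
just a guess):

  /* Hypothetical: minimum density (percent) needed to keep a table of
     the given size as a single jump table, per the list above. */
  int min_density_for_size(int count)
  {
      if (count <=  128) return 75;
      if (count <=  256) return 80;
      if (count <=  512) return 85;
      if (count <= 1024) return 90;
      if (count <= 2048) return 95;
      return 100;   /* past 2048: guess, effectively no voids allowed */
  }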
>> Though, for the most part the programs I am testing with tend not to
>> really have too many large switch blocks, and in the case of "switch"
>> with a byte (most common case in these cases), a limit of 256 works.
>>
>>
>> Meanwhile, for some other cases, like a switch full of TWOCC and FOURCC
>> values, one really does not want to use a jump table (but, this is
>> avoided as these cases tend to have a very low density).
>>
>>
>>
>>>>> For Q+ indirect jump the values loaded from the table replace the low
>>>>> order bits of the PC instead of being a displacement. Only {W,T,O} are
>>>>> supported. (W=wyde,T=tetra,O=octa). Should add an option for
>>>>> displacements. Borrowed the memory indirect jump from the 68k.
>>>
>>> My 66000 Loads the table entry directly into IP in 1 cycle less
>>> than LD latency.
>>>
>>
>> I guess, specialized Load+Branch could potentially have less latency
>> than separate load+branch, or the current strategy of double-branching.
>
> Think of it as a LD IP,[address]
Possibly. Early on, I had a RET instruction that essentially did:
   MOV @SP+, PC
But, at the time, Load/Store operations were not pipelined.
EX:
   Generate address, submit address and access request to the L1 cache;
   Wait for the response to become OK;
   Do whatever with the result.
Now, Load/Store is pipelined, but branches are initiated from EX1,
while no Load result is available until EX2 (special case, aligned-only
DWORD/QWORD) or EX3 (generic).
So, this would need a mechanism to allow initiating a branch from EX3.