Path: ...!weretis.net!feeder9.news.weretis.net!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Decrement And Branch
Date: Tue, 13 Aug 2024 17:18:13 +0000
Organization: Rocksolid Light
Message-ID: <6bb58806dc9aad223e36c238bebf6b78@www.novabbs.org>
References: <v9f7b9$3qj3c$1@dont-email.me> <2024Aug13.152807@mips.complang.tuwien.ac.at>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org;
	logging-data="2460330"; mail-complaints-to="usenet@i2pn2.org";
	posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$GAVmetSNtFFaK/scSYN4FubAjSqwStlE7pqWmnuGHSWemqrsjH4T2
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
X-Spam-Checker-Version: SpamAssassin 4.0.0
Bytes: 2808
Lines: 38

On Tue, 13 Aug 2024 13:28:07 +0000, Anton Ertl wrote:

> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>>The original designers of POWER clearly thought there was a point to
>>having such instructions; do you agree?
>
> Sure.  The question is what it was.  Maybe they wanted to look good on
> some kernels.  In the same vein they also added loads and stores with
> update (i.e., autoincrement/decrement addressing), and in one version
> of the architecture reference manual I found the warning that these
> may be as slow as a separate load and update.
>
> AMD64 has LOOP.  I looked at it here several times.  Theoretically one
> can branch-predict it perfectly, but when I measured that
> <2016Jun16.103617@mips.complang.tuwien.ac.at>
> <2017Mar14.183125@mips.complang.tuwien.ac.at>, I found that they just
> use history-based branch prediction for these instructions like
> everybody else.
>
> I think that the major reason is that in an OoO CPU the OoO part would
> need to move the count to the front end, and either let the front end
> wait until that is done, or introduce some mechanism to let the front
> end run ahead and, when the count finally becomes available to the
> front end, update it to the right value where the front end is now.

Actually, that is not necessary, but there are additional advantages.

Imagine a GBOoO machine with reservation stations that runs into
a recognizable loop. Once the RSs are set up, one turns off the FETCH
stage and adds an increment to each station; then, each time the
loop instruction is encountered, one just fires off the RSs again.
This saves around 1/3 of the power being consumed at no loss in
performance.
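
As a rough illustration of the fetch-side effect only, here is a toy
count of how many instructions go through FETCH with and without
replaying the loop body from the reservation stations. It is a sketch
under stated assumptions, not a model of any real pipeline: the name
count_fetches and its parameters are hypothetical, the first iteration
is assumed to be fetched normally to fill the stations, and power is
not modeled at all.

```python
def count_fetches(body_len, iterations, replay):
    """Return (fetched, replayed) instruction counts for a toy loop.

    body_len   -- instructions in the loop body, including the loop branch
    iterations -- number of times the body executes
    replay     -- True if the body is re-fired from the reservation
                  stations after the first iteration (assumption)
    """
    if body_len <= 0 or iterations <= 0:
        return (0, 0)
    if not replay:
        # Every iteration goes through the FETCH stage.
        return (body_len * iterations, 0)
    # Assumption: the first iteration is fetched once to set up the
    # stations; all later iterations are re-fired without fetching.
    fetched = body_len
    replayed = body_len * (iterations - 1)
    return (fetched, replayed)


if __name__ == "__main__":
    for replay in (False, True):
        fetched, replayed = count_fetches(8, 100, replay)
        print(f"replay={replay}: fetched={fetched} replayed={replayed}")
```

For an 8-instruction body run 100 times, the sketch counts 800 fetched
instructions without replay and 8 with it. How that translates into
power depends on the machine.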
>
> Moreover, at least some AMD64 CPUs take more cycles for a LOOP than
> for the equivalent "sub; jne" sequence
> <2017Mar15.141411@mips.complang.tuwien.ac.at>
>
> - anton