From: Brett <ggtgp@yahoo.com>
Newsgroups: comp.arch
Subject: Re: Banked register files
Date: Tue, 27 Aug 2024 23:51:59 -0000 (UTC)
Organization: A noiseless patient Spider
Message-ID: <valoqv$361fu$1@dont-email.me>
References: <va0eev$31fml$1@dont-email.me>
 <484586d667d1e9e7ae11184dbd362619@www.novabbs.org>
 <va0k4v$32dgq$1@dont-email.me>
 <2cf5a18a58a4281b1b67935b31a8fe49@www.novabbs.org>
 <va1412$3881u$1@dont-email.me> <va8c9j$j6q1$1@dont-email.me>
 <vabto4$19iip$1@dont-email.me> <vad56j$1fm8b$1@dont-email.me>
 <vair0o$2k32g$1@dont-email.me>
 <95b2ce27c781e0556864a8b7d4b55187@www.novabbs.org>

MitchAlsup1 <mitchalsup@aol.com> wrote:
> On Mon, 26 Aug 2024 21:10:48 +0000, Brett wrote:
>
>> Brett <ggtgp@yahoo.com> wrote:
>>> Robert Finch <robfi680@gmail.com> wrote:
>>>> On 2024-08-22 5:58 p.m., Brett wrote:
>>>>> Brett <ggtgp@yahoo.com> wrote:
>>>>>> MitchAlsup1 <mitchalsup@aol.com> wrote:
>>>>
>>>> I saw a design where there was an attempt to process basic blocks in
>>>> parallel silos feeding functional units. It made use of fewer registers
>>>> by holding data in pipeline registers instead of GPRs which it could do
>>>> since some of the data for a basic block never goes outside the block.
>>
>> No reply’s, so I figure y’all are under NDA. ;)
>
> It has been well known since mid 1990s that most loops end up with a
> single or dual stream of self dependent instructions and few loop
> dependencies {mostly the loop index itself}. This leads to instruction
> dependency graphs (and execution times) that look like::
>
> | LD |
> | LD |
> | FMUL |
> | FADD |
> | STA | | STD |
> | ADD |
> | CMP |
> | BV |
> ------------------------------------------------------------
> | LD |
> | LD |
> | FMUL |
>

To even out the cluster load you would want the compiler to unroll once,
first bank second, then second bank first.

Can also be done without compiler by mapping the links on the second pass
of a loop.

I am assuming clusters or banks as naming and issue width continue growing.
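
A rough sketch of the load-balancing idea, purely as illustration. The split of the quoted graph into two dependency streams, the two-bank model, and the names `STREAMS` and `bank_load` are all assumptions made for this example, not anything taken from a real design. It only counts how many ops land on each bank over two loop passes, with and without swapping the stream-to-bank mapping on the second pass.

```python
# Hypothetical model: one loop iteration split into two dependency streams.
# The grouping below is an assumption based on the quoted graph.
STREAMS = {
    "fp_chain": ["LD", "LD", "FMUL", "FADD", "STA", "STD"],
    "index_chain": ["ADD", "CMP", "BV"],
}


def bank_load(iterations, swap_on_odd):
    """Count ops issued to each of two banks over `iterations` loop passes."""
    load = [0, 0]
    for i in range(iterations):
        # Default mapping: fp_chain -> bank 0, index_chain -> bank 1.
        # With swap_on_odd, the mapping is flipped on every odd pass.
        flip = swap_on_odd and (i % 2 == 1)
        fp_bank = 1 if flip else 0
        load[fp_bank] += len(STREAMS["fp_chain"])
        load[1 - fp_bank] += len(STREAMS["index_chain"])
    return load


fixed = bank_load(2, swap_on_odd=False)
alternating = bank_load(2, swap_on_odd=True)
print("fixed mapping:      ", fixed)
print("alternating mapping:", alternating)
```

Under these assumptions the fixed mapping piles the longer stream onto one bank every pass, while the alternating mapping evens out over each pair of passes. It says nothing about latency or inter-bank forwarding cost, which is where the real trade-off would be.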