Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Brett <ggtgp@yahoo.com>
Newsgroups: comp.arch
Subject: Re: Banked register files
Date: Thu, 29 Aug 2024 23:31:56 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 60
Message-ID: <var0db$5t51$1@dont-email.me>
References: <va0eev$31fml$1@dont-email.me>
 <484586d667d1e9e7ae11184dbd362619@www.novabbs.org>
 <va0k4v$32dgq$1@dont-email.me>
 <2cf5a18a58a4281b1b67935b31a8fe49@www.novabbs.org>
 <va1412$3881u$1@dont-email.me>
 <va8c9j$j6q1$1@dont-email.me>
 <vabto4$19iip$1@dont-email.me>
 <vad56j$1fm8b$1@dont-email.me>
 <vair0o$2k32g$1@dont-email.me>
 <95b2ce27c781e0556864a8b7d4b55187@www.novabbs.org>
 <valoqv$361fu$1@dont-email.me>
 <30aabccbd91948d21b674551ce9a8ddc@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Aug 2024 01:31:56 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="4c8a25442a54e9cf3fdaf2e3dc81b490";
 logging-data="193697"; mail-complaints-to="abuse@eternal-september.org";
 posting-account="U2FsdGVkX1/4vO3j9trT18dpUYWcGRKX"
User-Agent: NewsTap/5.5 (iPad)
Cancel-Lock: sha1:Aug853b1LBiWaOzka4p9oKybGKo= sha1:5RYWtrTsTplsA77KNruc5XarisE=
Bytes: 3512

MitchAlsup1 <mitchalsup@aol.com> wrote:
> On Tue, 27 Aug 2024 23:51:59 +0000, Brett wrote:
>
>> MitchAlsup1 <mitchalsup@aol.com> wrote:
>>> On Mon, 26 Aug 2024 21:10:48 +0000, Brett wrote:
>>>
>>>> Brett <ggtgp@yahoo.com> wrote:
>>>>> Robert Finch <robfi680@gmail.com> wrote:
>>>>>> On 2024-08-22 5:58 p.m., Brett wrote:
>>>>>>> Brett <ggtgp@yahoo.com> wrote:
>>>>>>>> MitchAlsup1 <mitchalsup@aol.com> wrote:
>>>>>>
>>>>>> I saw a design where there was an attempt to process basic blocks in
>>>>>> parallel silos feeding functional units. It made use of fewer registers
>>>>>> by holding data in pipeline registers instead of GPRs which it could do
>>>>>> since some of the data for a basic block never goes outside the block.
>>>>
>>>> No reply’s, so I figure y’all are under NDA. ;)
>>>
>>> It has been well known since mid 1990s that most loops end up with a
>>> single or dual stream of self dependent instructions and few loop
>>> dependencies {mostly the loop index itself}. This leads to instruction
>>> dependency graphs (and execution times) that look like::
>>>
>>> | LD |
>>> | LD |
>>> | FMUL |
>>> | FADD |
>>> | STA | | STD |
>>> | ADD |
>>> | CMP |
>>> | BV |
>>> ------------------------------------------------------------
>>> | LD |
>>> | LD |
>>> | FMUL |
>>>
>>
>> To even out the cluster load you would want the compiler to unroll once,
>> first bank second, then second bank first.
>>
>> Can also be done without compiler by mapping the links on the second
>> pass of a loop.
>
> The above is done with simple reservation stations and no compiler work.
>
>> I am assuming clusters or banks as naming and issue width continue
>> growing.
>
> Once you start doing reservation station machines, your 72-entry banked
> register file needs to have the RSs watch 72 results instead of just 32.

ALUs are cheap, so each bank has its own set.
You can forward and complete twice as many results.

The traditional problem with banking is a one-cycle delay when crossing
banks; a compiler can fix that, but a CPU cannot on the first pass.
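
A rough sketch of the cross-bank penalty, as a toy model and not a real
machine: two banks, each with its own ALUs, and one extra cycle whenever an
instruction reads a result produced in the other bank. The latencies, the
loop body, the unlimited ALUs per bank, and every name here (loop_body,
finish_time, CROSS_BANK_DELAY) are my assumptions for illustration. It
compares blindly alternating banks with keeping each unrolled iteration in
one bank, which is the kind of assignment a compiler could make up front.

```python
# Toy model: two register banks, each with its own ALUs. Reading a value
# produced in the other bank costs CROSS_BANK_DELAY extra cycles.
# Latencies, the delay value, and the loop body are illustrative assumptions.
CROSS_BANK_DELAY = 1


def loop_body(tag):
    """One iteration of a LD/LD/FMUL/FADD/ST body as (name, latency, sources)."""
    return [
        (f"ld_a{tag}", 3, []),
        (f"ld_b{tag}", 3, []),
        (f"fmul{tag}", 4, [f"ld_a{tag}", f"ld_b{tag}"]),
        (f"fadd{tag}", 4, [f"fmul{tag}"]),
        (f"st{tag}", 1, [f"fadd{tag}"]),
    ]


# The loop unrolled once: two independent iterations.
PROGRAM = loop_body(0) + loop_body(1)


def finish_time(program, bank_of):
    """Cycle at which the last instruction completes.

    Assumes unlimited ALUs per bank, so only data dependencies and the
    cross-bank penalty delay an instruction.
    """
    done = {}
    for name, latency, sources in program:
        start = 0
        for src in sources:
            penalty = CROSS_BANK_DELAY if bank_of[src] != bank_of[name] else 0
            start = max(start, done[src] + penalty)
        done[name] = start + latency
    return max(done.values())


# Naive: alternate banks instruction by instruction.
round_robin = {name: i % 2 for i, (name, _, _) in enumerate(PROGRAM)}
# Chain-aware: keep each unrolled iteration entirely within one bank.
per_iteration = {name: int(name[-1]) for name, _, _ in PROGRAM}

print("round robin  :", finish_time(PROGRAM, round_robin))
print("per iteration:", finish_time(PROGRAM, per_iteration))
```

With these made-up latencies the alternating assignment pays the crossing
delay on every link of the chain, while the per-iteration assignment never
pays it and still keeps both banks busy.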