Article <2024Aug1.174030@mips.complang.tuwien.ac.at>

Deutsch English Français Italiano
<2024Aug1.174030@mips.complang.tuwien.ac.at>

View for Bookmarking (what is this?)
Look up another Usenet article
Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Array vs. Linked List Traversal
Date: Thu, 01 Aug 2024 15:40:30 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 34
Message-ID: <2024Aug1.174030@mips.complang.tuwien.ac.at>
References: <v8g7lr$27dq5$1@dont-email.me>
Injection-Date: Thu, 01 Aug 2024 17:53:07 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="9c6cb8df3cbe72c15b04922cd88fdb90";
	logging-data="2345580"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18biqGqodDKrQquLunoe1Ah"
Cancel-Lock: sha1:r1igCagJ1gZecINMrGP6yYKwJOA=
X-newsreader: xrn 10.11
Bytes: 2334

jseigh <jseigh_es00@xemaps.com> writes:
>Array elements being in cache, assuming the element size is small
>enough to fit multiple elements in a cache line?

Yes.  Even without, hardware branch predictors will see the stride and
prefetch the cache lines appropriately.

>No dependent load delay from having to load the linked list next
>pointer?  If that was the case, then moving the load of the next
>pointer to the beginning of the loop should ameliorate that somewhat.

Depends on how much work you have to do per element.  If it's just a
little work, like adding up one field of each element, or searching
for a particular value of a field, the linked-list will mean that each
iteration takes 4-5 cycles on a modern CPU even on a D-cache hit,
while such loops may take 1 cycle/iteration or less if an array is
used.  If you have more than 5 cycles of work per element, then that's
not an issue (as long as the elements are in the D-cache or the
linked-list layout works for the hardware prefetchers), but modern
CPUs can do a lot in 5 cycles.

Rearranging the load of the next pointer will typically not make a big
difference on OoO CPUs, unless you have a lot of branch
mispredictions.

>Something else?

Please limit your lines to around 70 characters.  I reformatted the
quoted material.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>