From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Short Vectors Versus Long Vectors
Date: Wed, 24 Apr 2024 06:16:58 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Apr24.081658@mips.complang.tuwien.ac.at>
References: <v06vdb$17r2v$1@dont-email.me> <5451dcac941e1f569397a5cc7818f68f@www.novabbs.org> <hqmg2j1vbkf6suddfnsh3h3uhtkqqio4uk@4ax.com>

John Savard <quadibloc@servername.invalid> writes:
>And if memory bandwidth issues make Cray-style vector machines
>impractical, then wouldn't it be even worse for GPUs?

The claim by Mitch Alsup is that latency makes the Crays impractical,
because of chaining issues.  Do GPUs have chaining?  My understanding
is that GPUs deal with latency in the barrel processor way: use
another data-parallel thread while waiting for memory.  Tera also
pursued this idea, but the GPUs succeeded with it.

>If
>most problems anyone would want to use a vector CPU for today do
>involve a large amount of memory, used in a random fashion, so as to
>fit poorly in cache

When the working set is larger than the cache, it does not fit even
when accessed regularly.  Prefetchers can reduce the latency, but they
will not increase the bandwidth.  So if you have a problem that walks
through a lot of memory and performs only a few operations per data
item, that's where CPUs will wait for memory a lot, due to limited
bandwidth (and you won't benefit from SIMD/vector instructions on
these kinds of problems).
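
To put rough numbers on the bandwidth argument, here is a minimal roofline-style sketch.  It is not from the original post: the function name, the daxpy-like kernel, and all machine figures are assumptions chosen purely for illustration, not measurements of any real CPU or GPU.

```python
def attainable_gflops(flops_per_item, bytes_per_item, peak_gflops, bandwidth_gb_s):
    """Roofline-style upper bound on throughput, in GFLOP/s.

    The memory-bound rate is (flops per item) * (items per second), where
    items per second is limited to bandwidth / bytes_per_item.
    """
    memory_bound = flops_per_item * bandwidth_gb_s / bytes_per_item
    return min(peak_gflops, memory_bound)


# Assumed kernel: y[i] = a*x[i] + y[i] on doubles.
# 2 flops per element; 24 bytes of traffic (load x, load y, store y).
FLOPS_PER_ITEM = 2
BYTES_PER_ITEM = 24

# Hypothetical machine figures (assumptions, for illustration only).
cpu = attainable_gflops(FLOPS_PER_ITEM, BYTES_PER_ITEM,
                        peak_gflops=1000.0, bandwidth_gb_s=60.0)

# Doubling the compute peak (e.g. wider SIMD) leaves the bound unchanged,
# because the kernel is limited by memory traffic, not arithmetic.
cpu_wider_simd = attainable_gflops(FLOPS_PER_ITEM, BYTES_PER_ITEM,
                                   peak_gflops=2000.0, bandwidth_gb_s=60.0)

# A hypothetical device with 20x the memory bandwidth.
gpu = attainable_gflops(FLOPS_PER_ITEM, BYTES_PER_ITEM,
                        peak_gflops=10000.0, bandwidth_gb_s=1200.0)

print(cpu, cpu_wider_simd, gpu)
```

Under these assumed figures the bound scales with bandwidth and ignores the compute peak, which is the point made above about SIMD/vector instructions not helping on such problems.
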
For that kind of stuff you had better use GPUs, which have memory
systems with more bandwidth.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>