From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: Short Vectors Versus Long Vectors
Date: Wed, 24 Apr 2024 09:18:56 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2024Apr24.111856@mips.complang.tuwien.ac.at>
References: <5451dcac941e1f569397a5cc7818f68f@www.novabbs.org> <5ad43f26367ef2d5e8b3c298511ddf45@www.novabbs.org>

John Savard writes:
>On Wed, 24 Apr 2024 02:00:10 +0000, mitchalsup@aol.com (MitchAlsup1)
>wrote:
>
>>Everyone has to have hope on something.
>
>But false hopes are a waste of time.
>
>The reason for my interest in long vectors is primarily because I
>imagine that, if the Cray I was an improvement on the IBM System/360
>Model 195, then, apparently, today a chip like the Cray I would be
>the next logical step after the Pentium II (OoO plus cache, just like
>a Model 195).

But the Cray-1 is not an improvement on the Model 195.  It has no
cache.

Neither the Cray-1 nor the Model 195 have OoO as the term is commonly
understood today: OoO execution, in-order completion, allowing
register renaming, speculative execution, and precise exceptions.

One may consider the Model 91/195 a predecessor of today's OoO,
because it supports register renaming, and you "just" need to add a
reorder buffer to get in-order completion and speculative execution.

>Well, apparently they do things like multiply 2048 by 2048 matrices.
>Which is why they need stride.
You can multiply dense matrices of any size efficiently with stride 1.
And caches help a lot for matrix multiply; in HPC circles, (dense)
matrix multiply is known as a cache-friendly problem.

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup,
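
A minimal sketch of the stride-1 point above, assuming square n x n
matrices stored row-major in flat arrays.  The function name
matmul_ikj and its signature are made up for illustration, not taken
from any library.  With the i-k-j loop order, the innermost loop walks
a row of B and a row of C at unit stride, so no strided access is
needed.  Cache blocking (tiling), which is what real HPC kernels add
on top, is left out to keep the example short.

```c
#include <stddef.h>

/* Hypothetical helper: C = A * B for n x n row-major matrices.
   Loop order i-k-j: in the innermost loop, b[k*n + j] and c[i*n + j]
   are both accessed with stride 1, and a[i*n + k] is loop-invariant. */
void matmul_ikj(size_t n, const double *a, const double *b, double *c)
{
    /* Clear the result, since the loops below accumulate into it. */
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

The textbook i-j-k order would instead read b[k*n + j] with k varying
in the inner loop, i.e. with stride n; swapping the two inner loops
removes that without changing the result.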