Path: ...!weretis.net!feeder9.news.weretis.net!i2pn.org!i2pn2.org!.POSTED!not-for-mail
From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Concertlina II: Full Circle
Date: Wed, 19 Jun 2024 00:28:24 +0000
Organization: Rocksolid Light
Message-ID: <f16fd31d0e97d55e6dabc560dcef4a38@www.novabbs.org>
References: <mas07jhu9i876gsov2gh8tap17kem5n21p@4ax.com>
 <132536f47d1b160ad3ad0340fc479c1d@www.novabbs.org>
 <v4c17j5eo503i93fb7imjpom5jqs3oivtv@4ax.com>
 <50c85586e1aec0eef53e83cef7cb1d5d@www.novabbs.org>
 <4mb37jdb25571s1q1pjlc3ludaaks7tukr@4ax.com>
 <e4c37jd4l9spbi5b23b525unp9p60ird8q@4ax.com>
 <1401408dead0bbc0b1e2ea7e053c873a@www.novabbs.org>
 <fbn37jpc0banppburc1t6r4hnp3kih23ui@4ax.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Info: i2pn2.org; logging-data="438262"; mail-complaints-to="usenet@i2pn2.org"; posting-account="65wTazMNTleAJDh/pRqmKE7ADni/0wesT78+pyiDW8A";
User-Agent: Rocksolid Light
X-Rslight-Site: $2y$10$oVoW1.hSFHW5jUOtvu1QLOxir1E83WlROK4pp658jKBKuwne0i.Qa
X-Rslight-Posting-User: ac58ceb75ea22753186dae54d967fed894c3dce8
X-Spam-Checker-Version: SpamAssassin 4.0.0
Bytes: 5453
Lines: 109

John Savard wrote:

> On Tue, 18 Jun 2024 16:54:04 +0000, mitchalsup@aol.com (MitchAlsup1)
> wrote:

>>The semantics of instructions in a loop are subtly altered such
>>that they can be vectorized and executed multi-lane style.

> I've decided that I will not be able to use the one from the original
> Concertina, and will need to design a VVM-like instruction for
> Concertina II from scratch.

> Unlike yours, it won't be...subtle.

> The action of the instruction which begins the loop will, I think, be
> basically the same as yours. It will issue successive iterations of
> the loop starting in consecutive cycles.

> To do so, though, that instruction will contain a number of fields in
> which to specify parameters:

> (3 bits) An index register, which is initialized to zero at the start
> of the loop, and "incremented" (the quote marks are, of course,
> because it won't really be the same register on each iteration) for
> subsequent iterations.

Ri is provided in the LOOP instruction.

> (3 bits) The power of two which is to serve as the increment.

There is no such need in VVM; the increment is either a constant or a
register and is not restricted to powers of 2.

> (8 bits) A register mask, in which a 1 bit corresponds to a register
> used for intermediate results within the loop. This will become a
> forwarding node rather than a register; all other registers can only
> be read, and serve as constant values only. The index register set up
> previously does not need to be indicated by this.

The contrapositive of this is provided for in VEC.

> (2 bits) This indicates which of the four groups of 8 registers in a
> bank of 32 registers the register mask applies to.

I found no need.

> (1 bit) This indicates whether we're talking about the integer
> registers or the floating-point ones.

I have but 1 register file.

> In addition, in the long version of the instruction, there's a 16-bit
> register mask for the short vector registers.

I also have no software-addressable vector registers.

> Because iterations are independent, one can't handle a stride in the
> natural efficient manner of adding the stride value to a second
> pointer register. This could be a common source of error, so I feel
> the need to make some provision for this.

     for( i = 0; i < max; i += 7 )

falls out for free. But also note:

     for( i = 0; i < max; i++ )
          a[i] = b[i];

is always faster than:

     for( i = 0; i < max; i++ )
          *ap++ = *bp++;

The top loop is 3 instructions, the bottom one is 5.
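
For reference, the three loop shapes above can be written out as
self-contained C functions. This is only a sketch: the names
copy_indexed, copy_pointer, copy_strided and check_forms_agree are made
up for illustration, and the 3-versus-5 instruction counts are the ones
stated above for VVM, not something this C code measures (a compiler
for another target may well emit the same code for both copy forms).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Indexed form, as in "a[i] = b[i]": a single induction variable (i)
 * addresses both arrays. */
void copy_indexed(int *a, const int *b, size_t max)
{
    for (size_t i = 0; i < max; i++)
        a[i] = b[i];
}

/* Pointer-bumping form, as in "*ap++ = *bp++": two pointers are
 * updated on every iteration in addition to the loop counter. */
void copy_pointer(int *ap, const int *bp, size_t max)
{
    for (size_t i = 0; i < max; i++)
        *ap++ = *bp++;
}

/* Strided form, as in "i += 7": the stride lives in the index update,
 * so no second pointer register is needed. Returns the number of
 * elements copied; a stride of 0 copies nothing. */
size_t copy_strided(int *a, const int *b, size_t max, size_t stride)
{
    size_t copied = 0;
    if (stride == 0)
        return 0;
    for (size_t i = 0; i < max; i += stride) {
        a[i] = b[i];
        copied++;
    }
    return copied;
}

/* Hypothetical self-check: the indexed and pointer forms must produce
 * identical destination arrays. */
void check_forms_agree(void)
{
    int b[16], a1[16], a2[16];
    for (int i = 0; i < 16; i++)
        b[i] = 3 * i + 1;
    memset(a1, 0, sizeof a1);
    memset(a2, 0, sizeof a2);
    copy_indexed(a1, b, 16);
    copy_pointer(a2, b, 16);
    assert(memcmp(a1, a2, sizeof a1) == 0);
}
```

Calling check_forms_agree() from a test driver confirms the two copy
forms are interchangeable in effect; the difference discussed here is
purely in how many instructions each needs per iteration.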

> One scheme I am considering would be to include one bit in the
> instruction that begins a loop to indicate the loop contains a
> preamble. The preambles execute serially, and when they conclude,
> everything that follows is issued immediately, to execute in parallel
> (but now with a multi-cycle offset) to previous iterations.

VVM just has instructions before the VEC instruction to deal with this.

> Upon reflection, this doesn't waste a huge amount of time, so it is
> better to go with it than including fields for stride value and a
> second counter register in the loop start instruction.

> Since the preambles do execute serially, the "end preamble"
> instruction would point to the loop start instruction. Instead of full
> memory-reference, though, it would just include a short value that is
> a negative program-relative address.

> Iterations that execute in parallel, though, don't "branch back"
> anywhere, so the loop end instruction has no parameters. At least
> something is like your VVM.

By considering the branch back to the top as a return, those loops
which were executed simultaneously just die instead of returning to the
top; only the MOD-N lane returns to the top.

> So this is how I take your VVM concept, and mess it up by making it
> unnecessarily complicated; basically, because I don't want to make an
> ISA that requires implementations to be, so to speak, "intelligent".
> (i.e. upon the first store into a register in the loop, categorize
> that register as a node reference)

LOL but have fun.

> John Savard