Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Lynn Wheeler <lynn@garlic.com>
Newsgroups: comp.arch
Subject: Re: Architectural implications of locate mode I/O
Date: Tue, 02 Jul 2024 17:36:50 -1000
Organization: Wheeler&Wheeler
Lines: 110
Message-ID: <875xtn87u5.fsf@localhost>
References: <v61jeh$k6d$1@gal.iecc.com> <v61oc8$1pf3p$1@dont-email.me> <HYZgO.719$xA%e.597@fx17.iad> <8bfe4d34bae396114050ad1000f4f31c@www.novabbs.org>
MIME-Version: 1.0
Content-Type: text/plain
User-Agent: Gnus/5.13 (Gnus v5.13)

mitchalsup@aol.com (MitchAlsup1) writes:
> Once you recognize that I/O is eating up your precious CPU, and you
> get to the point you are willing to expend another fixed programmed
> device to make the I/O burden manageable, then you basically have
> CDC 6600 Peripheral Processors, programmed in code or microcode.

The QSAM library does serialization for the application ... GET/PUT calls do "wait" operations inside the library for I/O completion. The BSAM library has the application performing serialization with its own "wait" operations for READ/WRITE calls (the application handling the overlap of processing with I/O). Recent IBM articles mention that the QSAM default multiple buffering was established years ago as "five" ... but current recommendations are more like 150 (for QSAM to keep processing highly overlapped with I/O). Note that while they differentiate between application buffers and "system" buffers (for move & locate mode), the QSAM ("system") buffers are part of the application address space but are managed by the QSAM library code.
Both the QSAM & BSAM libraries build the (application) channel programs ... and since the OS/360 move to virtual memory for all 370s, those channel programs contain (application address space) virtual addresses. When the library code passes the channel program to EXCP/SVC0, a copy of the passed channel program is made, replacing the virtual addresses in the CCWs with real addresses. QSAM GET can return an address within its own buffers, the ones involved in the actual I/O ("locate" mode), or copy data from its buffers to the application's buffers ("move" mode). The references on the web all seem to speak of "system" and "application" buffers, but I think it would be more appropriate to call them "library" and "application" buffers.

The 370/158 had "integrated channels" ... the 158 engine ran both the 370 instruction-set microcode and the integrated-channel microcode. When Future System imploded, there was a mad rush to get stuff back into the 370 product pipelines, including kicking off the quick&dirty 303x & 3081 efforts in parallel. For the 303x they created "external channels" by taking a 158 engine with just the integrated-channel microcode (and no 370 microcode) as the 303x "channel director". A 3031 was two 158 engines, one with just the 370 microcode and a second with just the integrated-channel microcode. A 3032 was a 168-3 remapped to use channel directors for external channels. A 3033 started out as 168-3 logic remapped to 20% faster chips.

Jan1979, I had lots of use of an early engineering 4341 and was con'ed into doing a (CDC 6600) benchmark for a national lab that was looking at 70 4341s for a compute farm (sort of the leading edge of the coming cluster-supercomputing tsunami). The benchmark was Fortran compute doing no I/O, executed with nothing else running:

4341: 36.21secs, 3031: 37.03secs, 158: 45.64secs

Now, integrated channel microcode ... the 158, even with no I/O running, was still 45.64secs, compared with the same hardware in a 3031 but without the channel microcode: 37.03secs. I had a channel efficiency benchmark ...
basically, how fast can the channel handle each channel command word (CCW) in a channel program (the channel architecture required that CCWs be fetched, decoded, and executed purely sequentially/synchronously). The test was a channel program with two ("chained") disk read CCWs for two consecutive records. Then add a CCW between the two disk read CCWs ... which results in a complete extra revolution before the 2nd data record can be read (because of the latency, while the disk keeps spinning, in handling the extra CCW separating the two read CCWs). Then reformat the disk to add a dummy record between each pair of data records, gradually increasing the size of the dummy record until the two data records can be read in a single revolution.

The dummy-record size required for single-revolution reading of the two records was largest for the 158 integrated channel as well as for all the 303x channel directors. The original 168 external channels could do it with the smallest possible dummy record (but a 168 with a channel director, aka the 3032, couldn't, nor could the 3033) ... the 4341 integrated-channel microcode could also do it with the smallest possible dummy record. The 3081's CCW processing latency was more like the 158 integrated channel (and the 303x channel directors).

Second half of the 80s, I was a member of Chesson's XTP TAB ... and found a comparison showing that a typical UNIX TCP/IP implementation of the time took on the order of 5k instructions and five buffer copies, while the comparable mainframe protocol in VTAM took 160k instructions and 15 buffer copies (with the larger buffers on high-end cache machines, the cache misses for the 15 buffer copies could exceed the processor cycles for the 160k instructions). XTP was working on no buffer copies and streaming I/O ... attempting to process TCP as close as possible to no-buffer-copy disk I/O: scatter/gather I/O for separate header and data ... and also a move from a header-CRC protocol .... to a trailer-CRC protocol ...
instead of software prescanning the buffer to calculate the CRC (for placing in the header) .... the data is processed outboard as it streams through, computing the CRC and then appending it to the end of the record.

When doing IBM's HA/CMP and working with the major RDBMS vendors on cluster scale-up in the late 80s/early 90s, there were lots of references to POSIX light-weight threads and asynchronous I/O for RDBMS (with no buffer copies) and the RDBMS managing a large record cache.

--
virtualization experience starting Jan1968, online at home since Mar1970