Path: ...!2.eu.feeder.erje.net!3.eu.feeder.erje.net!feeder.erje.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: anton@mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.arch
Subject: Re: big, fast, etc, was is Vax addressing sane today
Date: Fri, 13 Sep 2024 06:23:47 GMT
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Lines: 91
Message-ID: <2024Sep13.082347@mips.complang.tuwien.ac.at>
References: <vbd6b9$g147$1@dont-email.me> <2024Sep11.113204@mips.complang.tuwien.ac.at> <vbsh3q$3n09p$1@dont-email.me> <vbtqib$2sce$2@dont-email.me> <vbvhs3$2std$1@gal.iecc.com>
Injection-Date: Fri, 13 Sep 2024 10:13:02 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="da068ec287f3285dee7d4af894aeb142"; logging-data="821640"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX182fF+jekf+BxqGszcleFTf"
Cancel-Lock: sha1:6iFQGHienWvcNjTNssGOfT1roEM=
X-newsreader: xrn 10.11
Bytes: 5904

John Levine <johnl@taugh.com> writes:
>According to Lawrence D'Oliveiro <ldo@nz.invalid>:
>>You solve that by having multiple units of the cheap machines to achieve
>>the same level of redundancy, or even more. That ends up being more
>>cost-effective than the mainframe.
>
>That's fine for workloads that work that way.
>
>Airline reservation systems historically ran on mainframes because
>when they were invented that's all there was (original SABRE ran on
>two 7090s) and they are business critical so they need to be very
>reliable.
>
>About 30 years ago some guys at MIT realized that route and fare
>search, which are some of the most demanding things that CRS do, are
>easy to parallelize and don't have to be particularly reliable -- if
>your search system crashes and restarts and reruns the search and the
>result is a couple of seconds late, that's OK.
>So they started ITA software which used racks of PC servers running
>parallel applications written in Lisp (they were from MIT) and blew
>away the competition.
>
>However, that's just the search part. Actually booking the seats and
>selling tickets stays on a mainframe or an Oracle system because
>double booking or giving away free tickets would be really bad.

Booking flights or seats can easily be distributed: each flight is
assigned to one computer. To avoid double booking or free tickets
even in case of a computer crash, you use the usual transaction
processing approach, and report completion of the booking only when
the booking has reached persistent storage. For persistent storage
you use SSDs with power-loss protection. These SSDs, ECC RAM, RAID-1,
redundant power supplies and UPSs protect against most hardware
failures, but availability is still a concern (e.g., motherboard or
CPU failure; that normally does not affect data integrity if the other
measures are taken, but it affects availability). To increase
availability, you can use, e.g., DRBD (distributed replicated block
device) to get the data onto multiple machines.

Concerning "really bad": Airlines overbook their flights as a matter
of policy to increase their revenue. If they had a booking system
that double-booked, say, 1ppm of all bookings, they probably would not
even notice, and would deal with it in the same way they deal with the
cases when the overbooking actually results in too many passengers
arriving for the flight. Likewise, free tickets are not an issue if
they occur rarely enough. Do they want to spend a million on a
mainframe to avoid a revenue loss of $100k?

But in any case, that's not the problem with cheap hardware. The
problems are:

When the persistent storage fails, you lose all transactions since the
latest backup. To avoid that, RAID-1 helps, or a redundant
distributed storage like DRBD, or a redundant distributed transaction
system.

You may also want more availability than a single system with RAID-1
(with a spare system standing by) provides; then you have to go for
one of the redundant distributed approaches.

However, my impression from booking flights online is that reliability
of the booking platform is not at all a concern for the airlines. And
as a customer, I find little difference between the booking front-end
erroring out and the transaction back-end being unavailable.

>There's also a rule of thumb about databases that says one system of
>performance 100 is much better than 100 systems of performance 1
>because those 100 systems will spend all their time contending for
>database locks.

If you handle each flight on one system, the contention for locks is
only within that one system. And I expect that there is not that much
contention. How many people book the same flight within the same
millisecond (or however long the lock is held)?

Concerning performance 100 vs. performance 1, which systems are you
thinking of? z17 will have 32*8=256 cores (of unknown performance
that is likely to be disappointing, or IBM would not disallow
publishing benchmark results), compared to similar numbers of cores on
servers with AMD or Intel CPUs, or 16-24 cores on systems based on
desktop chips (with Intel you pay a heavy premium these days if you
want ECC memory, however).

Interestingly, with the increasing number of cores per socket in
recent years, the number of sockets is going down. E.g., the
successor to the HPE Superdome Flex with up to 32 sockets (up to
32*28=896 cores) is the HPE Compute Scale-Up Server 3200 with up to 16
sockets (16*60=960 cores). Either there is little demand for single
systems with more cores, or there are technical difficulties (probably
both).

- anton
-- 
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>
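
The per-flight booking scheme described earlier in this post (each
flight owned by one machine, completion reported only after the
booking is on stable storage) could be sketched roughly as follows.
This is only an illustration under stated assumptions, not anyone's
production design: the names FlightShard and shard_for, the CRC-based
routing, and the JSON-lines log format are all hypothetical, and
replication to a second machine (e.g., via DRBD) is assumed to happen
below the file system and is not shown.

```python
import json
import os
import zlib


def shard_for(flight, n_shards):
    """Map a flight to exactly one shard (hypothetical routing rule)."""
    return zlib.crc32(flight.encode("utf-8")) % n_shards


class FlightShard:
    """The flights owned by one machine, backed by an append-only log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.seats = {}  # (flight, seat) -> passenger
        if os.path.exists(log_path):
            self._recover()

    def _recover(self):
        # Replay complete records. A torn final record was never
        # acknowledged to anyone, so it is safe to cut it off.
        good_bytes = 0
        with open(self.log_path, "rb") as log:
            for raw in log:
                if not raw.endswith(b"\n"):
                    break
                try:
                    rec = json.loads(raw)
                except ValueError:
                    break
                self.seats[(rec["flight"], rec["seat"])] = rec["passenger"]
                good_bytes += len(raw)
        os.truncate(self.log_path, good_bytes)

    def book(self, flight, seat, passenger):
        """Return True only once the booking is on stable storage."""
        key = (flight, seat)
        if key in self.seats:
            return False  # seat already sold: refuse, never double-book
        rec = json.dumps({"flight": flight, "seat": seat, "passenger": passenger})
        with open(self.log_path, "ab") as log:
            log.write(rec.encode("utf-8") + b"\n")
            log.flush()
            os.fsync(log.fileno())  # durable before we report completion
        self.seats[key] = passenger
        return True
```

Because every flight maps to a single shard, the seat check and the
log append never need a lock that spans machines. Whether fsync
really means "durable" depends on the drive honouring it, which is
where the SSDs with power-loss protection come in.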