From: Stefan Monnier <monnier@iro.umontreal.ca>
Newsgroups: comp.arch
Subject: Re: Is Parallel Programming Hard, And, If So, What Can You Do About It?
Date: Mon, 12 May 2025 21:50:02 -0400
Organization: A noiseless patient Spider
Lines: 72
Message-ID: <jwvr00twbj5.fsf-monnier+comp.arch@gnu.org>
References: <vvnds6$3gism$1@dont-email.me>
	<edb59b7854474033c748f0fd668badaa@www.novabbs.org>
	<w32UP.481123$C51b.217868@fx17.iad> <vvqdas$g9oh$1@dont-email.me>
	<vvrcs9$msmc$2@dont-email.me>
	<0ec5d195f4732e6c92da77b7e2fa986d@www.novabbs.org>
	<vvribg$npn4$1@dont-email.me> <vvs343$ulkk$1@dont-email.me>
	<vvtt4d$1b8s7$4@dont-email.me>

Lawrence D'Oliveiro [2025-05-12 22:35:57] wrote:
> On Mon, 12 May 2025 08:05:56 +0200, Terje Mathisen wrote:
>> For reads it allows the disk to always read full sets of sectors, the
>> following blocks are likely to be needed soon anyway.
> Leave that up to the OS I/O optimization algorithms.  Because they know 
> things about the data that the drive doesn’t.

But the drive also knows things about the data that the OS can't know
(things that have to do with the physical location of the data on the
platters).  Which is why it makes sense for both the OS and the drive to
make their own efforts.

Lawrence D'Oliveiro [2025-05-12 22:39:02] wrote:
> On Mon, 12 May 2025 08:41:57 GMT, Anton Ertl wrote:
>> On SSDs DRAM cache is also used for storing the logical-to-physical
>> sector mapping of the flash translation layer; accessing it on flash is
>> apparently too slow.
> There is a lot of complicated firmware in SSDs to make them look as
> much  like a traditional hard drive as possible, so that traditional
> hard drive filesystems can be used unchanged. This firmware has been
> known to have bugs in it.

Bugs do largely come with "complicated", yes.  That said, I've been
lucky enough not to bump into any of them in my years of using SSDs.
Admittedly, I don't push them very hard.

> Whereas the Linux kernel includes a few filesystems purpose-designed
> for  operation on raw flash devices, that integrate wear-levelling etc
> right  into the block allocation algorithms.  Wouldn’t it be much
> better (more efficient and more reliable) to get rid of most of that
> firmware layer, and use these sorts of filesystems directly?

More reliable, I don't know: to get comparable performance, you'll need
comparable complexity, so probably a comparable number of bugs.
Though I guess by being exposed to many more eyes (by virtue of being Free
Software), it might stand a better chance of being reliable.

But in any case, your argument above has some problems:

- Those "few filesystems" aren't nearly good enough to compete with
  a normal filesystem running on top of a typical SSD, simply because
  they were not designed for that kind of use.
  Last I checked, they don't scale very well to TB sizes, for example.
  And they haven't seen nearly as much work put into avoiding stuttering
  and poor performance when the drive is full.  More generally, they
  haven't received nearly as much attention as has been invested in
  SSDs' "FTL".

- The experience with flash technology in the Linux kernel for smaller
  devices like home routers and such suggests that doing wear leveling
  in the filesystem is a bad idea because you want to do it over the
  whole device: no big difference if you have a single filesystem on the
  whole drive, but for the general case you want something like UBI,
  i.e. a kind of volume-management system that takes care of spreading
  the writes over the whole drive as well as remapping defective pages,
  while still exposing some of the semantics of flash chips, so you need
  non-standard filesystems on top of that.

- For better or for worse, drive manufacturers have simply not given
  access to the "raw" flash layer.  I'm not completely sure why, but
  I get the impression that manufacturers use it as a way to segment the
  market, with different prices for the same flash chips combined with
  different FTLs.
  But maybe at some point, market conditions will change and we'll be
  able to buy SSDs that can be accessed directly at the flash level?
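To make the "do wear leveling over the whole device" point concrete, here
is a minimal Python sketch of a layer in the spirit of UBI.  It assumes a
toy model: (volume, logical erase block) pairs are mapped onto physical
erase blocks (PEBs) drawn from a single device-wide pool, every write goes
out-of-place into the least-erased free PEB, and defective PEBs are
retired by remapping.  The class and method names (WearLeveler, write,
mark_bad) are hypothetical and are not UBI's real kernel interface; only
dynamic wear leveling is shown, and the actual data copying is elided.

```python
import heapq


class WearLeveler:
    """Hypothetical, heavily simplified UBI-like layer (not UBI's real API).

    Maps (volume, logical erase block) pairs onto physical erase blocks
    (PEBs) taken from ONE pool covering the whole device, so writes from
    every volume are spread over all the flash.  Only dynamic wear
    leveling is modelled: blocks holding cold data are never migrated.
    """

    def __init__(self, num_pebs):
        self.erase_counts = [0] * num_pebs
        # Min-heap of (erase count, PEB) for erased, currently unused blocks.
        self.free = [(0, peb) for peb in range(num_pebs)]
        heapq.heapify(self.free)
        self.mapping = {}  # (volume, leb) -> peb
        self.bad = set()

    def _take_free(self):
        """Pop the least-erased free PEB."""
        if not self.free:
            raise RuntimeError("no free physical erase blocks left")
        _, peb = heapq.heappop(self.free)
        return peb

    def write(self, volume, leb):
        """Out-of-place write of one logical block into the least-worn PEB."""
        peb = self._take_free()
        old = self.mapping.get((volume, leb))
        self.mapping[(volume, leb)] = peb
        if old is not None:
            self._erase_and_release(old)
        return peb

    def _erase_and_release(self, peb):
        """Erase a superseded PEB and return it to the shared free pool."""
        self.erase_counts[peb] += 1
        if peb not in self.bad:
            heapq.heappush(self.free, (self.erase_counts[peb], peb))

    def mark_bad(self, peb):
        """Retire a defective PEB, moving any live logical block off it."""
        self.bad.add(peb)
        self.free = [(c, p) for (c, p) in self.free if p != peb]
        heapq.heapify(self.free)
        for key, mapped in list(self.mapping.items()):
            if mapped == peb:
                # Copying the block's contents is elided in this sketch.
                self.mapping[key] = self._take_free()


if __name__ == "__main__":
    wl = WearLeveler(num_pebs=4)
    wl.write("rootfs", 0)          # written once, then left alone
    for _ in range(100):
        wl.write("data", 0)        # one hot logical block in another volume
    print(wl.erase_counts)         # → [0, 33, 33, 33]
    wl.mark_bad(0)                 # PEB 0 goes bad: rootfs block is moved
    print(wl.mapping[("rootfs", 0)])  # → 2
```

Because the free pool is shared, the hot "data" volume wears every PEB
that isn't pinned by live data, rather than only "its own" blocks.  The
untouched PEB 0 also shows why a real layer additionally moves static
data around: with dynamic leveling alone, cold blocks never get erased.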

I agree with you in theory, but in practice I think the potential gain is
rather small.  Maybe the "block device abstraction" isn't such a bad
choice in the end.


        Stefan