
Path: ...!weretis.net!feeder9.news.weretis.net!news.quux.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.arch
Subject: Re: Is Parallel Programming Hard, And, If So, What Can You Do About
 It?
Date: Sun, 25 May 2025 15:21:01 -0500
Organization: A noiseless patient Spider
Lines: 234
Message-ID: <100vuf8$1icjh$1@dont-email.me>
References: <vvnds6$3gism$1@dont-email.me> <vvqdas$g9oh$1@dont-email.me>
 <vvrcs9$msmc$2@dont-email.me>
 <0ec5d195f4732e6c92da77b7e2fa986d@www.novabbs.org>
 <vvribg$npn4$1@dont-email.me> <vvs343$ulkk$1@dont-email.me>
 <vvtt4d$1b8s7$4@dont-email.me> <2025May13.094035@mips.complang.tuwien.ac.at>
 <vvuuua$1mt7m$1@dont-email.me> <vvvons$3uvs3$2@dont-email.me>
 <100oetb$c5c7$1@paganini.bofh.team> <100q47d$3k7s2$1@dont-email.me>
 <jwv8qmncayc.fsf-monnier+comp.arch@gnu.org> <100srqk$pos5$1@dont-email.me>
 <100vm4s$16av3$1@paganini.bofh.team>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 25 May 2025 22:27:21 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="b3d745afa7758f8c0852a226785a7fe1";
	logging-data="1651313"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18FOiws7CEKc9BfEqM+7V9qHCHQ3q0/IYg="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Vc+mjsw7PsAK0hdTLs8ebH+qABI=
In-Reply-To: <100vm4s$16av3$1@paganini.bofh.team>
Content-Language: en-US
Bytes: 12452

On 5/25/2025 1:05 PM, Waldek Hebisch wrote:
> Stephen Fuld <sfuld@alumni.cmu.edu.invalid> wrote:
>> On 5/23/2025 2:03 PM, Stefan Monnier wrote:
>>> Stephen Fuld [2025-05-23 08:28:44] wrote:
>>>> On 5/22/2025 5:18 PM, Waldek Hebisch wrote:
>>>>> It is pretty clear that due to drive mechanics track cache/buffer
>>>>> is useful.
>>>> Pretty clear to everyone except one person. :-)
>>>
>>> 🙂
>>>
>>>>> However, the real question is about size: how big
>>>>> should it be.  For "consumer" drives I see claims of 256 MB
>>>>> cache.  Given rather optimistic 200 MB/s transfer rate it is
>>>>> about 1.25s of drive data, that is 80-150 rotations.  I would
>>>>> expect that say 4 tracks should be enough for reading.  For
>>>>> writing one could use a few more tracks.  Still, advertised cache
>>>>> sizes seem to be much bigger than necessary.
>>>> It's not just the rotations, but the seek time. So your example is fewer
>>>> "operations" than the 80-150 you get when just including rotations.
>>>
>>> I don't understand what you're getting at, here.
>>> I think Waldek's argument is that 256MB corresponds approximately
>>> to the amount of data stored in 80-150 tracks, and seek time doesn't
>>> change that fact.
>>
>> Yes, I didn't express myself well.  :-(  And once again, I have to say
>> that my information may be obsolete.
>>
>> I think it is useful to separate talking about read data from write
>> data.  For read data, as with any cache, more is always better than
>> less, though with diminishing returns.  Why pick 1.25 sec as the "cut
>> off point"?  If the host re-references data that it hasn't read for say
>> 3 seconds, having it in cache still saves, probably a seek time and on
>> average 1/2 rotation time.  Plus, it means the heads will be free to
>> handle other requests.  All of this is standard cache benefits.  I see
>> no reason to limit the cache size and reduce this benefit.
> 
> We are talking here about the common case, that is when the disc is
> accessed via the OS cache.  The OS cache is significantly larger than
> the disc cache, so the hit ratio for data sent to the host is going to
> be quite low.  The disc cache has an advantage: it gets "for free" some
> data that the host did not request.  But it is rather unlikely that
> keeping such data for a long time has a significant advantage.
> 
>>>> And if you are caching writes, more cache gives you more blocks to choose
>>>> from when optimizing the write back order, which reduces the time to write
>>>> them all back.
>>>
>>> IIUC, for SATA drives, NCQ is still limited to 32 in-flight commands, so
>>> unless the drive is allowed to do write-back caching it seems the amount
>>> of space used for write-buffering is likely small (compared to 256MB).
>>> [ Unless it is common for individual write commands to cover multi-MB
>>>     chunks of data?  ]
>>
>>
>> For write data, I was unaware of the 32 operation limit.  I was used to
>> SCSI, which, IIRC was larger, and for server type applications, where
>> some sort of UPS is more common, the site may choose to enable write
>> caching in the disk.  For a disk vendor, given the small cost of the
>> DRAM, it is an easy choice.
> 
> I do not look at details of disc protocol.  But with the protocol done
> right host would first transfer commands and then deliver data
> in order requested by the drive.  So most buffering would be in
> the host and disc would need just enough buffering to ensure
> smooth transmission and low interrupt rate.  4 tracks look like
> plenty for this purpose.
> 
>>>> The larger DRAM is a small component of drive cost, so the
>>>> manufacturers think it is worth including more.
>>>
>>> In some markets (e.g. home routers), the size of DRAM seems to be enough
>>> of a cost factor that it took many years until reaching 256MBs, even
>>> though those boxes *need* that RAM for all kinds of purposes (the 128MB
>>> of my current home-router seems to be its main source of instability).
>>> but HDDs are pretty damn expensive beasts nowadays (because prices have
>>> not gone down for the last 10 years or so), so I guess that makes
>>> the relative cost of 512MB of DRAM "negligible"?
>>
>> I can't comment on routers, but for disks, while the cost of the disk
>> may not have come down, increasing capacity allows reduced cost per
>> gigabyte.  A substantial portion of the cost is not subject to Moore's
>> law (e.g. drive motor, magnets and arm assembly, etc.) and some capacity
>> increasing technologies cost more (but not enough more to overwhelm the
>> capacity advantage).
> 
> In the nineties I read that for motherboard manufacturers 1 cent was
> "negligible", but 10 cents was significant: in volume transactions
> margins were low and no party was willing to absorb a 10 cent
> per piece "loss".  Discs are probably less competitive than
> motherboards, but I would expect adding 256 MB to lead to a 1
> dollar or more increase in cost.
> 

Dunno, I would maybe expect an 8 or 16MB chip, unless either:
   Tracks have become so large that a large RAM is needed to deal with them;
   These larger RAM chips have actually become the cheapest usable option.
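To put rough numbers on both the quoted 256 MB figure and the "4 tracks" estimate (a quick sketch; the 200 MB/s rate is the optimistic value from the quote, and "track" here just means one rotation's worth of data, ignoring zone recording):

```python
# Rough check of the cache-size arithmetic quoted upthread.
cache_mb = 256
transfer_mb_s = 200                    # optimistic sustained rate, as quoted
seconds = cache_mb / transfer_mb_s     # ~1.28 s of streamed data

for rpm in (5400, 7200):
    rot_per_s = rpm / 60.0
    rotations = seconds * rot_per_s    # one "track" per rotation, roughly
    track_mb = transfer_mb_s / rot_per_s
    print(f"{rpm} RPM: ~{rotations:.0f} rotations of data; "
          f"~{track_mb:.1f} MB/track, 4 tracks ~ {4 * track_mb:.0f} MB")
```

By that estimate a 4-track buffer is on the order of 7-9 MB, which lines up with expecting an 8 or 16MB chip rather than 256MB.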

Had noted a correlation between RAM type and module size on FPGA boards:
   512K/1MB: QSPI
   32/64 MB: SDR SDRAM
   64/128MB: DDR1
   128MB: DDR2
   256MB: DDR3

So, maybe, this is the cheapest commodity option if they want a given 
RAM type.

Like, for example, when I was last looking at SDcards, 16GB was the 
cheapest option being sold.

Smaller sizes had fallen off the bottom, and plenty of larger sizes 
existed (say, 128 or 256GB).

So, even if a 4 or 8 GB SDcard would be sufficient, 16GB was what was 
available (in the projects I was doing, typically the biggest file on 
the SDcard ended up being the swapfile...).



> So IMO it is highly unclear why manufacturers use large caches.
> One possible explanation could be benchmarketing and using
> obsolete benchmarks.  Another could be inertia with customers
> thinking that "larger cache is better".
> 

Cache, and RPM, probably...


> Another things is fragmenting market into different "kinds" of
> drives.  Rationally, high performance drives should get
> better mechanical parts.  But in given performance area there
> seem to be no reason for different mechanics, so I suspect
> that they use the same.  They may get different firmware.
> "Green" consumer parts seem to be quite aggressive powering
> down (IIUC on recent WD parts it is impossible to permanently
> disable this), but beyond this it is not clear to me if there
> are rational reasons for significantly different firmware.
> 

My experience:
   WD Green, marketed as power-use optimized:
     Drives worked pretty well, but WD seemingly phased it out for HDDs;
     WD Green is back, but mostly for SSDs.
   WD Blue, marketed as general purpose:
     Not great, and worse reliability IME;
     Seems to actually be the more "budget optimized" line.
   WD Black, marketed as performance optimized:
     Typically 7200 RPM;
     Not much notable difference IME from the WD Reds.
   WD Red, marketed as optimized for NAS:
     Typically 5400 RPM;
     Have had mostly good results with these.
   WD Purple, marketed as optimized for video usage and similar:
     Typically 5400 RPM;
     No first-hand experience.

There doesn't seem to be an obvious difference in cache sizes between the 
drive families.

There does seem to be a weak positive correlation between cache size and 
drive size.

RPM seems to be negatively correlated with capacity:
   10K RPM: seemingly mostly under 1TB;
   7200 RPM: mostly 1TB to 4TB drives;
   5400 RPM: most of the bigger drives.
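The latency side of that RPM tradeoff is easy to put numbers on (a small sketch; average rotational latency is half a revolution):

```python
# Average rotational latency (half a revolution) per spindle speed.
for rpm in (5400, 7200, 10000):
    ms_per_rev = 60_000.0 / rpm
    print(f"{rpm} RPM: {ms_per_rev / 2:.2f} ms average rotational latency")
```

So the 10K drives buy roughly 2.5 ms of average latency over 5400 RPM, at the cost of capacity.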

In my use, I had seemingly been seeing the best results from WD Red drives.