Path: news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: Rationale for aligning data on even bytes in a Unix shell file?
Date: Thu, 8 May 2025 01:57:05 -0500
Organization: A noiseless patient Spider
Lines: 97
Message-ID: <vvhktl$1k0km$1@dont-email.me>
References: <vuih43$2agfa$1@dont-email.me> <vuml73$1riea$1@dont-email.me>
 <vun04h$2fjrn$2@raubtier-asyl.eternal-september.org>
 <vun1nh$22hc5$3@dont-email.me>
 <vunak2$2p980$1@raubtier-asyl.eternal-september.org>
 <vunbgo$2q5u8$1@dont-email.me>
 <vunbjg$2q72n$1@raubtier-asyl.eternal-september.org>
 <vund1f$2rh3j$1@dont-email.me>
 <vungko$2uoa2$1@raubtier-asyl.eternal-september.org>
 <X9MPP.1383458$f81.819466@fx48.iad>
 <vuobri$3o38b$1@raubtier-asyl.eternal-september.org>
 <XtOPP.2986761$t84d.2537581@fx11.iad>
 <vuohq9$3tlhf$1@raubtier-asyl.eternal-september.org>
 <vuoig5$3ub4j$1@dont-email.me>
 <vuorpf$6tnn$1@raubtier-asyl.eternal-september.org>
 <vup2nt$bi1k$2@dont-email.me>
 <vupofl$13pg2$2@raubtier-asyl.eternal-september.org>
 <vuprce$15sqo$2@dont-email.me>
 <vvd6n5$353gs$1@raubtier-asyl.eternal-september.org>
 <vvfbnj$ulpc$1@dont-email.me> <vvh05a$1bfpj$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Thu, 08 May 2025 09:02:14 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="2ba0e0bff72c1f798f59c15520abdc28";
	logging-data="1704598"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1+XJzC5biTMNpugGCFpJW/cJMKJTeWz5eQ="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:QsfveNnkNAUt/Rmpw6+SDyvKufs=
Content-Language: en-US
In-Reply-To: <vvh05a$1bfpj$2@dont-email.me>

On 5/7/2025 8:07 PM, Lawrence D'Oliveiro wrote:
> On Wed, 7 May 2025 05:08:03 -0500, BGB wrote:
> 
>> Ideally, filesystems should be case sensitive by default;
>> If someone wants case insensitivity, this can be better handled at the
>> application or file-browser level.
> 
> Even Linux has given in on this. The widely-used ext4 filesystem has an
> option for case-insensitivity, which, once enabled for a volume, can be
> activated on a per-directory basis.

Ironically, Windows and NTFS went the other way, adding an option for 
case-sensitive directories (though one needs a special command, 
"fsutil file setCaseSensitiveInfo", to enable it on a per-directory 
basis).

....


Either way, case-insensitivity at the FS level adds complexity.


I guess, one intermediate option could be to keep the FS proper as case 
sensitive, but then fake case insensitivity at the level of the OS APIs 
(based on a system-level locale setting).

Say, a program tries to open "Foo.txt";
The kernel sees that no "Foo.txt" exists, but "foo.txt" does, and the 
directory was flagged as case-insensitive, so the kernel does a 
case-folded open.

Granted, how to implement this semi-efficiently is its own issue.

Externally doing a directory walk and seeing if any of the files match 
the requested name is possible, albeit inefficient. Building a 
case-folding hash of a directory could be possible, but only makes sense 
if one expects this to happen repeatedly in a given directory (if it is 
a one-off, it is little better than a linear walk and match).
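
As a rough sketch of the linear walk-and-match fallback: try an exact 
match first, and only do the case-folded walk if that fails and the 
directory is flagged as case-insensitive. The names here (lookup_name, 
names_equal_folded, dir_case_insensitive) are made up for illustration, 
the directory is modeled as a plain array of strings rather than real 
on-disk entries, and folding is ASCII-only with no locale handling.

```c
#include <stddef.h>
#include <string.h>

/* Fold one byte to lower case, ASCII only (an assumption of this sketch). */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Return 1 if the two names are equal once both are ASCII case-folded. */
int names_equal_folded(const char *a, const char *b)
{
    while (*a && *b) {
        if (fold_ascii((unsigned char)*a) != fold_ascii((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* both names must end at the same point */
}

/*
 * Look up 'want' among 'count' directory entry names.
 * An exact match always wins; the folded walk is only a fallback, and is
 * only attempted if the directory is flagged case-insensitive.
 * Returns the index of the matching entry, or -1 if there is none.
 */
int lookup_name(const char *const *names, size_t count,
                const char *want, int dir_case_insensitive)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (strcmp(names[i], want) == 0)
            return (int)i;

    if (!dir_case_insensitive)
        return -1;

    for (i = 0; i < count; i++)
        if (names_equal_folded(names[i], want))
            return (int)i;

    return -1;
}
```

Both passes are O(n) in the number of entries, which is why a cached 
case-folding hash only pays off for repeated lookups in one directory.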



One intermediate option could be to have a hidden metadata file, such as 
a case-folded names table. This merely lists all the files in a 
directory, but with all the filenames normalized to all lower case or 
similar (with a bitmap of which characters were case-folded).

Ironically, this isn't too far off from how one might support Unix style 
metadata on FAT32. Say, one has a hidden file, "$_TKMETA.DAT" which 
isn't shown in directory listings, but may be used by the FS driver for 
extended metadata (say, in this case keyed using the 8.3 name).

It is kind of a crap option, but (mostly) survives Windows intrusions 
(but will still break on directory copy, as modern Windows versions do 
not preserve the original 8.3 names). This variant uses the LFNs for 
the user-visible name, unlike "UMSDOS", which provided its own filenames 
and didn't use the VFAT LFN scheme.



Though, if doing a natively case-insensitive filesystem, I guess one 
option could be to fold all names to lower case, and then store a 
bitmask of which bytes to flip back to upper case.
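
A minimal sketch of the fold-plus-bitmask idea, under a few assumptions: 
fold_name and unfold_name are hypothetical helpers, the mask is a single 
64-bit word (so at most 64 name bytes), bit i corresponds to name byte i, 
and only ASCII letters are folded (bytes 0x80 and up, such as UTF-8 
sequences, pass through unchanged).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fold the first 'len' bytes of 'name' to lower case into 'folded' and
 * return a mask with bit i set when byte i was upper case.
 * Only the first 64 bytes are handled (the width of the mask).
 */
uint64_t fold_name(const char *name, size_t len, char *folded)
{
    uint64_t mask = 0;
    size_t i;

    for (i = 0; i < len && i < 64; i++) {
        char c = name[i];
        if (c >= 'A' && c <= 'Z') {
            c = (char)(c - 'A' + 'a');
            mask |= (uint64_t)1 << i;
        }
        folded[i] = c;
    }
    return mask;
}

/* Restore the original spelling by flipping the masked bytes to upper case. */
void unfold_name(const char *folded, size_t len, uint64_t mask, char *name)
{
    size_t i;

    for (i = 0; i < len && i < 64; i++) {
        char c = folded[i];
        if ((mask >> i) & 1)
            c = (char)(c - 'a' + 'A');
        name[i] = c;
    }
}
```

Lookups then compare only the folded bytes, and the mask is consulted 
just when the name has to be shown to the user in its original case.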


Assuming a 64 byte dirent, and a similar AVL-like directory structure:
   {
   u32 ino;              //00, inode number (low 32 bits)
   u16 lsn;              //04, left child node
   u16 psn;              //06, parent node
   u16 rsn;              //08, right child node
   u16 hsn;              //0A, node high bits
   u16 ino_hi;           //0C, inode high bits
   byte zdepth;          //0E, Z height of node (0=Leaf)
   byte etype;           //0F, dirent type
   byte name[40];        //10, name
   u32  ncase1;          //38, case fold (first 32 bytes)
   byte ncase2;          //3C, case fold (next 8 bytes)
   byte pad1;            //3D, MBZ
   u16 hsn2;             //3E, more node high bits
   }

Base name drops from 48 to 40, to accommodate the case-folding bits.
The LFN entries could have a similar modification.
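
For checking the layout, the dirent above can be written as an actual C 
struct with compile-time asserts on its size and offsets. This is a 
sketch: the struct name cf_dirent and the two mask helpers are made up, 
it assumes a typical ABI where these members need no padding, and it 
assumes that bit i of the combined 40-bit mask maps to name byte i 
(ncase1 for bytes 0..31, ncase2 for bytes 32..39).

```c
#include <stddef.h>
#include <stdint.h>

struct cf_dirent {
    uint32_t ino;       /* 0x00, inode number (low 32 bits) */
    uint16_t lsn;       /* 0x04, left child node */
    uint16_t psn;       /* 0x06, parent node */
    uint16_t rsn;       /* 0x08, right child node */
    uint16_t hsn;       /* 0x0A, node high bits */
    uint16_t ino_hi;    /* 0x0C, inode high bits */
    uint8_t  zdepth;    /* 0x0E, Z height of node (0=Leaf) */
    uint8_t  etype;     /* 0x0F, dirent type */
    uint8_t  name[40];  /* 0x10, name (stored case-folded) */
    uint32_t ncase1;    /* 0x38, case fold bits for name bytes 0..31 */
    uint8_t  ncase2;    /* 0x3C, case fold bits for name bytes 32..39 */
    uint8_t  pad1;      /* 0x3D, must be zero */
    uint16_t hsn2;      /* 0x3E, more node high bits */
};

/* C11 compile-time checks that the layout matches the offsets above. */
_Static_assert(sizeof(struct cf_dirent) == 64, "dirent must be 64 bytes");
_Static_assert(offsetof(struct cf_dirent, name) == 0x10, "name at 0x10");
_Static_assert(offsetof(struct cf_dirent, ncase1) == 0x38, "ncase1 at 0x38");
_Static_assert(offsetof(struct cf_dirent, hsn2) == 0x3E, "hsn2 at 0x3E");

/* Combine ncase1/ncase2 into one 40-bit mask (bit i = name byte i). */
uint64_t cf_dirent_case_mask(const struct cf_dirent *de)
{
    return (uint64_t)de->ncase1 | ((uint64_t)de->ncase2 << 32);
}

/* Split a 40-bit mask back into the two on-disk fields. */
void cf_dirent_set_case_mask(struct cf_dirent *de, uint64_t mask)
{
    de->ncase1 = (uint32_t)(mask & 0xFFFFFFFFu);
    de->ncase2 = (uint8_t)((mask >> 32) & 0xFF);
}
```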


The hsn member adds 5 more bits to lsn, rsn, and psn; extending each 
from 16 to 21 bits.
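
How the 5 extra bits per index are packed into hsn isn't pinned down 
above, so the following is only one possible arrangement: bits 0..4 
extend lsn, bits 5..9 extend psn, bits 10..14 extend rsn, and bit 15 is 
left unused. The helper names and shift constants are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions of each index's 5 high bits inside hsn. */
enum { HSN_LSN_SHIFT = 0, HSN_PSN_SHIFT = 5, HSN_RSN_SHIFT = 10 };

/* Rebuild a 21-bit node index from its low 16 bits and its slice of hsn. */
uint32_t node_index_get(uint16_t low, uint16_t hsn, int shift)
{
    return (uint32_t)low | ((uint32_t)((hsn >> shift) & 0x1F) << 16);
}

/* Store a 21-bit node index, leaving the other two slices of hsn alone. */
void node_index_set(uint16_t *low, uint16_t *hsn, int shift, uint32_t index)
{
    *low = (uint16_t)(index & 0xFFFF);
    *hsn = (uint16_t)((*hsn & ~(0x1F << shift)) |
                      (((index >> 16) & 0x1F) << shift));
}
```

With 21 bits per index, a directory tops out at 1 << 21 = 2097152 nodes.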

Though, hsn2 could potentially extend the node indices by another 5 bits 
each (to 26 bits), increasing maximum directory size from 2 million 
files to 64 million files. Granted, a hard limit of 2 million files in a 
directory is probably fairly reasonable (given that the largest 
directory on my actual HDDs seems to hold around 3600 files).

....