Article <v3p9hj$tda7$1@dont-email.me>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: BGB <cr88192@gmail.com>
Newsgroups: comp.lang.c
Subject: Re: C23 thoughts and opinions
Date: Wed, 5 Jun 2024 04:01:28 -0500
Organization: A noiseless patient Spider
Lines: 156
Message-ID: <v3p9hj$tda7$1@dont-email.me>
References: <v2l828$18v7f$1@dont-email.me> <v2unfe$3alds$1@dont-email.me>
 <v2v637$3cunk$1@raubtier-asyl.eternal-september.org>
 <v3dmq6$2edto$1@dont-email.me> <hOu6O.6223$xPJ1.1866@fx09.iad>
 <20240602110213.00003b25@yahoo.com> <v3hn2j$3bdjn$1@dont-email.me>
 <20240602162914.0000648c@yahoo.com> <v3ii22$3g9ch$1@dont-email.me>
 <20240603120043.00003511@yahoo.com> <v3kra8$3vgef$1@dont-email.me>
 <Kvm7O.5231$Ktt5.2929@fx40.iad> <20240603225856.0000679d@yahoo.com>
 <3uq7O.9130$nd%8.1870@fx45.iad> <20240603221239.245@kylheku.com>
 <v3nk1q$h9jb$1@dont-email.me> <v3p38t$s5n4$2@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 05 Jun 2024 11:01:39 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="a31d2837ac1d7610bb3e7fe26372f67a";
	logging-data="963911"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/nyCx2SpoextIbQ6LW+a3IEBGiuP4AmWM="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:Lr2gXqRV/eTqnJQxC2tjBf9aIcM=
Content-Language: en-US
In-Reply-To: <v3p38t$s5n4$2@dont-email.me>
Bytes: 8345

On 6/5/2024 2:14 AM, Lawrence D'Oliveiro wrote:
> On Tue, 4 Jun 2024 12:48:31 -0500, BGB wrote:
> 
>> Though, I do have some questionable experimental features, like a
>> compiler option to cause arrays and pointers to be bounds-checked, which
>> is sometimes useful in debugging (but in some cases can add bugs of its
>> own; also makes the binaries bigger and negatively effects performance,
>> ...).
> 
> I remember some research being done into this, back in the days of Pascal.
> 

Seemingly, by the time I started learning programming, Pascal had mostly 
already died.

So, I started out poking around in QBasic in elementary school.
But, by around 6th grade or so, I had hit the limits of what I could do
with QBasic, and was learning C (IIRC, partly motivated by things like
the Doom source code).

Initially, I started trying to use Turbo C, but it wasn't long until I 
realized that 16-bit real-mode wasn't working for me, and jumped over to 
Cygwin (and dual-booting NT4 and Linux).

Then, around 9th grade, went over to Win2K (still dual-booting Linux), 
but by the end of high-school was mostly back to just running Windows.

As noted, in this era, G++ in Cygwin seemingly did not work for the most 
part.

I had also looked at Java (which was much hyped at the time), and had 
also started messing around with Scheme and later JavaScript. Mostly 
ended up sticking with C though (at the time, Java didn't offer much 
reason for why I should use it; it was awkward to write code in, and 
horribly slow at the time).


By the time I graduated HS, WinXP had appeared, but I still kept using 
2K a little longer (until jumping to XP X64, with a then-new Athlon 64).

A few years later, got an Athlon X2, which initially came with 32-bit 
Vista, but I soon reverted to XP X64 (Vista sucked), and kept using XP64 
until later building a new computer and jumping to Windows 7.

....



> Remember that, in Pascal (and Ada), subranges exist as types in their own
> right, not just as bounds of arrays. And this allows the compiler to
> optimize array-bounds checks, and sometimes get rid of them altogether.
> E.g.
> 
>      type
>          boundstype = 1 .. 10;
>      var
>          myarr : array [boundstype] of elttype;
>          index : boundstype;
> 
> With these definitions, an expression like
> 
>      myarr[index]
> 
> doesn’t require any bounds-checking. Of course, assignments to index may
> require bounds-checking, depending on the types of values involved.


For my bounds-checking in C, there are no syntactic changes to C.

It is (almost) invisible to "normal" code (though, if one starts using 
int<->pointer casts, this transparency breaks down).

Writing code that works transparently across both normal and 
bounds-checked modes effectively requires avoiding the casual use of 
casting between pointer and integer types (unless one is willing and 
able to deal with the bounds-checking metadata).
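
As a minimal sketch of the kind of code that avoids the problem (an
illustration only, not taken from the compiler described here): the
hypothetical helper align_up below converts the pointer to an integer
solely to compute a padding amount, and then applies that padding with
ordinary pointer arithmetic. It never converts an integer back into a
pointer, so any metadata carried by the pointer is left for the compiler
to manage. This assumes the low-order bits of the integer value reflect
the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round p up to a multiple of align (a power of 2).
 * The pointer->integer cast is used only to compute an offset; the result
 * is formed by pointer arithmetic on the original pointer. */
static void *align_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t addr = (uintptr_t)p;
    size_t pad = (size_t)(-addr & (align - 1));  /* bytes to next boundary */

    return (char *)p + pad;
}
```

The contrasting pattern, computing the aligned value as an integer and
casting that integer back to a pointer, is the sort of thing that would
need care in a bounds-checked build.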

I could have made it more transparent, but this would have required
turning these casts into implicit runtime calls.

There is another incomplete mode, which I had called "Bounds Check 
Enforcing" that would turn these into runtime calls, but would have also 
used a different C ABI (with a 128-bit pointer format), with pointers 
following a structure along vaguely similar lines to the CHERI scheme.


In the normal bounds-checked mode, the pointers encode bounds-checking
metadata in the high-order bits of the pointer (still 64 bits), and any
pointer operations will adjust these bounds as needed. Dereferencing the
pointer, or loading/storing at an index relative to it, performs a
bounds-check prior to the actual memory access, and triggers an
exception if the access would be out-of-bounds.

Whenever loading a pointer to something like an array or similar, the 
compiler will tag the pointer with the bounds check metadata.

The bounds are approximate, due to limitations in the number of encoding 
bits. The compiler will pad arrays as needed and also pad up the 
bounds-checks (to try to avoid falsely triggering a bounds-check exception).

The scheme effectively represents the bounds as a sort of microfloat, 
with a bias adjustment (adjusted relative to the base pointer). Though, 
how it all works is a little fiddly (it involves trickery with carry 
bits, etc).
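
A rough sketch of the general idea in C, not the actual encoding used
here: it assumes a hypothetical layout with a 48-bit address in the low
bits and an 8-bit size code (5-bit exponent, 3-bit mantissa) in bits
48..55. It leaves out the bias adjustment and the carry-bit trickery, so
it only checks offsets relative to an unadjusted base pointer. All names
(size_encode, size_decode, ptr_tag, ptr_check) are made up for
illustration, and addresses are plain integers to keep the example
portable.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS 48
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Encode a byte count as a microfloat: exponent in bits 3..7, mantissa in
 * bits 0..2. Sizes below 8 are stored exactly (exponent 0); larger sizes
 * are rounded UP so the decoded bound is never smaller than the object. */
static uint8_t size_encode(uint64_t size)
{
    unsigned exp = 1;

    assert(size <= (UINT64_C(15) << 30));  /* largest representable bound */
    if (size < 8)
        return (uint8_t)size;
    while (size > 15) {                    /* reduce to a 4-bit significand */
        size = (size + 1) >> 1;            /* halve, rounding up */
        exp++;
    }
    return (uint8_t)((exp << 3) | (size & 7));
}

/* Decode the size code back into a (possibly padded) byte count. */
static uint64_t size_decode(uint8_t code)
{
    unsigned exp = code >> 3, mant = code & 7;

    return exp ? (uint64_t)(8 | mant) << (exp - 1) : mant;
}

/* Attach the size code to an address, as a compiler might do when taking
 * the address of an array. */
static uint64_t ptr_tag(uint64_t addr, uint64_t size)
{
    assert((addr & ~ADDR_MASK) == 0);
    return addr | ((uint64_t)size_encode(size) << ADDR_BITS);
}

/* Return 1 if the access [offset, offset+len) lies within the padded
 * bound stored in the tag, 0 otherwise. */
static int ptr_check(uint64_t tagged, uint64_t offset, uint64_t len)
{
    uint64_t limit = size_decode((uint8_t)(tagged >> ADDR_BITS));

    return offset <= limit && len <= limit - offset;
}
```

With these definitions, a 100-byte array ends up with a bound of 104
bytes, so an access at offsets 100..103 passes the check. That is the
imprecision which padding the array itself is meant to make harmless.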



In the normal mode, bounds checking is not "enforced", so the compiler 
is mostly responsible for the bounds-checking. In this case, the 
bounds-checking is mostly intended as a debugging aid (but would also
make the program more resistant against buffer overflow exploits).

The "enforced" mode would be something closer to CHERI, but with
more drastic changes needed to the ABI. In this mode only the 128-bit 
pointer format is allowed, and (optionally) all of the registers and 
memory paragraphs will be tagged as to whether they contain a valid 
pointer (or "capability").

Though, a form without the memory tagging is a more viable option (if
arguably weaker). Both cases would use the same pointer formats though.

Without the memory tagging, it would still be useful for debugging and
preventing buffer overflow; but can't validly be claimed to offer any 
protection against hostile machine code. However, securing the memory 
system against hostile machine code is a much harder problem (and less 
obvious is how to structure the ABI in a way that both "offers full 
protection" and "can actually run code").

The potential added security of the memory tagging seems like it is 
counter-balanced by there being likely no way to (fully) secure the 
memory space while still allowing the program to do useful work (short 
of turning nearly everything into a system call).

And, unless it can be made "essentially foolproof", then the merit of 
the memory tag bits is called into question (along with whether they 
make sense to justify the added costs and hassle).


In my project, I am designing things around some other strategies,
namely ASLR and per-page ACL checking, which are comparatively less
expensive. Per-page ACL checks still require hardware support, but the
cost is not nearly so steep as memory tag bits; they also don't have an
obvious "Achilles Heel" that could allow them to be sidestepped
(basically, ACL checks are similar technology to that typically used in
filesystems, albeit applied to memory protection).
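
To make the filesystem analogy concrete, here is a minimal sketch of an
ACL lookup in C. Everything in it is an assumption for illustration (the
struct layout, the notion of a per-task key, the names acl_entry and
acl_allows); it is not the mechanism used in this project, where the
check would be done with hardware support rather than a software loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access bits, in the style of filesystem permissions. */
enum { ACC_R = 1, ACC_W = 2, ACC_X = 4 };

/* One ACL entry: which key (think user/group ID) gets which access. */
struct acl_entry {
    uint16_t key;
    uint8_t  perms;
};

/* An ACL, as might be associated with a page of memory. */
struct acl {
    const struct acl_entry *entries;
    size_t count;
};

/* Return 1 if the task holding task_key may perform every access in
 * want on a page governed by acl; deny when no entry matches the key. */
static int acl_allows(const struct acl *acl, uint16_t task_key, unsigned want)
{
    for (size_t i = 0; i < acl->count; i++) {
        if (acl->entries[i].key == task_key)
            return (acl->entries[i].perms & want) == want;
    }
    return 0;
}
```

The point of the sketch is only that the decision depends on who is
asking and which page is touched, not on any bits stored in the pointer.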

Meanwhile, ASLR is "weak", but also offers some protection against 
hostile code injection (hostile code can't do much if it doesn't know 
where anything is).

Still, none of this is perfect, and my project as it exists still falls 
short of what I am aiming for here.

....