Article <vmbtcq$3lp99$1@dont-email.me>

Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!eternal-september.org!.POSTED!not-for-mail
From: David Brown <david.brown@hesbynett.no>
Newsgroups: comp.arch
Subject: Re: Segments
Date: Thu, 16 Jan 2025 22:23:38 +0100
Organization: A noiseless patient Spider
Lines: 84
Message-ID: <vmbtcq$3lp99$1@dont-email.me>
References: <vdlgl9$3kq50$2@dont-email.me> <vdtmv9$16lu8$1@dont-email.me>
 <2024Oct6.150415@mips.complang.tuwien.ac.at>
 <vl7m2b$6iat$1@paganini.bofh.team>
 <2025Jan3.093849@mips.complang.tuwien.ac.at>
 <vlcddh$j2gr$1@paganini.bofh.team>
 <2025Jan5.121028@mips.complang.tuwien.ac.at>
 <vleuou$rv85$1@paganini.bofh.team>
 <ndamnjpnt8pkllatkdgq9qn2turaao1f0a@4ax.com>
 <2025Jan6.092443@mips.complang.tuwien.ac.at> <vlgreu$1lsr9$1@dont-email.me>
 <vlhjtm$1qrs5$1@dont-email.me> <bdZeP.23664$Hfb1.16566@fx46.iad>
 <vlj1pg$25p0e$1@dont-email.me> <87cygo97dl.fsf@nosuchdomain.example.com>
 <vm7mvi$2rr87$1@dont-email.me> <vmaig9$3ehn7$1@dont-email.me>
 <vmat2e$3geg9$1@dont-email.me> <874j1y8nxy.fsf@nosuchdomain.example.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 16 Jan 2025 22:23:39 +0100 (CET)
Injection-Info: dont-email.me; posting-host="274f6314c3d31355a7f854f44f61af25";
	logging-data="3859753"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX1/1HnvIeY2U6HP+rJA8LCKBmH8td0JXG7A="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:WEOGJB2y+w4GkA1n99utZC5Nyhs=
Content-Language: en-GB
In-Reply-To: <874j1y8nxy.fsf@nosuchdomain.example.com>
Bytes: 6023

On 16/01/2025 22:10, Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 16/01/2025 10:11, Terje Mathisen wrote:
>>> Thomas Koenig wrote:
>>>> Keith Thompson <Keith.S.Thompson+u@gmail.com> schrieb:
>>>>> Thomas Koenig <tkoenig@netcologne.de> writes:
>>>>> [...]
>>>>>> CHERI targets C, which on the one hand, I understand (there's a
>>>>>> ton of C code out there), but trying to retrofit a safe memory
>>>>>> model onto C seems a bit awkward - it might have been better to
>>>>>> target a language which has arrays in the first place, unlike C.
>>>>> [...]
>>>>>
>>>>> C does have arrays.
>>>>
>>>> Sort of - they decay into pointers at first sight.
>>>>
>>>> But what I should have written was "multi-dimensional arrays",
>>>> with a reasonable way of handling them.
>>>>
>>> Rust provides an interesting data point here:
>>> It has Vec<> which is always implemented as a dope vector, i.e. a
>>> header which contains the starting point and current length, along
>>> with allocated size. For multidimensional work, the natural mapping
>>> is Vec<Vec<>>, i.e. similar to classic C arrays of arrays, but with
>>> boundary checking.
>>> However, in my own testing I have found that it is often faster to
>>> flatten those multi-dim vectors, and instead use explicit
>>> multiplication to get the actual position:
>>>     array[y][x] -> array[y*width + x]
> 
> Note that this will inhibit bounds checking on the inner dimension.
> That might be part of the reason for the improvement in speed.
> 
> For example, given int array[10][10], array[0][11] is out of bounds,
> even if it logically refers to the same location as array[1][0].  This
> results in undefined behavior in C, and perhaps some kind of exception
> in a language that requires bounds checking.  If you do this manually by
> defining a 1d array, any checking applies only to the entire array.
> 
>> That does not surprise me.  Vec<> in Rust is very similar to
>> std::vector<> in C++, as far as I know (correct me if that's wrong).
>> So a vector of vectors of int is not contiguous or consistent - each
>> subvector can have a different current size and capacity.  Doing a
>> bounds check for accessing xs[i][j] (or in C++ syntax, xs.at(i).at(j)
>> when you want bounds checking) means first reading the current size
>> member of the outer vector, and checking "i" against that.  Then xs[i]
>> is found (by adding "i * sizeof(vector)" to the data pointer stored in
>> the outer vector).  That is looked up to find the current size of this
>> inner vector for bounds checking, then the actual data can be found.
> 
> I'm not familiar with Rust's Vec<>, but C++'s std::vector<> guarantees
> that the elements are stored contiguously.  But the std::vector<> object
> itself doesn't contain those elements; it's a fixed-size chunk of data
> (basically a struct in C terms) whose size doesn't change regardless of
> the number of elements (and typically regardless of the element type).
> So a std::vector<std::vector<int>> will result in the data for each row
> being stored contiguously, but the rows themselves will be allocated
> dynamically.
> 

Yes, exactly.
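
A minimal sketch of that layout, assuming nothing beyond the standard
library.  The helper names is_rectangular and checked_get are made up
for illustration.  It shows that nothing in the type keeps the rows the
same length, and that a checked access goes through two separate size
checks, one per vector:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: true if every row has the same length.
// std::vector<std::vector<int>> itself does not enforce this.
inline bool is_rectangular(const std::vector<std::vector<int>>& xs) {
    for (const auto& row : xs) {
        if (row.size() != xs.front().size()) return false;
    }
    return true;
}

// Hypothetical helper: bounds-checked access.  The first at() checks
// i against the outer size, the second checks j against the size of
// that particular row.  Either can throw std::out_of_range.
inline int checked_get(const std::vector<std::vector<int>>& xs,
                       std::size_t i, std::size_t j) {
    return xs.at(i).at(j);
}
```

Each row's elements are contiguous, but each row is its own
allocation, so a push_back on one row leaves the others untouched and
the "matrix" is no longer rectangular.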

Of course you could do as Terje did in Rust - make a std::vector<> of
size N x M (N rows of M elements) and do the "i * M + j" calculation
manually.  Now that C++23
has a multi-parameter subscript operator, you can do that quite neatly 
in a little wrapper class around a std::vector<> with a nice access 
operator.  But it's still more efficient to use a std::array<> if you 
know the sizes at compile time.
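
Something along these lines is what I have in mind for the wrapper.
This is only a sketch: the class name Grid2D and its members are
invented here, the element type is fixed to int for brevity, and the
two-parameter operator[] is guarded by the feature-test macro so the
code still compiles before C++23 (where operator() does the same job):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical wrapper: a rows x cols matrix stored in one
// contiguous std::vector<int>, indexed in row-major order.
class Grid2D {
public:
    Grid2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0) {}

    // Access with a debug-only check (disabled by NDEBUG).
    int& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i * cols_ + j];
    }

#if defined(__cpp_multidimensional_subscript) && \
    __cpp_multidimensional_subscript >= 202110L
    // C++23 only: allows grid[i, j].
    int& operator[](std::size_t i, std::size_t j) {
        return data_[i * cols_ + j];
    }
#endif

    // Always-checked access.  Each index is tested against its own
    // dimension, so at(0, cols) throws even though i * cols + j
    // would still land inside the underlying vector.
    int& at(std::size_t i, std::size_t j) {
        if (i >= rows_ || j >= cols_) {
            throw std::out_of_range("Grid2D::at");
        }
        return data_[i * cols_ + j];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<int> data_;
};
```

Checking each index separately in at() restores the inner-dimension
check that Keith points out is lost with a bare flattened index.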

>> This is /completely/ different from classic C multi-dimensional
>> arrays. It is more akin to a one-dimensional C array of pointers to
>> individually allocated one-dimensional C arrays - but even less
>> efficient due to an extra layer of indirection.
>>
>> If you know the size of the data at compile time, then in C++ you have
>> std::array<> where the information about size is carried in the type,
>> with no run-time cost.  A nested std::array<> is a perfectly good and
>> efficient multi-dimensional array with runtime bounds checking if you
>> want to use it, as well as having value semantics (no decay to pointer
>> types in expressions).  I would guess there is something equivalent in
>> Rust?
>
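
For the compile-time case, a rough sketch of the nested std::array<>
idea.  The alias Matrix and the helper out_of_bounds are names invented
for this example, and int is an arbitrary element type:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical alias: Rows x Cols ints, with both sizes carried in
// the type rather than stored at run time.
template <std::size_t Rows, std::size_t Cols>
using Matrix = std::array<std::array<int, Cols>, Rows>;

// Hypothetical helper: true if the checked accessors reject (i, j).
// The outer at() checks i against Rows, the inner at() checks j
// against Cols, so the inner dimension is checked on its own.
template <std::size_t Rows, std::size_t Cols>
bool out_of_bounds(const Matrix<Rows, Cols>& m,
                   std::size_t i, std::size_t j) {
    try {
        (void)m.at(i).at(j);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With Matrix<10, 10> a{}, the call a.at(0).at(11) throws rather than
silently aliasing a[1][0], and "Matrix<10, 10> b = a;" copies all
the elements instead of decaying to a pointer.  Plain a[i][j] stays
unchecked, so the bounds checking is opt-in.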