From: mitchalsup@aol.com (MitchAlsup1)
Newsgroups: comp.arch
Subject: Re: Memory ordering
Date: Sat, 16 Nov 2024 00:51:36 +0000
Organization: Rocksolid Light
Message-ID: <f39cc5fb9e74d80d385797f0a5e1c3a0@www.novabbs.org>

On Fri, 15 Nov 2024 23:35:22 +0000, BGB wrote:

> On 11/15/2024 4:05 PM, Chris M. Thomasson wrote:
>> On 11/15/2024 12:53 PM, BGB wrote:
>>> On 11/15/2024 11:27 AM, Anton Ertl wrote:
>>>> jseigh <jseigh_es00@xemaps.com> writes:
>>>>> Anybody doing that sort of programming, i.e. lock-free or distributed
>>>>> algorithms, who can't handle weakly consistent memory models, shouldn't
>>>>> be doing that sort of programming in the first place.
>>>>
>>>> Do you have any argument that supports this claim?
>>>>
>>>>> Strongly consistent memory won't help incompetence.
>>>>
>>>> Strong words to hide lack of arguments?
>>>>
>>>
>>> In my case, as I see it:
>>> The tradeoff is more about implementation cost, performance, etc.
>>>
>>> Weak model:
>>>   Cheaper (and simpler) to implement;
>>>   Performs better when there is no need to synchronize memory;
>>>   Performs worse when there is need to synchronize memory;
>>>   ...
>> [...]
>>
>> A TSO from a weak memory model is as it is. It should not necessarily
>> perform "worse" than other systems that have TSO as a default. The
>> weaker models give us flexibility. Any weak memory model should be able
>> to give sequential consistency via using the right membars in the right
>> places.
>>
>
> The speed difference is mostly that, in a weak model, the L1 cache
> merely needs to fetch memory from the L2 or similar, may write to it
> whenever, and need not proactively store back results.
>
> As I understand it, a typical TSO-like model will require, say:
> Any L1 cache that wants to write to a cache line needs to explicitly
> request write ownership over that cache line;

The cache line may have been fetched from a core which modified the
data, and handed directly to the requesting core on a typical read.
So, it is possible for the line to show up with write permission even
if the requesting core did not ask for write permission. So, not all
lines being written have to request ownership.
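
To make that concrete, here is a rough MESI-flavored sketch of the
decision a store has to make; the state names and the helper function
are my own illustration, not any particular machine's protocol:

/* Rough MESI-flavored sketch (illustration only): when does a store
   actually have to send an ownership request (RFO) out to the fabric? */

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } line_state;

/* Returns nonzero if the store must request write ownership. */
static int store_needs_ownership_request(line_state s)
{
    switch (s) {
    case MODIFIED:
    case EXCLUSIVE:
        /* Line is already writable -- e.g. the previous owner's cache
           forwarded it directly in a writable state, so no ownership
           request goes out.                                            */
        return 0;
    case SHARED:
    case INVALID:
    default:
        /* Only here does the store have to ask for write permission
           (upgrade / read-for-ownership).                              */
        return 1;
    }
}

The point being: on a read that hits modified data in another core's
cache, many protocols hand the line over in a writable state, so the
subsequent store lands in the first arm of the switch.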
> Any attempt by other cores to access this line,

You are being rather loose with your time analysis in this question::
  Access this line before write permission has been requested, or
  Access this line after write permission has been requested but
  before it has arrived, or
  Access this line after write permission has arrived.

> may require the L2 cache
> to send a message to the core currently holding the cache line for
> writing to write back its contents, with the request unable to be
> handled until after the second core has written back the dirty cache
> line.

L2 has to know something about the state in which L1 has the line,
and likely which core's cache the data is in.

> This would create potential for significantly more latency in cases
> where multiple cores touch the same part of memory; albeit the cores
> will see each others' memory stores.

One can ARGUE that this is a good thing, as it makes latency part of
the memory access model: more interfering accesses = higher latency.

>
> So, initially, weak model can be faster due to not needing any
> additional handling.
>
>
> But... Any synchronization points, such as a barrier or locking or
> releasing a mutex, will require manually flushing the cache with a
> weak model.

Not necessarily:: My 66000 uses causal memory consistency, yet when an
ATOMIC event begins it reverts to sequential consistency until the end
of the event, where it reverts back to causal. Use of MMI/O space
reverts to sequential consistency, while access to config space
reverts all the way back to strongly ordered.

> And, locking/releasing the mutex itself will require a mechanism
> that is consistent between cores (such as volatile atomic swaps or
> similar, which may still be weak as a volatile-atomic-swap would still
> not be atomic from the POV of the L2 cache; and an MMIO interface
> could be stronger here).
>
>
> Seems like there could possibly be some way to skip some of the cache
> flushing if one could verify that a mutex is only being locked and
> unlocked on a single core.
>
> Issue then is how to deal with trying to lock a mutex which has thus
> far been exclusive to a single core. One would need some way for the
> core that last held the mutex to know that it needs to perform an L1
> cache flush.

This seems to be a job for Cache Consistency.

> Though, one possibility could be to leave this part to the OS
> scheduler/syscall/...

The OS wants nothing to do with this.

> mechanism; so the core that wants to lock the
> mutex signals its intention to do so via the OS, and the next time the
> core that last held the mutex does a syscall (or tries to lock the
> mutex again), the handler sees this, then performs the L1 flush and
> flags the mutex as multi-core safe (at which point, the parties will
> flush L1s at each mutex lock, though possibly with a timeout count so
> that, if the mutex has been single-core for N locks, it reverts to
> single-core behavior).
>
> This could reduce the overhead of "frivolous mutex locking" in
> programs that are otherwise single-threaded or single-processor
> (leaving the cache flushes for the ones that are in fact being used
> for synchronization purposes).
>
> ....
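
FWIW, the portable way to express the mutex case above is just an
acquire swap and a release store; on a coherent machine the coherence
protocol migrates the line between cores, and neither an explicit L1
flush nor the OS gets involved. A minimal C11 sketch (my own
illustration, not BGB's scheme nor any particular hardware):

#include <stdatomic.h>

typedef struct { atomic_flag held; } spinlock;  /* init with ATOMIC_FLAG_INIT */

static void spin_lock(spinlock *l)
{
    /* Atomic swap with acquire ordering: on a weak machine this is
       where the needed barrier/fence is emitted; no manual cache
       flush is required.                                             */
    while (atomic_flag_test_and_set_explicit(&l->held,
                                             memory_order_acquire))
        ;  /* spin until the flag was observed clear */
}

static void spin_unlock(spinlock *l)
{
    /* Release store: publishes the critical section's writes to the
       next core that acquires the lock.                              */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

On a TSO machine the release store needs no extra barrier; on a weaker
model the same source gets the appropriate membars, which is the
flexibility argument made upthread.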