Subject: Re: Threads across programming languages
Newsgroups: comp.lang.c++,comp.lang.c
From: Ross Finlayson
Date: Sat, 4 May 2024 08:51:23 -0700

On 05/03/2024 08:47 PM, Chris M. Thomasson wrote:
> On 5/3/2024 8:44 PM, Chris M. Thomasson wrote:
>> On 4/30/2024 2:04 AM, Stefan Ram wrote:
>>> ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>>>> The GIL only prevents multiple Python statements from being
>>>> interpreted simultaneously, but if you're waiting on inputs (like
>>>> sockets), it's not active, so that could be distributed across
>>>> multiple cores.
>>>
>>> Disclaimer: This is not on-topic here as it discusses Python,
>>> not C or C++.
>>>
>>> FWIW, here's some multithreaded Python code modeled after what
>>> I use in an application.
>>>
>>> I am using Python to prepare a press review for me, getting article
>>> headers from several news sites, removing all headers matching a list
>>> of regexps, and integrating everything into a single HTML resource.
>>> (I do not like to read about Lindsay Lohan, for example, so articles
>>> with the text "Lindsay Lohan" will not show up in my HTML review.)
>>>
>>> I'm usually downloading all pages at once using Python threads,
>>> which makes sure that one thread uses the CPU while another
>>> thread is waiting for TCP/IP data. This is the code, taken from
>>> my Python program and simplified a bit:
>>>
>>> from multiprocessing.dummy import Pool
>>>
>>> ...
>>>
>>> with Pool( 9 if fast_internet else 1 ) as pool:
>>>     for i in range( 9 ):
>>>         content[ i ] = pool.apply_async( fetch, [ uris[ i ] ])
>>>     pool.close()
>>>     pool.join()
>>>
>>> . I'm using my "fetch" function to fetch a single URI, and the
>>> loop starts nine threads within a thread pool to fetch the
>>> content of those nine URIs "in parallel". This is observably
>>> faster than the corresponding sequential code.
>>>
>>> (However, sometimes I have a slow connection and have to download
>>> sequentially in order not to overload the slow connection, which
>>> would result in stalled downloads. To accomplish this, I just
>>> change the "9" to "1" in the first line above.)
>>>
>>> In case you wonder about the "dummy":
>>>
>>> |The multiprocessing.dummy module provides a wrapper
>>> |for the multiprocessing module, except implemented using
>>> |thread-based concurrency.
>>> |
>>> |It provides a drop-in replacement for multiprocessing,
>>> |allowing a program that uses the multiprocessing API to
>>> |switch to threads with a single change to import statements.
>>>
>>> . So, this is an area where multithreading the Python way is easy
>>> to use and enhances performance even in the presence of the GIL!
>>
>> Agreed. However, it's a very small sample. Try to download 60,000 files
>> concurrently from different sources all at once. This can be where the
>> single lock messes with performance...
>
> Certain sources are faster than others. That's always fun... Think of
> timeout logic...
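On the 60,000-files point: a minimal, hedged sketch of the usual fix, which
is to cap the worker count so only a bounded number of downloads are ever in
flight, whatever the queue length. The "fetch" below is a simulated
stand-in (a sleep), not a real downloader, and fetch_all and the URIs are
made up for illustration:

```python
# Sketch: bounding concurrency when fetching many URIs at once.
# "fetch" simulates blocking I/O; in a real program it would do the
# HTTP request. While a thread blocks in a socket call the GIL is
# released, so the threads overlap on the waiting, not the CPU work.
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(uri):
    # Placeholder for network I/O.
    time.sleep(0.01)
    return "content of " + uri

def fetch_all(uris, max_workers=8):
    # max_workers caps in-flight downloads: 60,000 queued URIs
    # never open 60,000 sockets at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order regardless of completion order.
        return list(pool.map(fetch, uris))

results = fetch_all(["https://example.invalid/%d" % i for i in range(50)])
print(len(results))  # 50
```

Same idea as the Pool(9)-versus-Pool(1) knob above, just with the bound
treated as a tuning parameter rather than all-or-nothing.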
;^D

In re-routines, timeout logic is implemented because open items eventually
come up for inspection and, if expired, are retired. (A word like "retire"
gets loaded here, since in another context it refers to micro-ops in the
core processor's pipeline, under the usual model of speculative execution in
modern chips, with its pipelines, caches, execution ordering, and the memory
barriers and ordering guarantees of the instruction set. Here it simply
means the timed-out item is removed from the open set.)

Implementing timeouts for open items involves checking each item at an
interval that represents the hard timeout, i.e. the "it's expired" timeout.
So in re-routines there is, simply enough, an auxiliary data structure, a
task-set alongside the task-queue, and one goes through its items to find
the expired ones. Yet that is its own sort of busy-work data structure, in
a world where items each have their own granular timeout lifetimes and
intervals.

It's similar for open connections, with something like a sweeper/closer
handling protocol timeouts, socket timeouts, and these kinds of things, for
whatever streams are implemented in the system, or for user-space streams
over sockets or datagrams. Something like XmlHttpRequest or WHATWG fetch
runs in its own threads, more or less invisibly to the usual event loop.
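The task-set-plus-sweep bookkeeping described above can be sketched roughly
as follows. This is not the re-routine implementation, just a minimal
illustration under assumed names (TimeoutSweeper, submit, complete, sweep
are all invented here): alongside the task queue, keep a min-heap of
(deadline, task-id) pairs, and let a periodic sweep pop every entry whose
deadline has passed, retiring the task if it is still open:

```python
# Sketch of timeout bookkeeping: an auxiliary task-set besides the
# task-queue, with a deadline heap so the sweep only touches items
# that have actually expired, instead of scanning every open item.
import heapq
import time

class TimeoutSweeper:
    def __init__(self):
        self.open_tasks = set()   # the auxiliary "task-set"
        self.deadlines = []       # min-heap of (deadline, task_id)

    def submit(self, task_id, timeout, now=None):
        now = time.monotonic() if now is None else now
        self.open_tasks.add(task_id)
        heapq.heappush(self.deadlines, (now + timeout, task_id))

    def complete(self, task_id):
        # Finished in time; its stale heap entry is skipped lazily.
        self.open_tasks.discard(task_id)

    def sweep(self, now=None):
        """Retire and return every open task whose deadline passed."""
        now = time.monotonic() if now is None else now
        expired = []
        while self.deadlines and self.deadlines[0][0] <= now:
            _, task_id = heapq.heappop(self.deadlines)
            if task_id in self.open_tasks:  # still open -> retire it
                self.open_tasks.discard(task_id)
                expired.append(task_id)
        return expired

s = TimeoutSweeper()
s.submit("a", timeout=5, now=0.0)
s.submit("b", timeout=50, now=0.0)
s.complete("b")                  # "b" finished before its deadline
print(s.sweep(now=10.0))         # ['a']
```

The heap avoids the per-item busy-work scan, though items still carry
their own granular lifetimes, and a real sweeper/closer would also have
to fold in protocol and socket timeouts per connection.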