Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Chris M. Thomasson"
Newsgroups: comp.lang.c++,comp.lang.c
Subject: Re: Threads across programming languages
Date: Sat, 4 May 2024 13:03:56 -0700
Organization: A noiseless patient Spider
Lines: 109
Message-ID:
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 04 May 2024 22:03:56 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="00566bb81b0a3452542610785f934900"; logging-data="1458315"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/m3w+f8IlfzFkEkZ7ohHJ1iF9x4DhMBDg="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:tc8GQIVbgGb7sGAAquEiegSXRWk=
Content-Language: en-US
In-Reply-To:
Bytes: 6541

On 5/4/2024 8:51 AM, Ross Finlayson wrote:
> On 05/03/2024 08:47 PM, Chris M. Thomasson wrote:
>> On 5/3/2024 8:44 PM, Chris M. Thomasson wrote:
>>> On 4/30/2024 2:04 AM, Stefan Ram wrote:
>>>> ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>>>>> The GIL only prevents multiple Python statements from being
>>>>> interpreted simultaneously, but if you're waiting on inputs (like
>>>>> sockets), it's not active, so that could be distributed across
>>>>> multiple cores.
>>>>
>>>>    Disclaimer: This is not on-topic here as it discusses Python,
>>>>    not C or C++.
>>>>
>>>>    FWIW, here's some multithreaded Python code modeled after what
>>>>    I use in an application.
>>>>
>>>>    I am using Python to prepare a press review for me, getting article
>>>>    headers from several news sites, removing all headers matching a list
>>>>    of regexps, and integrating everything into a single HTML resource.
>>>>    (I do not like to read about Lindsay Lohan, for example, so articles
>>>>    with the text "Lindsay Lohan" will not show up in my HTML review.)
>>>>
>>>>    I usually download all pages at once using Python threads,
>>>>    which makes sure that one thread uses the CPU while another
>>>>    thread is waiting for TCP/IP data. This is the code, taken from
>>>>    my Python program and simplified a bit:
>>>>
>>>> from multiprocessing.dummy import Pool
>>>>
>>>> ...
>>>>
>>>> with Pool( 9 if fast_internet else 1 ) as pool:
>>>>      for i in range( 9 ):
>>>>          content[ i ] = pool.apply_async( fetch, [ uris[ i ] ])
>>>>      pool.close()
>>>>      pool.join()
>>>>
>>>>    . I'm using my "fetch" function to fetch a single URI, and the
>>>>    loop starts nine threads within a thread pool to fetch the
>>>>    content of those nine URIs "in parallel". This is observably
>>>>    faster than the corresponding sequential code.
>>>>
>>>>    (However, sometimes I have a slow connection and have to download
>>>>    sequentially in order not to overload the slow connection, which
>>>>    would result in stalled downloads. To accomplish this, I just
>>>>    change the "9" to "1" in the first line above.)
>>>>
>>>>    In case you wonder about the "dummy":
>>>>
>>>> |The multiprocessing.dummy module provides a wrapper
>>>> |for the multiprocessing module, except implemented using
>>>> |thread-based concurrency.
>>>> |
>>>> |It provides a drop-in replacement for multiprocessing,
>>>> |allowing a program that uses the multiprocessing API to
>>>> |switch to threads with a single change to import statements.
>>>>
>>>>    . So, this is an area where multithreading the Python way is easy
>>>>    to use and enhances performance even in the presence of the GIL!
>>>
>>> Agreed. However, it's a very small sample. Try to download 60,000 files
>>> concurrently from different sources all at once. This can be where the
>>> single lock messes with performance...
>>
>> Certain sources are faster than others. That's always fun... Think of
>> timeout logic...
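For anyone who wants to try the pattern above without Stefan's surrounding program, here is a self-contained sketch. The "fetch" here is a stand-in that just sleeps as if it were waiting on the network (his real function does the actual HTTP download), and the URIs are placeholders:

```python
import time
from multiprocessing.dummy import Pool  # thread-based drop-in for multiprocessing

URIS = ["uri-%d" % i for i in range(9)]  # placeholder URIs

def fetch(uri):
    """Stand-in for a real download: sleep as if waiting on the network.
    The GIL is released while a thread blocks in sleep or socket I/O,
    so the nine fetches genuinely overlap."""
    time.sleep(0.05)
    return "content of " + uri

fast_internet = True  # flip to False to force sequential downloads

start = time.monotonic()
with Pool(9 if fast_internet else 1) as pool:
    # one apply_async per URI; each runs in a pool thread
    results = [pool.apply_async(fetch, [uri]) for uri in URIS]
    pool.close()
    pool.join()
# collect the finished results into a dict keyed by URI
content = {uri: r.get() for uri, r in zip(URIS, results)}
elapsed = time.monotonic() - start

print(len(content), "pages in %.2fs" % elapsed)
```

With nine threads the elapsed time is close to one sleep interval rather than nine of them, which is the whole point of the exercise.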
;^D

> In re-routines, timeout logic is implemented because open items
> eventually come up, and if expired they are retired.
>
> Now, a word like "retire" gets contextual all the way down to the
> mu-ops of the core processor pipeline and the usual model of
> speculative execution in modern chips: mu-ops, pipelines, caches,
> execution order, memory barriers, and the ordering guarantees of
> instructions according to the chip.
>
> Here, though, it means that implementing timeouts for open items
> involves checking each item at an interval that represents the
> hard timeout vis-a-vis the "it's expired" timeout.
>
> So in re-routines there's simply an auxiliary data structure, a
> task-set besides the task-queue, and one goes through the items
> finding the expired ones. Yet that's its own sort of busy-work
> data structure, in a world where items each have their own
> granular timeout lifetimes and intervals.
>
> It's similar for open connections, with something like a
> sweeper/closer for protocol timeouts, socket timeouts, and these
> kinds of things, for whatever streams are implemented in whatever
> system, or user-space streams from sockets or datagrams.
>
> Something like XmlHttpRequest or whatwg fetch runs in its own
> threads, sort of invisibly to the usual event loop.

The timeout logic was fun to play with back when I was programming
server code. Normally a connection would come in, get its job done very
fast, get its result: over and out. But sometimes a connection would
come in, do a little something, then stall for a while... My timeout
code would flag it as a potentially stalled connection. The problem is
that a bad actor can make a connection, send some data, then stop. Then
make a thousand other connections that do the same. Then make another
ten thousand connections that do it via infected proxy computers. I
wrote a program that simulated these scenarios.
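The task-set-plus-sweeper idea above can be sketched in a few lines. This is only an illustration of the shape of it, not anyone's actual re-routines code; the names `Item` and `sweep` are mine, and each item carries its own granular deadline, as you describe:

```python
import time

class Item:
    """An open work item with its own granular timeout deadline."""
    def __init__(self, name, timeout):
        self.name = name
        self.deadline = time.monotonic() + timeout
        self.expired = False

def sweep(task_set):
    """Go through the open items, retire the expired ones, return them.

    This is the auxiliary busy-work pass: a real server would run it
    on an interval rather than once."""
    now = time.monotonic()
    retired = [it for it in task_set if it.deadline <= now]
    for it in retired:
        it.expired = True
        task_set.discard(it)
    return retired

# one healthy item, one whose deadline has already passed
open_items = {Item("fast", 10.0), Item("stalled", -0.001)}
retired = sweep(open_items)
print([it.name for it in retired])   # ['stalled']
```

The same loop doubles as the sweeper/closer for stalled connections: flag or close whatever comes back in `retired`.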
The timeout code needed to refer to a little database the server kept
about prior "potential" bad actors. It's a touchy situation, to say the
least.
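The "little database" amounted to strike-keeping per peer. A toy version, purely illustrative (the threshold, the in-memory dict, and the function names are made up for this sketch; a real server persisted the records and had to worry about legitimate slow clients):

```python
from collections import defaultdict

MAX_STRIKES = 3  # illustrative threshold, not the real server's value

strikes = defaultdict(int)  # peer address -> number of flagged stalls

def record_stall(peer):
    """Called when the timeout code flags a connection from `peer` as stalled."""
    strikes[peer] += 1

def allow_connection(peer):
    """Refuse peers with too many prior stalls on record."""
    return strikes[peer] < MAX_STRIKES

# a peer stalls three times and gets refused; an unknown peer is fine
for _ in range(3):
    record_stall("198.51.100.7")     # documentation-range address
print(allow_connection("198.51.100.7"))  # False
print(allow_connection("203.0.113.5"))   # True
```

The touchy part is exactly the false positives: a slow or lossy link looks a lot like a slowloris, which is why it's a judgment call rather than a clean cutoff.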