Path: ...!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Chris M. Thomasson"
Newsgroups: comp.lang.c++,comp.lang.c
Subject: Re: Threads across programming languages
Date: Sat, 4 May 2024 13:03:56 -0700
Organization: A noiseless patient Spider
Lines: 109
Message-ID:
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 04 May 2024 22:03:56 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="00566bb81b0a3452542610785f934900"; logging-data="1458315"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/m3w+f8IlfzFkEkZ7ohHJ1iF9x4DhMBDg="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:tc8GQIVbgGb7sGAAquEiegSXRWk=
Content-Language: en-US
In-Reply-To:
Bytes: 6541

On 5/4/2024 8:51 AM, Ross Finlayson wrote:
> On 05/03/2024 08:47 PM, Chris M. Thomasson wrote:
>> On 5/3/2024 8:44 PM, Chris M. Thomasson wrote:
>>> On 4/30/2024 2:04 AM, Stefan Ram wrote:
>>>> ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>>>>> The GIL only prevents multiple Python statements from being
>>>>> interpreted simultaneously, but if you're waiting on inputs (like
>>>>> sockets), it's not active, so that could be distributed across
>>>>> multiple cores.
>>>>
>>>>    Disclaimer: This is not on-topic here as it discusses Python,
>>>>    not C or C++.
>>>>
>>>>    FWIW, here's some multithreaded Python code modeled after what
>>>>    I use in an application.
>>>>
>>>>    I am using Python to prepare a press review for me, getting article
>>>>    headers from several news sites, removing all headers matching a list
>>>>    of regexps, and integrating everything into a single HTML resource.
>>>>    (I do not like to read about Lindsay Lohan, for example, so articles
>>>>    with the text "Lindsay Lohan" will not show up in my HTML review.)
>>>>
>>>>    I usually download all pages at once using Python threads,
>>>>    which makes sure that one thread uses the CPU while another
>>>>    thread is waiting for TCP/IP data. This is the code, taken from
>>>>    my Python program and simplified a bit:
>>>>
>>>> from multiprocessing.dummy import Pool
>>>>
>>>> ...
>>>>
>>>> with Pool( 9 if fast_internet else 1 ) as pool:
>>>>      for i in range( 9 ):
>>>>          content[ i ] = pool.apply_async( fetch, [ uris[ i ] ])
>>>>      pool.close()
>>>>      pool.join()
>>>>
>>>>    . I'm using my "fetch" function to fetch a single URI, and the
>>>>    loop starts nine threads within a thread pool to fetch the
>>>>    content of those nine URIs "in parallel". This is observably
>>>>    faster than the corresponding sequential code.
>>>>
>>>>    (However, sometimes I have a slow connection and have to download
>>>>    sequentially in order not to overload the slow connection, which
>>>>    would result in stalled downloads. To accomplish this, I just
>>>>    change the "9" to "1" in the first line above.)
>>>>
>>>>    In case you wonder about the "dummy":
>>>>
>>>> |The multiprocessing.dummy module provides a wrapper
>>>> |for the multiprocessing module, except implemented using
>>>> |thread-based concurrency.
>>>> |
>>>> |It provides a drop-in replacement for multiprocessing,
>>>> |allowing a program that uses the multiprocessing API to
>>>> |switch to threads with a single change to import statements.
>>>>
>>>>    . So, this is an area where multithreading the Python way is easy
>>>>    to use and enhances performance even in the presence of the GIL!
>>>
>>> Agreed. However, it's a very small sample. Try to download 60,000 files
>>> concurrently from different sources all at once. This can be where the
>>> single lock messes with performance...
>>
>> Certain sources are faster than others. That's always fun... Think of
>> timeout logic...
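For anyone who wants to try the pattern above without Stefan's surrounding program, here is a self-contained sketch. The "fetch" here is a stand-in that just sleeps as if it were waiting on the network (his real function does the actual HTTP download), and the URIs are placeholders:

```python
import time
from multiprocessing.dummy import Pool  # thread-based drop-in for multiprocessing

URIS = ["uri-%d" % i for i in range(9)]  # placeholder URIs

def fetch(uri):
    """Stand-in for a real download: sleep as if waiting on the network.
    The GIL is released while a thread blocks in sleep or socket I/O,
    so the nine fetches genuinely overlap."""
    time.sleep(0.05)
    return "content of " + uri

fast_internet = True  # flip to False to force sequential downloads

start = time.monotonic()
with Pool(9 if fast_internet else 1) as pool:
    # one apply_async per URI; each runs in a pool thread
    results = [pool.apply_async(fetch, [uri]) for uri in URIS]
    pool.close()
    pool.join()
# collect the finished results into a dict keyed by URI
content = {uri: r.get() for uri, r in zip(URIS, results)}
elapsed = time.monotonic() - start

print(len(content), "pages in %.2fs" % elapsed)
```

With nine threads the elapsed time is close to one sleep interval rather than nine of them, which is the whole point of the exercise.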
;^D

> In re-routines, timeout logic is implemented because open items
> eventually come up, and if expired they are retired.
>
> Now, a word like "retire" gets contextual all the way down to the
> mu-ops of the core processor pipeline and the usual model of
> speculative execution in modern chips: mu-ops, pipelines, caches,
> execution order, memory barriers, and the ordering guarantees of
> instructions according to the chip.
>
> Here, though, it means that implementing timeouts for open items
> involves checking each item at an interval that represents the
> hard timeout vis-a-vis the "it's expired" timeout.
>
> So in re-routines there's simply an auxiliary data structure, a
> task-set besides the task-queue, and one goes through the items
> finding the expired ones. Yet that's its own sort of busy-work
> data structure, in a world where items each have their own
> granular timeout lifetimes and intervals.
>
> It's similar for open connections, with something like a
> sweeper/closer for protocol timeouts, socket timeouts, and these
> kinds of things, for whatever streams are implemented in whatever
> system, or user-space streams from sockets or datagrams.
>
> Something like XmlHttpRequest or whatwg fetch runs in its own
> threads, sort of invisibly to the usual event loop.

The timeout logic was fun to play with back when I was programming
server code. Normally a connection would come in, get its job done very
fast, get its result: over and out. But sometimes a connection would
come in, do a little something, then stall for a while... My timeout
code would flag it as a potentially stalled connection. The problem is
that a bad actor can make a connection, send some data, then stop. Then
make a thousand other connections that do the same. Then make another
ten thousand connections that do it via infected proxy computers. I
wrote a program that simulated these scenarios.
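The task-set-plus-sweeper idea above can be sketched in a few lines. This is only an illustration of the shape of it, not anyone's actual re-routines code; the names `Item` and `sweep` are mine, and each item carries its own granular deadline, as you describe:

```python
import time

class Item:
    """An open work item with its own granular timeout deadline."""
    def __init__(self, name, timeout):
        self.name = name
        self.deadline = time.monotonic() + timeout
        self.expired = False

def sweep(task_set):
    """Go through the open items, retire the expired ones, return them.

    This is the auxiliary busy-work pass: a real server would run it
    on an interval rather than once."""
    now = time.monotonic()
    retired = [it for it in task_set if it.deadline <= now]
    for it in retired:
        it.expired = True
        task_set.discard(it)
    return retired

# one healthy item, one whose deadline has already passed
open_items = {Item("fast", 10.0), Item("stalled", -0.001)}
retired = sweep(open_items)
print([it.name for it in retired])   # ['stalled']
```

The same loop doubles as the sweeper/closer for stalled connections: flag or close whatever comes back in `retired`.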
The timeout code needed to refer to a little database the server kept
about prior "potential" bad actors. It's a touchy situation, to say the
least.
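The "little database" amounted to strike-keeping per peer. A toy version, purely illustrative (the threshold, the in-memory dict, and the function names are made up for this sketch; a real server persisted the records and had to worry about legitimate slow clients):

```python
from collections import defaultdict

MAX_STRIKES = 3  # illustrative threshold, not the real server's value

strikes = defaultdict(int)  # peer address -> number of flagged stalls

def record_stall(peer):
    """Called when the timeout code flags a connection from `peer` as stalled."""
    strikes[peer] += 1

def allow_connection(peer):
    """Refuse peers with too many prior stalls on record."""
    return strikes[peer] < MAX_STRIKES

# a peer stalls three times and gets refused; an unknown peer is fine
for _ in range(3):
    record_stall("198.51.100.7")     # documentation-range address
print(allow_connection("198.51.100.7"))  # False
print(allow_connection("203.0.113.5"))   # True
```

The touchy part is exactly the false positives: a slow or lossy link looks a lot like a slowloris, which is why it's a judgment call rather than a clean cutoff.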