Subject: Re: Threads across programming languages
Newsgroups: comp.lang.c++,comp.lang.c
From: Ross Finlayson
Date: Sat, 4 May 2024 08:51:23 -0700

On 05/03/2024 08:47 PM, Chris M. Thomasson wrote:
> On 5/3/2024 8:44 PM, Chris M. Thomasson wrote:
>> On 4/30/2024 2:04 AM, Stefan Ram wrote:
>>> ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>>>> The GIL only prevents multiple Python statements from being
>>>> interpreted simultaneously, but if you're waiting on inputs (like
>>>> sockets), it's not active, so that could be distributed across
>>>> multiple cores.
>>>
>>> Disclaimer: This is not on-topic here as it discusses Python,
>>> not C or C++.
>>>
>>> FWIW, here's some multithreaded Python code modeled after what
>>> I use in an application.
>>>
>>> I am using Python to prepare a press review for me, getting article
>>> headers from several news sites, removing all headers matching a list
>>> of regexps, and integrating everything into a single HTML resource.
>>> (I do not like to read about Lindsay Lohan, for example, so articles
>>> with the text "Lindsay Lohan" will not show up in my HTML review.)
>>>
>>> I'm usually downloading all pages at once using Python threads,
>>> which makes sure that one thread uses the CPU while another
>>> thread is waiting for TCP/IP data. This is the code, taken from
>>> my Python program and simplified a bit:
>>>
>>> from multiprocessing.dummy import Pool
>>>
>>> ...
>>>
>>> with Pool( 9 if fast_internet else 1 ) as pool:
>>>     for i in range( 9 ):
>>>         content[ i ] = pool.apply_async( fetch, [ uris[ i ] ])
>>>     pool.close()
>>>     pool.join()
>>>
>>> . I'm using my "fetch" function to fetch a single URI, and the
>>> loop starts nine threads within a thread pool to fetch the
>>> content of those nine URIs "in parallel". This is observably
>>> faster than the corresponding sequential code.
>>>
>>> (However, sometimes I have a slow connection and have to download
>>> sequentially in order not to overload the slow connection, which
>>> would result in stalled downloads. To accomplish this, I just
>>> change the "9" to "1" in the first line above.)
>>>
>>> In case you wonder about the "dummy":
>>>
>>> |The multiprocessing.dummy module provides a wrapper
>>> |for the multiprocessing module, except implemented using
>>> |thread-based concurrency.
>>> |
>>> |It provides a drop-in replacement for multiprocessing,
>>> |allowing a program that uses the multiprocessing API to
>>> |switch to threads with a single change to import statements.
>>>
>>> . So, this is an area where multithreading the Python way is easy
>>> to use and enhances performance even in the presence of the GIL!
>>
>> Agreed. However, it's a very small sample. Try to download 60,000 files
>> concurrently from different sources all at once. This can be where the
>> single lock messes with performance...
>
> Certain sources are faster than others. That's always fun... Think of
> timeout logic...
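On the 60,000-files point: a minimal, hedged sketch of the usual fix, which
is to cap the worker count so only a bounded number of downloads are ever in
flight, whatever the queue length. The "fetch" below is a simulated
stand-in (a sleep), not a real downloader, and fetch_all and the URIs are
made up for illustration:

```python
# Sketch: bounding concurrency when fetching many URIs at once.
# "fetch" simulates blocking I/O; in a real program it would do the
# HTTP request. While a thread blocks in a socket call the GIL is
# released, so the threads overlap on the waiting, not the CPU work.
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(uri):
    # Placeholder for network I/O.
    time.sleep(0.01)
    return "content of " + uri

def fetch_all(uris, max_workers=8):
    # max_workers caps in-flight downloads: 60,000 queued URIs
    # never open 60,000 sockets at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map preserves input order regardless of completion order.
        return list(pool.map(fetch, uris))

results = fetch_all(["https://example.invalid/%d" % i for i in range(50)])
print(len(results))  # 50
```

Same idea as the Pool(9)-versus-Pool(1) knob above, just with the bound
treated as a tuning parameter rather than all-or-nothing.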
;^D

In re-routines, timeout logic is implemented because open items eventually
come up for inspection and, if expired, are retired. (A word like "retire"
gets loaded here, since in another context it refers to micro-ops in the
core processor's pipeline, under the usual model of speculative execution in
modern chips, with its pipelines, caches, execution ordering, and the memory
barriers and ordering guarantees of the instruction set. Here it simply
means the timed-out item is removed from the open set.)

Implementing timeouts for open items involves checking each item at an
interval that represents the hard timeout, i.e. the "it's expired" timeout.
So in re-routines there is, simply enough, an auxiliary data structure, a
task-set alongside the task-queue, and one goes through its items to find
the expired ones. Yet that is its own sort of busy-work data structure, in
a world where items each have their own granular timeout lifetimes and
intervals.

It's similar for open connections, with something like a sweeper/closer
handling protocol timeouts, socket timeouts, and these kinds of things, for
whatever streams are implemented in the system, or for user-space streams
over sockets or datagrams. Something like XmlHttpRequest or WHATWG fetch
runs in its own threads, more or less invisibly to the usual event loop.
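The task-set-plus-sweep bookkeeping described above can be sketched roughly
as follows. This is not the re-routine implementation, just a minimal
illustration under assumed names (TimeoutSweeper, submit, complete, sweep
are all invented here): alongside the task queue, keep a min-heap of
(deadline, task-id) pairs, and let a periodic sweep pop every entry whose
deadline has passed, retiring the task if it is still open:

```python
# Sketch of timeout bookkeeping: an auxiliary task-set besides the
# task-queue, with a deadline heap so the sweep only touches items
# that have actually expired, instead of scanning every open item.
import heapq
import time

class TimeoutSweeper:
    def __init__(self):
        self.open_tasks = set()   # the auxiliary "task-set"
        self.deadlines = []       # min-heap of (deadline, task_id)

    def submit(self, task_id, timeout, now=None):
        now = time.monotonic() if now is None else now
        self.open_tasks.add(task_id)
        heapq.heappush(self.deadlines, (now + timeout, task_id))

    def complete(self, task_id):
        # Finished in time; its stale heap entry is skipped lazily.
        self.open_tasks.discard(task_id)

    def sweep(self, now=None):
        """Retire and return every open task whose deadline passed."""
        now = time.monotonic() if now is None else now
        expired = []
        while self.deadlines and self.deadlines[0][0] <= now:
            _, task_id = heapq.heappop(self.deadlines)
            if task_id in self.open_tasks:  # still open -> retire it
                self.open_tasks.discard(task_id)
                expired.append(task_id)
        return expired

s = TimeoutSweeper()
s.submit("a", timeout=5, now=0.0)
s.submit("b", timeout=50, now=0.0)
s.complete("b")                  # "b" finished before its deadline
print(s.sweep(now=10.0))         # ['a']
```

The heap avoids the per-item busy-work scan, though items still carry
their own granular lifetimes, and a real sweeper/closer would also have
to fold in protocol and socket timeouts per connection.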