From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.lang.c++,comp.lang.c
Subject: Re: Threads across programming languages
Date: Sat, 4 May 2024 13:03:56 -0700
Organization: A noiseless patient Spider
Message-ID: <v164bb$1cg4b$3@dont-email.me>
References: <GIL-20240429161553@ram.dialup.fu-berlin.de>
 <multithreading-20240430095639@ram.dialup.fu-berlin.de>
 <v14av0$10qlv$1@dont-email.me> <v14b45$10qlv$2@dont-email.me>
 <CsOcnXFwdvXoxKv7nZ2dnZfqnPidnZ2d@giganews.com>
User-Agent: Mozilla Thunderbird
In-Reply-To: <CsOcnXFwdvXoxKv7nZ2dnZfqnPidnZ2d@giganews.com>

On 5/4/2024 8:51 AM, Ross Finlayson wrote:
> On 05/03/2024 08:47 PM, Chris M. Thomasson wrote:
>> On 5/3/2024 8:44 PM, Chris M. Thomasson wrote:
>>> On 4/30/2024 2:04 AM, Stefan Ram wrote:
>>>> ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>>>>> The GIL only prevents multiple Python statements from being
>>>>> interpreted simultaneously, but if you're waiting on inputs (like
>>>>> sockets), it's not active, so that could be distributed across
>>>>> multiple cores.
>>>>
>>>> Disclaimer: This is not on-topic here as it discusses Python,
>>>> not C or C++.
>>>>
>>>> FWIW, here's some multithreaded Python code modeled after what
>>>> I use in an application.
>>>>
>>>> I am using Python to prepare a press review for me, getting article
>>>> headers from several newssites, removing all headers matching a list
>>>> of regexps, and integrating everything into a single HTML resource.
>>>> (I do not like to read about Lindsay Lohan, for example, so articles
>>>> with the text "Lindsay Lohan" will not show up on my HTML review.)
>>>>
>>>> I'm usually downloading all pages at once using Python threads,
>>>> which will make sure that a thread uses the CPU while another
>>>> thread is waiting for TCP/IP data. This is the code, taken from
>>>> my Python program and a bit simplified:
>>>>
>>>> from multiprocessing.dummy import Pool
>>>>
>>>> ...
>>>>
>>>> with Pool( 9 if fast_internet else 1 )as pool:
>>>>     for i in range( 9 ):
>>>>         content[ i ] = pool.apply_async( fetch,[ uris[ i ] ])
>>>>     pool.close()
>>>>     pool.join()
>>>>
>>>> . I'm using my "fetch" function to fetch a single URI, and the
>>>> loop starts nine threads within a thread pool to fetch the
>>>> content of those nine URIs "in parallel". This is observably
>>>> faster than corresponding sequential code.
>>>>
>>>> (However, sometimes I have a slow connection and have to download
>>>> sequentially in order not to overload the slow connection, which
>>>> would result in stalled downloads. To accomplish this, I just
>>>> change the "9" to "1" in the first line above.)
>>>>
>>>> In case you wonder about the "dummy":
>>>>
>>>> |The multiprocessing.dummy module provides a wrapper
>>>> |for the multiprocessing module, except implemented using
>>>> |thread-based concurrency.
>>>> |
>>>> |It provides a drop-in replacement for multiprocessing,
>>>> |allowing a program that uses the multiprocessing API to
>>>> |switch to threads with a single change to import statements.
>>>>
>>>> . So, this is an area where multithreading the Python way is easy
>>>> to use and enhances performance even in the presence of the GIL!
>>>
>>> Agreed. However, it's a very small sample. Try to download 60,000
>>> files concurrently from different sources all at once. This can be
>>> where the single lock messes with performance...
>>
>> Certain sources are faster than others. That's always fun... Think of
>> timeout logic... ;^D
>
> In re-routines, timeout logic is implemented because open items
> eventually come up and, if expired, are then retired.
>
> Now, using a word like "retire" gets involved, since it's contextual
> all the way down to the mu-ops of the core processor pipeline and the
> usual model of speculative execution in modern chips: mu-ops,
> pipelines, caches, execution order, memory barriers, and the ordering
> guarantees of instructions according to the chip.
>
> Here, though, it means that implementing timeouts for open items
> involves checking each item at an interval that represents the
> hard timeout vis-a-vis the "it's expired" timeout.
>
> So in re-routines there is simply an auxiliary data structure, a
> task-set besides the task-queue, and one goes through its items to
> find the expired ones. Yet that is its own sort of busy-work data
> structure, in a world where items each have their own granular
> timeout lifetimes and intervals.
>
> It's similar for open connections and something like a sweeper/closer,
> with regard to protocol timeouts, socket timeouts, and these kinds of
> things, for whatever streams are implemented in whatever system, or
> user-space streams from sockets or datagrams.
>
> Something like XMLHttpRequest or WHATWG fetch runs in its own
> threads, sort of invisibly to the usual event loop.

The timeout logic was fun to play with back when I was programming
server code. A connection would come in, get its job done very fast,
and return its result: over and out. But sometimes a connection would
come in, do a little something, then stall for a while... My timeout
code would flag it as a potentially stalled connection. The problem is
that a bad actor can make a connection, send some data, then stop.
Then make a thousand other connections that do the same. Then make
another ten thousand connections that do it via infected proxy
computers. I wrote a program that simulated these scenarios. The
timeout code needed to refer to a little database the server kept
about prior "potential" bad actors. It's a touchy situation, to say
the least.
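
[Editor's note: a rough sketch of the sweeper/closer idea described
above: an auxiliary table of open items with per-item deadlines,
walked from its own thread at a fixed interval, retiring whatever has
expired. This is also the point where a server could flag a stalled
sender as a potential bad actor. Every name here (ConnectionTable,
touch, sweep) and every interval is hypothetical; it sketches the
idea rather than anyone's real server code.]

import threading
import time

class ConnectionTable:
    def __init__(self):
        self._lock = threading.Lock()
        self._deadlines = {}  # conn_id -> absolute expiry time

    def touch(self, conn_id, timeout=30.0):
        # (Re)arm a connection's deadline whenever it shows activity.
        with self._lock:
            self._deadlines[conn_id] = time.monotonic() + timeout

    def remove(self, conn_id):
        # A connection that finishes cleanly leaves the table.
        with self._lock:
            self._deadlines.pop(conn_id, None)

    def sweep(self):
        # Collect and drop every expired connection; the caller
        # decides whether to just close it or also record the peer
        # in its little "potential bad actor" database.
        now = time.monotonic()
        with self._lock:
            expired = [c for c, t in self._deadlines.items() if t <= now]
            for c in expired:
                del self._deadlines[c]
        return expired

def sweeper(table, interval=1.0):
    # One granular interval for the whole table, instead of a
    # separate timer per connection.
    while True:
        for conn_id in table.sweep():
            print("retiring stalled connection", conn_id)
        time.sleep(interval)

table = ConnectionTable()
table.touch("conn-1", timeout=0.5)  # simulate a client that stalls
threading.Thread(target=sweeper, args=(table,), daemon=True).start()
time.sleep(2.0)  # demo only: give the sweeper time to retire it

The single sweep interval trades precision for simplicity: an item can
outlive its deadline by up to one interval, which is usually fine for
connection timeouts measured in seconds.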
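
[Editor's note: and, going back to Stefan's pool example and the
60,000-file point, a minimal self-contained sketch of the same
thread-pool download pattern, with concurrency capped so a huge batch
does not swamp the connection. The example URIs, the pool size of 9,
and the 10-second timeout are illustrative assumptions, not values
from anyone's real application.]

# Thread-based pool: the multiprocessing API, threads underneath.
from multiprocessing.dummy import Pool
from urllib.request import urlopen

uris = ["https://example.com/a", "https://example.com/b"]  # placeholders

def fetch(uri):
    # Workers block on network I/O here; CPython releases the GIL
    # while waiting, so the other threads keep running.
    with urlopen(uri, timeout=10) as response:
        return response.read()

# Cap the number of in-flight downloads at 9 no matter how long the
# list is; the pool hands waiting URIs to workers as they free up.
with Pool(9) as pool:
    results = pool.map(fetch, uris)

for uri, body in zip(uris, results):
    print(uri, len(body), "bytes")

Note that pool.map blocks until every fetch has returned or raised,
which is why no explicit close/join is needed in this variant.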