From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.lang.c++,comp.lang.c
Subject: Re: Threads across programming languages
Date: Fri, 3 May 2024 20:47:17 -0700
Organization: A noiseless patient Spider
Message-ID: <v14b45$10qlv$2@dont-email.me>
References: <GIL-20240429161553@ram.dialup.fu-berlin.de> <multithreading-20240430095639@ram.dialup.fu-berlin.de> <v14av0$10qlv$1@dont-email.me>

On 5/3/2024 8:44 PM, Chris M. Thomasson wrote:
> On 4/30/2024 2:04 AM, Stefan Ram wrote:
>> ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>>> The GIL only prevents multiple Python statements from being
>>> interpreted simultaneously, but if you're waiting on inputs (like
>>> sockets), it's not active, so that could be distributed across
>>> multiple cores.
>>
>> Disclaimer: This is not on-topic here as it discusses Python,
>> not C or C++.
>>
>> FWIW, here's some multithreaded Python code modeled after what
>> I use in an application.
>>
>> I am using Python to prepare a press review for me, getting article
>> headers from several newssites, removing all headers matching a list
>> of regexps, and integrating everything into a single HTML resource.
>> (I do not like to read about Lindsay Lohan, for example, so articles
>> with the text "Lindsay Lohan" will not show up on my HTML review.)
>>
>> I'm usually downloading all pages at once using Python threads,
>> which will make sure that a thread uses the CPU while another
>> thread is waiting for TCP/IP data. This is the code, taken from
>> my Python program and a bit simplified:
>>
>> from multiprocessing.dummy import Pool
>>
>> ...
>>
>> with Pool( 9 if fast_internet else 1 )as pool:
>>     for i in range( 9 ):
>>         content[ i ] = pool.apply_async( fetch,[ uris[ i ] ])
>>     pool.close()
>>     pool.join()
>>
>> . I'm using my "fetch" function to fetch a single URI, and the
>> loop starts nine threads within a thread pool to fetch the
>> content of those nine URIs "in parallel". This is observably
>> faster than corresponding sequential code.
>>
>> (However, sometimes I have a slow connection and have to download
>> sequentially in order not to overload the slow connection, which
>> would result in stalled downloads. To accomplish this, I just
>> change the "9" to "1" in the first line above.)
>>
>> In case you wonder about the "dummy":
>>
>> |The multiprocessing.dummy module provides a wrapper
>> |for the multiprocessing module, except implemented using
>> |thread-based concurrency.
>> |
>> |It provides a drop-in replacement for multiprocessing,
>> |allowing a program that uses the multiprocessing API to
>> |switch to threads with a single change to import statements.
>>
>> . So, this is an area where multithreading the Python way is easy
>> to use and enhances performance even in the presence of the GIL!
>
> Agreed. However, it's a very small sample. Try to download 60,000 files
> concurrently from different sources all at once. This can be where the
> single lock messes with performance...

Certain sources are faster than others. That's always fun... Think of
timeout logic... ;^D
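
A minimal sketch of what such timeout logic could look like, not code from
this thread. It keeps the multiprocessing.dummy Pool from the quoted snippet
and bounds the whole batch with one shared deadline through
AsyncResult.get(timeout=...). The names fetch_all, workers, and deadline are
hypothetical, and fetch here is only a sleep-based stand-in for a real
download so the example runs without a network.

```python
import multiprocessing  # only for multiprocessing.TimeoutError
import time
from multiprocessing.dummy import Pool  # thread-based Pool, as in the quoted code


def fetch(uri, delay):
    """Hypothetical stand-in for a download: sleep for `delay`, return text."""
    time.sleep(delay)
    return f"content of {uri}"


def fetch_all(jobs, workers=4, deadline=0.5):
    """Run every (uri, delay) job in a thread pool under one shared deadline.

    Returns (results, timed_out): a dict mapping uri -> content for jobs
    that finished in time, and a list of the URIs that did not.
    """
    results, timed_out = {}, []
    with Pool(workers) as pool:
        start = time.monotonic()
        # Submit everything first so slow sources do not delay fast ones.
        pending = {uri: pool.apply_async(fetch, (uri, delay))
                   for uri, delay in jobs}
        for uri, handle in pending.items():
            # Each wait only gets whatever is left of the overall budget.
            remaining = max(0.0, deadline - (time.monotonic() - start))
            try:
                results[uri] = handle.get(timeout=remaining)
            except multiprocessing.TimeoutError:
                timed_out.append(uri)
    return results, timed_out
```

Calling fetch_all([("a", 0.01), ("slow", 2.0)], workers=2) should report "a"
as fetched and "slow" as timed out. Note that a timed-out job is not
cancelled: Python threads cannot be killed from outside, so the worker keeps
running in the background. With real sockets one would probably also set a
per-connection timeout inside fetch itself.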