Path: ...!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com>
Newsgroups: comp.lang.c++,comp.lang.c
Subject: Re: Threads across programming languages
Date: Fri, 3 May 2024 20:47:17 -0700
Organization: A noiseless patient Spider
Lines: 64
Message-ID: <v14b45$10qlv$2@dont-email.me>
References: <GIL-20240429161553@ram.dialup.fu-berlin.de>
 <multithreading-20240430095639@ram.dialup.fu-berlin.de>
 <v14av0$10qlv$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 04 May 2024 05:47:17 +0200 (CEST)
Injection-Info: dont-email.me; posting-host="00566bb81b0a3452542610785f934900";
	logging-data="1075903"; mail-complaints-to="abuse@eternal-september.org";	posting-account="U2FsdGVkX18955drI7WgvqmEK+4qzkAa5zX/Fgd97oY="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:lLcajrSY5i6FhZPS07Z8lGxRAE8=
Content-Language: en-US
In-Reply-To: <v14av0$10qlv$1@dont-email.me>
Bytes: 4037

On 5/3/2024 8:44 PM, Chris M. Thomasson wrote:
> On 4/30/2024 2:04 AM, Stefan Ram wrote:
>> ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>>> The GIL only prevents multiple Python statements from being
>>> interpreted simultaneously, but if you're waiting on inputs (like
>>> sockets), it's not active, so that could be distributed across
>>> multiple cores.
>>
>>    Disclaimer: This is not on-topic here as it discusses Python,
>>    not C or C++.
>>
>>    FWIW, here's some multithreaded Python code modeled after what
>>    I use in an application.
>>
>>    I am using Python to prepare a press review for me, getting article
>>    headers from several newssites, removing all headers matching a list
>>    of regexps, and integrating everything into a single HTML resource.
>>    (I do not like to read about Lindsay Lohan, for example, so articles
>>    with the text "Lindsay Lohan" will not show up on my HTML review.)
>>
>>    I'm usually downloading all pages at once using Python threads,
>>    which will make sure that a thread uses the CPU while another
>>    thread is waiting for TCP/IP data. This is the code, taken from
>>    my Python program and a bit simplified:
>>
>> from multiprocessing.dummy import Pool
>>
>> ...
>>
>> with Pool( 9 if fast_internet else 1 ) as pool:
>>      for i in range( 9 ):
>>          content[ i ] = pool.apply_async( fetch,[ uris[ i ] ])
>>      pool.close()
>>      pool.join()
>>
>>    . I'm using my "fetch" function to fetch a single URI, and the
>>    loop starts nine threads within a thread pool to fetch the
>>    content of those nine URIs "in parallel". This is observably
>>    faster than corresponding sequential code.
>>
>>    (However, sometimes I have a slow connection and have to download
>>    sequentially in order not to overload the slow connection, which
>>    would result in stalled downloads. To accomplish this, I just
>>    change the "9" to "1" in the first line above.)
>>
>>    In case you wonder about the "dummy":
>>
>> |The multiprocessing.dummy module provides a wrapper
>> |for the multiprocessing module, except implemented using
>> |thread-based concurrency.
>> |
>> |It provides a drop-in replacement for multiprocessing,
>> |allowing a program that uses the multiprocessing API to
>> |switch to threads with a single change to import statements.
>>
>>    . So, this is an area where multithreading the Python way is easy
>>    to use and enhances performance even in the presence of the GIL!
> 
> Agreed. However, it's a very small sample. Try to download 60,000 files 
> concurrently from different sources all at once. This can be where the 
> single lock messes with performance...

Certain sources are faster than others. That's always fun... Think of 
timeout logic... ;^D
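
A rough sketch of what that timeout logic might look like with a bounded
thread pool. Everything named here is made up for illustration: fetch() is
a stand-in that sleeps instead of touching the network, the *.example
source names and the delay values are invented, and fetch_all() is just one
way to arrange it. The idea is that each worker enforces its own deadline,
so one slow source cannot hold up the rest:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(source, delay, timeout):
    """Hypothetical stand-in for a download; 'delay' simulates a slow source."""
    if delay > timeout:
        # Give up once the deadline passes instead of waiting for the source.
        time.sleep(timeout)
        raise TimeoutError(f"{source} timed out after {timeout}s")
    time.sleep(delay)
    return f"content of {source}"


def fetch_all(sources, timeout=0.2, workers=4):
    """Fetch every source with a bounded pool; collect results and timeouts."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each future back to the source it belongs to.
        futures = {
            pool.submit(fetch, name, delay, timeout): name
            for name, delay in sources.items()
        }
        # Handle downloads in completion order, fastest first.
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except TimeoutError as exc:
                failures[name] = str(exc)
    return results, failures


# Invented sources: two answer quickly, one is far slower than the deadline.
sources = {"fast.example": 0.01, "medium.example": 0.05, "slow.example": 5.0}
results, failures = fetch_all(sources)
```

In real code the deadline would normally go to the network call itself,
for example the timeout argument of urllib.request.urlopen(). Waiting on
future.result(timeout=...) alone only stops the caller from waiting; the
worker thread keeps running until its download finishes.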