Path: ...!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.lang.c++,comp.lang.c
Subject: Re: Threads across programming languages
Date: 30 Apr 2024 09:04:48 GMT
Organization: Stefan Ram
Message-ID:
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-Copyright: (C) Copyright 2024 Stefan Ram. All rights reserved.
Content-Language: en-US

ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
>The GIL only prevents multiple Python statements from being
>interpreted simultaneously, but if you're waiting on inputs (like
>sockets), it's not active, so that could be distributed across
>multiple cores.

Disclaimer: This is not on-topic here, as it discusses Python,
not C or C++.

FWIW, here's some multithreaded Python code modeled after what I
use in an application. I am using Python to prepare a press review
for me: it gets article headlines from several news sites, removes
all headlines matching a list of regexps, and integrates everything
into a single HTML resource. (I do not like to read about Lindsay
Lohan, for example, so articles containing the text "Lindsay Lohan"
will not show up in my HTML review.)
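The filtering step could be sketched as follows. The pattern list
and the "keep" function are hypothetical names for illustration,
not taken from the real program:

```python
import re

# Hypothetical blocklist; the real list is whatever the reader
# does not want to see.
blocked = [ re.compile( p ) for p in ( r"Lindsay Lohan", r"\bcelebrity\b" )]

def keep( headline ):
    """True unless the headline matches any blocked pattern."""
    return not any( p.search( headline ) for p in blocked )

headlines = [ "New C++ standard library features",
              "Lindsay Lohan spotted downtown" ]

# Only headlines that match no blocked pattern survive.
review = [ h for h in headlines if keep( h )]
```

Here, "review" keeps only the first headline.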
I'm usually downloading all pages at once using Python threads,
which makes sure that one thread uses the CPU while another thread
is waiting for TCP/IP data. This is the code, taken from my Python
program and a bit simplified:

from multiprocessing.dummy import Pool

....

with Pool( 9 if fast_internet else 1 )as pool:
    for i in range( 9 ):
        content[ i ] = pool.apply_async( fetch,[ uris[ i ] ])
    pool.close()
    pool.join()

. I'm using my "fetch" function to fetch a single URI, and the
loop starts nine threads within a thread pool to fetch the content
of those nine URIs "in parallel". This is observably faster than
corresponding sequential code. (However, sometimes I have a slow
connection and have to download sequentially in order not to
overload the slow connection, which would result in stalled
downloads. To accomplish this, I just change the "9" to "1" in
the first line above.)

In case you wonder about the "dummy":

|The multiprocessing.dummy module provides a wrapper
|for the multiprocessing module, except implemented using
|thread-based concurrency.
|
|It provides a drop-in replacement for multiprocessing,
|allowing a program that uses the multiprocessing API to
|switch to threads with a single change to import statements.

. So, this is an area where multithreading the Python way is easy
to use and enhances performance even in the presence of the GIL!
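For readers who want to run the pattern end to end, here is a
self-contained sketch. The "fetch" stub, the example URIs, and
the sleep are my stand-ins, not parts of the original program.
Note that apply_async returns AsyncResult objects right away;
the actual return values are read back later with ".get()":

```python
from multiprocessing.dummy import Pool  # thread-based, despite the name
import time

def fetch( uri ):
    # Stand-in for a real download: the sleep imitates waiting on
    # TCP/IP data, during which the GIL is released, so other
    # threads can run.
    time.sleep( 0.05 )
    return "content of " + uri

uris = [ "https://example.invalid/page%d" % i for i in range( 9 )]

with Pool( 9 ) as pool:
    # apply_async returns immediately with an AsyncResult for
    # each URI; the nine fetches then run "in parallel".
    results = [ pool.apply_async( fetch, [ uri ] ) for uri in uris ]
    pool.close()
    pool.join()

# After join, every result is ready; .get() yields the value.
content = [ r.get() for r in results ]
```

Changing "Pool( 9 )" to "Pool( 1 )" serializes the downloads, as
described above for slow connections.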