<mailman.8.1727715637.3018.python-list@python.org>

From: Chris Angelico <rosuav@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Tue, 1 Oct 2024 03:00:21 +1000
Message-ID: <mailman.8.1727715637.3018.python-list@python.org>
References: <CADrxXXmHUwsQbWqNrwzyKWLyTK0J3Hf0z8hAhGwKYoF2PwK7QA@mail.gmail.com>
 <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org>
 <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
 <CAPTjJmqoRQM81Xb0GxNXm65pNQf32YH2h1bTfSYFB0J=FPbDJQ@mail.gmail.com>
In-Reply-To: <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>

On Tue, 1 Oct 2024 at 02:20, Thomas Passin via Python-list
<python-list@python.org> wrote:
>
> On 9/30/2024 11:30 AM, Barry via Python-list wrote:
> >
> >
> >> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list@python.org> wrote:
> >>
> >>
> >> import polars as pl
> >> pl.read_json("file.json")
> >>
> >>
> >
> > This is not going to work unless the computer has a lot more than 60 GiB of RAM.
> >
> > As later suggested, a streaming parser is required.
>
> Streaming won't work because the file is gzipped.  You have to receive
> the whole thing before you can unzip it. Once unzipped it will be even
> larger, and all in memory.

Streaming gzip is perfectly possible. You may be thinking of PKZip,
which keeps its end-of-central-directory record (EOCD) at the end of
the file, although it may still be possible to stream-decompress one
if you work at it.
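
For illustration only (this is not code from the thread), here is a
minimal sketch of stream-decompressing gzip with the standard library's
zlib module. The helper name iter_gunzip and the simulated 64-byte
chunks are assumptions made for the example; in practice the chunks
would come from whatever HTTP client is in use. It handles a single
gzip member and never holds the whole compressed input in memory.

```python
import gzip
import zlib


def iter_gunzip(chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks.

    Hypothetical helper for illustration; handles one gzip member only.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered once the input is exhausted.
    tail = decomp.flush()
    if tail:
        yield tail


# Simulate a response arriving in small pieces (sizes are arbitrary).
payload = b'{"id": 1}\n' * 1000
compressed = gzip.compress(payload)
pieces = [compressed[i:i + 64] for i in range(0, len(compressed), 64)]

# Joining here is only to check the round trip; a real consumer would
# feed each yielded block to a streaming JSON parser instead.
restored = b"".join(iter_gunzip(pieces))
assert restored == payload
```

Each yielded block can be handed to an incremental parser as it
arrives, so memory use depends on the chunk size rather than on the
60 GB total.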

ChrisA