<mailman.12.1727722015.3018.python-list@python.org>

From: Thomas Passin <list1@tompassin.net>
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Mon, 30 Sep 2024 13:57:05 -0400

On 9/30/2024 1:00 PM, Chris Angelico via Python-list wrote:
> On Tue, 1 Oct 2024 at 02:20, Thomas Passin via Python-list
> <python-list@python.org> wrote:
>>
>> On 9/30/2024 11:30 AM, Barry via Python-list wrote:
>>>
>>>
>>>> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list@python.org> wrote:
>>>>
>>>>
>>>> import polars as pl
>>>> pl.read_json("file.json")
>>>>
>>>>
>>>
>>> This is not going to work unless the computer has a lot more than 60 GiB of RAM.
>>>
>>> As later suggested, a streaming parser is required.
>>
>> Streaming won't work because the file is gzipped.  You have to receive
>> the whole thing before you can unzip it. Once unzipped it will be even
>> larger, and all in memory.
> 
> Streaming gzip is perfectly possible. You may be thinking of PKZip
> which has its EOCD at the end of the file (although it may still be
> possible to stream-decompress if you work at it).
> 
> ChrisA

You're right, that's what I was thinking of.
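
For reference, here is a minimal sketch of what streaming gzip decompression can look like in Python with only the standard library. It assumes the compressed data arrives as an iterable of byte chunks (for example, blocks read from a file or an HTTP response body) and that the stream holds a single gzip member. The helper names iter_gunzip and read_in_chunks are made up for this illustration, and nothing here is specific to the Kenna API.

```python
import gzip
import io
import zlib


def iter_gunzip(chunks):
    """Yield decompressed bytes for an iterable of gzip-compressed chunks.

    Only one compressed chunk plus its decompressed output is held in
    memory at a time. Assumes a single gzip member in the stream.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered once the input is exhausted.
    tail = decomp.flush()
    if tail:
        yield tail


def read_in_chunks(fileobj, size=64 * 1024):
    """Yield successive blocks of at most `size` bytes from a binary file object."""
    while True:
        block = fileobj.read(size)
        if not block:
            break
        yield block


if __name__ == "__main__":
    # Small in-memory stand-in for a large gzipped download.
    payload = b'{"id": 1}\n' * 1000
    source = io.BytesIO(gzip.compress(payload))
    restored = b"".join(iter_gunzip(read_in_chunks(source, size=37)))
    print(restored == payload)
```

The decompressed chunks would still need to be fed to an incremental JSON parser rather than joined as in the demo; joining them here is only to check the round trip on a small payload.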