From: Thomas Passin <list1@tompassin.net>
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Mon, 30 Sep 2024 13:57:05 -0400
Message-ID: <mailman.12.1727722015.3018.python-list@python.org>

On 9/30/2024 1:00 PM, Chris Angelico via Python-list wrote:
> On Tue, 1 Oct 2024 at 02:20, Thomas Passin via Python-list
> <python-list@python.org> wrote:
>>
>> On 9/30/2024 11:30 AM, Barry via Python-list wrote:
>>>
>>>> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list@python.org> wrote:
>>>>
>>>> import polars as pl
>>>> pl.read_json("file.json")
>>>>
>>>
>>> This is not going to work unless the computer has a lot more than 60GiB of RAM.
>>>
>>> As later suggested a streaming parser is required.
>>
>> Streaming won't work because the file is gzipped. You have to receive
>> the whole thing before you can unzip it. Once unzipped it will be even
>> larger, and all in memory.
>
> Streaming gzip is perfectly possible. You may be thinking of PKZip
> which has its EOCD at the end of the file (although it may still be
> possible to stream-decompress if you work at it).
>
> ChrisA

You're right, that's what I was thinking of.
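
To illustrate the point about streaming gzip, here is a minimal sketch that decompresses a gzip stream chunk by chunk using only the standard library. It is not taken from the thread: the helper names iter_gunzip and read_in_chunks are hypothetical, the chunk sizes are arbitrary, and an in-memory io.BytesIO stands in for the real download. It also assumes a single-member gzip stream, which is the common case.

```python
import gzip
import io
import zlib


def read_in_chunks(fileobj, size=64 * 1024):
    """Yield successive blocks of at most `size` bytes from a binary file object."""
    while True:
        block = fileobj.read(size)
        if not block:
            break
        yield block


def iter_gunzip(chunks):
    """Yield decompressed bytes for an iterable of gzip-compressed chunks.

    Only one compressed chunk and its decompressed output are held at a
    time, so the whole file never has to be in memory.
    Assumption: the input is a single-member gzip stream; any data after
    the first member would be left in decomp.unused_data and ignored.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered inside the decompressor.
    tail = decomp.flush()
    if tail:
        yield tail


# Demo: a small in-memory payload stands in for the large gzipped download.
original = b'{"id": 1, "status": "open"}\n' * 10_000
source = io.BytesIO(gzip.compress(original))

total = 0
for piece in iter_gunzip(read_in_chunks(source, size=1024)):
    total += len(piece)  # a real consumer would feed `piece` to a streaming JSON parser

print(total == len(original))
```

If the HTTP client in use exposes the response body as an iterable of byte chunks, those chunks could be passed to iter_gunzip directly in place of read_in_chunks. The decompressed pieces would still need an incremental JSON parser, as suggested earlier in the thread, since loading 60 GB of JSON in one call has the same memory problem.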