From: Chris Angelico <rosuav@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Tue, 1 Oct 2024 03:00:21 +1000
Message-ID: <mailman.8.1727715637.3018.python-list@python.org>
References: <CADrxXXmHUwsQbWqNrwzyKWLyTK0J3Hf0z8hAhGwKYoF2PwK7QA@mail.gmail.com> <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net> <CAPTjJmqoRQM81Xb0GxNXm65pNQf32YH2h1bTfSYFB0J=FPbDJQ@mail.gmail.com>

On Tue, 1 Oct 2024 at 02:20, Thomas Passin via Python-list
<python-list@python.org> wrote:
>
> On 9/30/2024 11:30 AM, Barry via Python-list wrote:
> >
> >> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list@python.org> wrote:
> >>
> >> import polars as pl
> >> pl.read_json("file.json")
> >>
> >
> > This is not going to work unless the computer has a lot more than 60 GiB of RAM.
> >
> > As later suggested, a streaming parser is required.
>
> Streaming won't work because the file is gzipped. You have to receive
> the whole thing before you can unzip it. Once unzipped it will be even
> larger, and all in memory.

Streaming gzip is perfectly possible. You may be thinking of PKZip,
which has its EOCD (end of central directory record) at the end of the
file (although it may still be possible to stream-decompress if you
work at it).

ChrisA
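
To illustrate the point about streaming gzip, here is a minimal sketch using only the standard library. It is not taken from the thread: the helper names iter_gunzip and read_in_chunks, the chunk sizes, and the sample payload are all made up for the example, and it assumes a single-member gzip stream. The compressed bytes are fed to zlib a block at a time, so neither the compressed nor the decompressed data has to be held in memory all at once.

```python
import gzip
import io
import zlib


def iter_gunzip(chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks.

    Hypothetical helper for illustration; assumes a single-member gzip stream.
    """
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    # Emit anything still buffered once the input is exhausted.
    tail = decomp.flush()
    if tail:
        yield tail


def read_in_chunks(fileobj, size=64 * 1024):
    """Yield successive blocks of at most `size` bytes from a binary file object."""
    while True:
        block = fileobj.read(size)
        if not block:
            return
        yield block


# Demo with in-memory data standing in for a large download.
payload = b'{"id": 1}\n' * 10000
compressed = io.BytesIO(gzip.compress(payload))

# Joining is only done here to check the result; real code would
# hand each decompressed piece to a streaming parser instead.
restored = b"".join(iter_gunzip(read_in_chunks(compressed, size=512)))
```

The chunks could just as well come from a network response read piece by piece; the decompressor only needs the bytes in order. For a file already on disk, gzip.open() gives a file-like object that decompresses lazily as it is read.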