From: Left Right <olegsivokon@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Mon, 30 Sep 2024 21:30:06 +0200
Message-ID: <mailman.13.1727724684.3018.python-list@python.org>
In-Reply-To: <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
References: <CADrxXXmHUwsQbWqNrwzyKWLyTK0J3Hf0z8hAhGwKYoF2PwK7QA@mail.gmail.com> <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org> <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net> <CAJQBtgkLVyNK+vw4u3bFCFEQDH8T3rpyTL+ERyyYHZJskQR6PQ@mail.gmail.com>

> Streaming won't work because the file is gzipped.  You have to receive
> the whole thing before you can unzip it. Once unzipped it will be even
> larger, and all in memory.

GZip is specifically designed to be streamed, so that's not a problem
(in principle). You would need a streaming GZip parser, though; a quick
search on PyPI turned up this package:
https://pypi.org/project/gzip-stream/
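
To illustrate the streaming point, here is a minimal sketch using only the
standard library's zlib module (not the gzip-stream package). The function
name iter_gunzip and the 64-byte slicing are made up for the example; the
slices stand in for whatever chunks the HTTP client would hand over.

```python
import gzip
import zlib
from typing import Iterable, Iterator


def iter_gunzip(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Decompress a gzip stream chunk by chunk as the data arrives.

    Hypothetical helper for illustration. Assumes a single-member gzip
    stream; concatenated members are not handled here.
    """
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(zlib.MAX_WBITS | 16)
    for chunk in chunks:
        data = decomp.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered once the input is exhausted.
    tail = decomp.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    # Stand-in for the network: a gzipped payload cut into 64-byte pieces.
    original = b'{"id": 1, "status": "open"}\n' * 1000
    blob = gzip.compress(original)
    pieces = (blob[i:i + 64] for i in range(0, len(blob), 64))
    restored = b"".join(iter_gunzip(pieces))
    print(restored == original)
```

Each piece yielded by the generator can be handed to a streaming JSON
parser right away, so neither the compressed nor the uncompressed data
has to sit in memory in full. One caveat: a single small compressed
chunk can still expand a lot, so for hostile or extremely compressible
input the max_length argument of decompress() would be worth a look.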

On Mon, Sep 30, 2024 at 6:20 PM Thomas Passin via Python-list
<python-list@python.org> wrote:
>
> On 9/30/2024 11:30 AM, Barry via Python-list wrote:
> >
> >> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list@python.org> wrote:
> >>
> >> import polars as pl
> >> pl.read_json("file.json")
> >>
> >
> > This is not going to work unless the computer has a lot more than 60GiB of RAM.
> >
> > As later suggested a streaming parser is required.
>
> Streaming won't work because the file is gzipped.  You have to receive
> the whole thing before you can unzip it. Once unzipped it will be even
> larger, and all in memory.
> --
> https://mail.python.org/mailman/listinfo/python-list