From: Left Right <olegsivokon@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Mon, 30 Sep 2024 21:30:06 +0200
Message-ID: <mailman.13.1727724684.3018.python-list@python.org>
References: <CADrxXXmHUwsQbWqNrwzyKWLyTK0J3Hf0z8hAhGwKYoF2PwK7QA@mail.gmail.com>
 <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org>
 <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
 <CAJQBtgkLVyNK+vw4u3bFCFEQDH8T3rpyTL+ERyyYHZJskQR6PQ@mail.gmail.com>

> Streaming won't work because the file is gzipped.  You have to receive
> the whole thing before you can unzip it. Once unzipped it will be even
> larger, and all in memory.

Gzip is specifically designed to be streamed, so in principle that's not
a problem. You would need a streaming gzip decompressor, though. A quick
search on PyPI turned up this package:
https://pypi.org/project/gzip-stream/ .
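
To illustrate the point about streaming, here is a minimal sketch that
uses only the standard library's zlib module rather than the package
above. The function name iter_gunzip and the way the input is chunked
are assumptions made for illustration, and it only handles a
single-member gzip stream:

```python
import zlib
from typing import Iterable, Iterator


def iter_gunzip(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decompressed bytes as gzip-compressed chunks arrive.

    Hypothetical helper: `chunks` can be any iterable of bytes objects,
    for example pieces read from a socket or an HTTP response body.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Emit whatever is still buffered once the input is exhausted.
    tail = decompressor.flush()
    if tail:
        yield tail
```

Only one compressed chunk and its decompressed output need to be held
at a time, so the full 60 GB never has to sit in memory as long as the
consumer (a streaming JSON parser, say) also works incrementally.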

On Mon, Sep 30, 2024 at 6:20 PM Thomas Passin via Python-list
<python-list@python.org> wrote:
>
> On 9/30/2024 11:30 AM, Barry via Python-list wrote:
> >
> >
> >> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list <python-list@python.org> wrote:
> >>
> >>
> >> import polars as pl
> >> pl.read_json("file.json")
> >>
> >>
> >
> > This is not going to work unless the computer has a lot more than 60GiB of RAM.
> >
> > As later suggested a streaming parser is required.
>
> Streaming won't work because the file is gzipped.  You have to receive
> the whole thing before you can unzip it. Once unzipped it will be even
> larger, and all in memory.
> --
> https://mail.python.org/mailman/listinfo/python-list