From: Thomas Passin
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Mon, 30 Sep 2024 14:05:36 -0400
References: <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org>

On 9/30/2024 11:30 AM, Barry via Python-list wrote:
>
>> On 30 Sep 2024, at 06:52, Abdur-Rahmaan Janhangeer via Python-list wrote:
>>
>> import polars as pl
>> pl.read_json("file.json")
>>
>
> This is not going to work unless the computer has a lot more than 60 GiB of RAM.
>
> As later suggested, a streaming parser is required.

There is also the json-stream library, on PyPI at https://pypi.org/project/json-stream/
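
To make "streaming parser" concrete, here is a minimal standard-library sketch of the idea, not json-stream's API and not a definitive implementation. It assumes the export is a single top-level JSON array readable as text. The names iter_array_items and chunk_size are made up for illustration. Only the current buffer and one decoded element are held in memory at a time.

```python
import io
import json


def iter_array_items(fp, chunk_size=65536):
    """Yield the elements of a top-level JSON array from text file fp, one at a time.

    Hypothetical helper for illustration; it is lenient about comma placement
    and re-parses the buffer when an element spans several chunks.
    """
    decoder = json.JSONDecoder()
    buf = ""
    eof = False

    def fill():
        # Append the next chunk to the buffer; note when the input is exhausted.
        nonlocal buf, eof
        chunk = fp.read(chunk_size)
        if chunk:
            buf += chunk
        else:
            eof = True

    # Read until the opening bracket of the array is available.
    while not buf.strip() and not eof:
        fill()
    buf = buf.lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]

    while True:
        buf = buf.lstrip(" \t\r\n,")  # skip whitespace and element separators
        if buf.startswith("]"):
            return
        try:
            item, end = decoder.raw_decode(buf)
            rest = buf[end:].lstrip()
            # Trust the value only once its terminating "," or "]" is visible,
            # so a number split across two chunks is never yielded early.
            complete = rest[:1] in (",", "]")
        except json.JSONDecodeError:
            complete = False
        if complete:
            yield item
            buf = rest
        elif eof:
            raise ValueError("truncated or malformed JSON array")
        else:
            fill()


# Small in-memory demonstration; a real run would pass an open file instead.
sample = '[{"id": 1, "score": 12345}, {"id": 2, "score": 1e3}, 42, "a,b]"]'
for record in iter_array_items(io.StringIO(sample), chunk_size=3):
    print(record)
```

For a real file, something like open("file.json", encoding="utf-8") would replace the StringIO object. A dedicated library such as json-stream is likely the better choice for nested or irregular data; the sketch only shows why memory use stays bounded.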