Path: ...!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: Abdur-Rahmaan Janhangeer <arj.python@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Mon, 30 Sep 2024 09:49:21 +0400
Lines: 51
Message-ID: <mailman.1.1727675375.3018.python-list@python.org>
References: <CA+hg4RiGjXw3am1s=zVLDpcA-VGS+cWNp_YEyzvS+j2MyDE2Cg@mail.gmail.com>
 <CADrxXXmHUwsQbWqNrwzyKWLyTK0J3Hf0z8hAhGwKYoF2PwK7QA@mail.gmail.com>
In-Reply-To: <CA+hg4RiGjXw3am1s=zVLDpcA-VGS+cWNp_YEyzvS+j2MyDE2Cg@mail.gmail.com>

I don't know if you've tried Polars, but it seems to work well with JSON data:

import polars as pl
pl.read_json("file.json")
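One caveat: as far as I know, pl.read_json loads the whole document into memory, which may not be workable at 60 GB. If that turns out to be a problem, here is a minimal stdlib-only sketch of reading the gzip stream and yielding one record at a time. It assumes the export is a single top-level JSON array of records, which the thread does not confirm; iter_json_array and chunk_size are names made up for this illustration.

```python
import gzip
import io
import json


def _try_decode(decoder, buf):
    """Return (item, remaining_text), or None if buf lacks a complete item."""
    try:
        item, end = decoder.raw_decode(buf)
    except json.JSONDecodeError:
        return None
    rest = buf[end:].lstrip()
    # Only trust the parse once the following delimiter is visible; otherwise
    # a number such as 12 might really be the start of 123 in the next chunk.
    if rest[:1] not in (",", "]"):
        return None
    return item, rest


def iter_json_array(stream, chunk_size=1 << 16):
    """Yield the items of one top-level JSON array read from a gzip stream.

    `stream` is any binary file-like object with gzip-compressed UTF-8 JSON.
    Assumption: the payload looks like [{...}, {...}, ...].
    """
    text = io.TextIOWrapper(gzip.GzipFile(fileobj=stream), encoding="utf-8")
    decoder = json.JSONDecoder()
    buf = text.read(chunk_size).lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]
    while True:
        buf = buf.lstrip()
        if buf.startswith("]"):
            return  # end of the array
        decoded = _try_decode(decoder, buf) if buf else None
        if decoded is None:
            # Item is incomplete: decompress and decode some more text.
            chunk = text.read(chunk_size)
            if not chunk:
                raise ValueError("JSON array ended unexpectedly")
            buf += chunk
            continue
        item, buf = decoded
        if buf.startswith(","):
            buf = buf[1:]  # drop the separator before the next item
        yield item
```

With requests, the same generator could probably be fed requests.get(url, stream=True).raw, provided the server sends the gzip file as the response body rather than as a Content-Encoding; I have not checked which one the Kenna endpoint does. Only the current buffer is held in memory, so each record can be processed or written out before the next one is read.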

Kind Regards,

Abdur-Rahmaan Janhangeer
about <https://compileralchemy.github.io/> | blog <https://www.pythonkitchen.com>
github <https://github.com/Abdur-RahmaanJ>
Mauritius


On Mon, Sep 30, 2024 at 8:00 AM Asif Ali Hirekumbi via Python-list
<python-list@python.org> wrote:

> Dear Python Experts,
>
> I am working with the Kenna Application's API to retrieve vulnerability
> data. The API endpoint provides a single, massive JSON file in gzip format,
> approximately 60 GB in size. Handling such a large dataset in one go is
> proving to be quite challenging, especially in terms of memory management.
>
> I am looking for guidance on how to efficiently stream this data and
> process it in chunks using Python. Specifically, I am wondering if there's
> a way to use the requests library or any other libraries that would allow
> us to pull data from the API endpoint in a memory-efficient manner.
>
> Here are the relevant API endpoints from Kenna:
>
>    - Kenna API Documentation
>    <https://apidocs.kennasecurity.com/reference/welcome>
>    - Kenna Vulnerabilities Export
>    <https://apidocs.kennasecurity.com/reference/retrieve-data-export>
>
> If anyone has experience with similar use cases or can offer any advice, it
> would be greatly appreciated.
>
> Thank you in advance for your help!
>
> Best regards
> Asif Ali