From: Asif Ali Hirekumbi
Newsgroups: comp.lang.python
Subject: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Fri, 27 Sep 2024 11:47:12 +0530

Dear Python Experts,

I am working with the Kenna Application's API to retrieve vulnerability data. The API endpoint provides a single, massive JSON file in gzip format, approximately 60 GB in size. Handling such a large dataset in one go is proving to be quite challenging, especially in terms of memory management.

I am looking for guidance on how to efficiently stream this data and process it in chunks using Python. Specifically, I am wondering if there’s a way to use the requests library or any other libraries that would allow us to pull data from the API endpoint in a memory-efficient manner.

Here are the relevant API endpoints from Kenna:

- Kenna API Documentation
- Kenna Vulnerabilities Export

If anyone has experience with similar use cases or can offer any advice, it would be greatly appreciated.

Thank you in advance for your help!

Best regards
Asif Ali
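
P.S. To make the question concrete, here is a minimal sketch of the kind of chunked processing being asked about. It is not a tested solution for Kenna and rests on assumptions: that the download is a raw gzip stream, and that the JSON inside is a top-level array of objects (the real export layout may differ). The function names iter_gunzip and iter_array_objects are made up for illustration. Only the standard library is used: zlib decompresses incrementally, and json.JSONDecoder.raw_decode pulls one object at a time out of a rolling text buffer.

```python
import codecs
import gzip
import json
import zlib


def iter_gunzip(compressed_chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in compressed_chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


def iter_array_objects(byte_chunks):
    """Yield the objects of a top-level JSON array one at a time.

    Assumption: the document looks like [ {...}, {...}, ... ].
    A parse error is treated as "object not complete yet", so a
    malformed document is not reported; it simply stops yielding.
    """
    decoder = json.JSONDecoder()
    # Incremental decoder: copes with multi-byte characters split across chunks.
    utf8 = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    started = False
    for chunk in byte_chunks:
        buffer += utf8.decode(chunk)
        if not started:
            stripped = buffer.lstrip()
            if not stripped:
                continue
            if stripped[0] != "[":
                raise ValueError("expected a top-level JSON array")
            buffer = stripped[1:]
            started = True
        pos = 0
        while True:
            # Skip whitespace and the commas that separate array elements.
            while pos < len(buffer) and buffer[pos] in " \t\r\n,":
                pos += 1
            if pos >= len(buffer) or buffer[pos] == "]":
                break
            try:
                obj, pos = decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                break  # object still incomplete; wait for the next chunk
            yield obj
        # Keep only the unparsed remainder so memory use stays bounded.
        buffer = buffer[pos:]


if __name__ == "__main__":
    # Stand-in for the download: a small gzip payload cut into tiny chunks.
    sample = [{"id": 1}, {"id": 2}, {"id": 3}]
    payload = gzip.compress(json.dumps(sample).encode("utf-8"))
    chunks = (payload[i:i + 16] for i in range(0, len(payload), 16))
    for record in iter_array_objects(iter_gunzip(chunks)):
        print(record["id"])
```

With requests, the compressed chunks would presumably come from a call such as requests.get(url, headers=..., stream=True) followed by response.iter_content(chunk_size=1024 * 1024). If the server sends the data with a Content-Encoding: gzip header instead of as a gzip file, requests already decompresses it in iter_content and the iter_gunzip step should be skipped. If the array is nested under a key rather than at the top level, a third-party streaming parser such as ijson may be a better fit than this hand-rolled loop.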