Path: ...!weretis.net!feeder9.news.weretis.net!2.eu.feeder.erje.net!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: 2QdxY4RzWzUUiLuE@potatochowder.com
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60
 GB) from Kenna API
Date: Mon, 30 Sep 2024 14:28:33 -0400
Lines: 34
Message-ID: <mailman.9.1727720926.3018.python-list@python.org>
References: <CA+hg4RiGjXw3am1s=zVLDpcA-VGS+cWNp_YEyzvS+j2MyDE2Cg@mail.gmail.com>
 <CADrxXXmHUwsQbWqNrwzyKWLyTK0J3Hf0z8hAhGwKYoF2PwK7QA@mail.gmail.com>
 <CA+hg4Rhn8iX7rp0uC=MbOi+8g73wQ4y4=uV0dU0jHdDUz3jk4w@mail.gmail.com>
 <CAJQBtgk122sHzs+=MumYM1HW2DwKm1+i02bqgBKh4oUJYievCg@mail.gmail.com>
 <4XHQPG4LzsznVwM@mail.python.org> <Zvrt0RJe5omaFkQq@anomaly>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: news.uni-berlin.de s3fglwQNFXtUYos2gW8lNANZsIoe4rhBBAf2af9+5+7w==
Cancel-Lock: sha1:vT25r4d9H9Z0E8zDfjpDqXCBX2k= sha256:3VXhoOf7N2FVzZc0y8TMT80Gijis9z0WUCtkodIsmP0=
Return-Path: <2QdxY4RzWzUUiLuE@potatochowder.com>
X-Original-To: python-list@python.org
Delivered-To: python-list@mail.python.org
Authentication-Results: mail.python.org; dkim=pass
 reason="2048-bit key; unprotected key"
 header.d=potatochowder.com header.i=@potatochowder.com
 header.b=V7kHCcSC; dkim-adsp=pass; dkim-atps=neutral
Mail-Followup-To: python-list@python.org
Content-Disposition: inline
In-Reply-To: <4XHQPG4LzsznVwM@mail.python.org>
X-Authenticated-Sender: 2QdxY4RzWzUUiLuE@potatochowder.com
X-Virus-Scanned: Clear (ClamAV 0.103.10/27413/Mon Sep 30 10:48:24 2024)
X-BeenThere: python-list@python.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: General discussion list for the Python programming language
 <python-list.python.org>
List-Unsubscribe: <https://mail.python.org/mailman/options/python-list>,
 <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive: <https://mail.python.org/pipermail/python-list/>
List-Post: <mailto:python-list@python.org>
List-Help: <mailto:python-list-request@python.org?subject=help>
List-Subscribe: <https://mail.python.org/mailman/listinfo/python-list>,
 <mailto:python-list-request@python.org?subject=subscribe>
X-Mailman-Original-Message-ID: <Zvrt0RJe5omaFkQq@anomaly>
X-Mailman-Original-References: <CA+hg4RiGjXw3am1s=zVLDpcA-VGS+cWNp_YEyzvS+j2MyDE2Cg@mail.gmail.com>
 <CADrxXXmHUwsQbWqNrwzyKWLyTK0J3Hf0z8hAhGwKYoF2PwK7QA@mail.gmail.com>
 <CA+hg4Rhn8iX7rp0uC=MbOi+8g73wQ4y4=uV0dU0jHdDUz3jk4w@mail.gmail.com>
 <CAJQBtgk122sHzs+=MumYM1HW2DwKm1+i02bqgBKh4oUJYievCg@mail.gmail.com>
 <4XHQPG4LzsznVwM@mail.python.org>
Bytes: 6628

On 2024-09-30 at 11:44:50 -0400,
Grant Edwards via Python-list <python-list@python.org> wrote:

> On 2024-09-30, Left Right via Python-list <python-list@python.org> wrote:
> > Whether and to what degree you can stream JSON depends on JSON
> > structure. In general, however, JSON cannot be streamed (but commonly
> > it can be).
> >
> > Imagine a pathological case of this shape: 1... <60GB of digits>. This
> > is still valid JSON (the format doesn't limit how many digits a
> > number can have). And you cannot parse this number in a streaming way
> > because in order to do that, you need to start with the least
> > significant digit.
> 
> Which is how Arabic numbers were originally parsed, but when
> westerners adopted them from a R->L written language, they didn't flip
> them around to match the L->R written language into which they were
> being adopted.

Interesting.

> So now long numbers can't be parsed as a stream in software. They
> should have anticipated this problem back in the 13th century and
> flipped the numbers around.

What am I missing?  Handwavingly, start with the first digit, and as
long as the next character is a digit, multiply the accumulated result
by 10 (or the appropriate base) and add the next digit's value.  Oh, and handle
scientific notation as a special case, and perhaps fail spectacularly
instead of recovering gracefully in certain edge cases.  And in the
pathological case of a single number with 60 billion digits, run out of
memory (and complain loudly to the person who claimed that the file
contained a "dataset").  But why do I need to start with the least
significant digit?
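A minimal Python sketch of the left-to-right accumulation described above, assuming the input arrives as an iterable of text chunks (as it might from a network response). `parse_streamed_integer` is a hypothetical helper, not part of the `json` module; it handles only an optional leading minus and decimal digits, does not reject leading zeros (which JSON forbids), and leaves fractions and exponents (the "special case") alone.

```python
from typing import Iterable, Tuple


def parse_streamed_integer(chunks: Iterable[str]) -> Tuple[int, str]:
    """Parse a JSON-style integer most-significant digit first.

    Hypothetical helper: accepts an optional leading '-' followed by
    ASCII digits, spread across any number of chunks.  Returns the value
    and the remainder of the chunk in which the first non-digit appeared.
    Fractions and exponents (e.g. '1.5e3') are not handled.
    """
    value = 0
    sign = 1
    first = True
    seen_digit = False
    for chunk in chunks:
        for i, ch in enumerate(chunk):
            if first and ch == "-":
                sign = -1  # optional leading minus, as JSON allows
            elif "0" <= ch <= "9":
                # Shift the accumulated value one place left and add the
                # new digit; no look-ahead to the last digit is needed.
                value = value * 10 + (ord(ch) - ord("0"))
                seen_digit = True
            else:
                if not seen_digit:
                    raise ValueError(f"expected a digit, got {ch!r}")
                return sign * value, chunk[i:]
            first = False
    if not seen_digit:
        raise ValueError("no digits in input")
    return sign * value, ""
```

For example, `parse_streamed_integer(["12", "34", "5,"])` returns `(12345, ",")`: the digits span three chunks, and each one is used as soon as it arrives.  Memory still grows with the size of the number, and each multiply-by-10 on a big integer takes time proportional to its length, so the 60-billion-digit case still fails, but not because of the order of the digits.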