From: Grant Edwards <grant.b.edwards@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Mon, 30 Sep 2024 11:44:50 -0400 (EDT)
Message-ID: <mailman.6.1727711092.3018.python-list@python.org>

On 2024-09-30, Left Right via Python-list <python-list@python.org> wrote:

> Whether and to what degree you can stream JSON depends on JSON
> structure. In general, however, JSON cannot be streamed (but commonly
> it can be).
>
> Imagine a pathological case of this shape: 1... <60GB of digits>. This
> is still a valid JSON (it doesn't have any limits on how many digits a
> number can have). And you cannot parse this number in a streaming way
> because in order to do that, you need to start with the least
> significant digit.

Which is how Arabic numbers were originally parsed, but when
westerners adopted them from a R->L written language, they didn't flip
them around to match the L->R written language into which they were
being adopted. So now long numbers can't be parsed as a stream in
software.

They should have anticipated this problem back in the 13th century and flipped the numbers around.
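
For what it's worth, the digits can be consumed left to right as they
arrive. A minimal sketch, assuming the number is a plain run of ASCII
decimal digits (no sign, fraction, or exponent); the names
parse_digits_streaming and chunk_size are made up for illustration:

```python
import io


def parse_digits_streaming(stream, chunk_size=1024):
    """Build an integer from a text stream, most significant digit first."""
    value = 0
    while True:
        # Keep chunk_size below CPython's default int-from-string digit
        # limit (4300), otherwise int(chunk) raises ValueError.
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        if not (chunk.isascii() and chunk.isdigit()):
            raise ValueError(f"non-digit input: {chunk!r}")
        # Horner's rule: shift what we have so far left by len(chunk)
        # decimal places, then add the newly read digits.
        value = value * 10 ** len(chunk) + int(chunk)
    return value


# Hypothetical input standing in for the long run of digits.
result = parse_digits_streaming(io.StringIO("12345678901234567890"), chunk_size=4)
```

The result still has to fit in memory, and the repeated big-integer
multiply gets slow for very long inputs, so this only shows that the
order of the digits is not the obstacle.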