Path: ...!fu-berlin.de!uni-berlin.de!not-for-mail
From: Left Right <olegsivokon@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Wed, 2 Oct 2024 08:05:02 +0200
Message-ID: <mailman.27.1727877147.3018.python-list@python.org>
In-Reply-To: <lm391bFu38hU1@mid.individual.net>

> By that definition of "streaming", no parser can ever be streaming,
> because there will be some constructs that must be read in their
> entirety before a suitably-structured piece of output can be
> emitted.

In the same email you replied to, I gave examples of languages for
which parsers can be streaming (in general): SCSI or IP. For some
languages (e.g. everything in the context-free family) streaming
parsers are _in general_ impossible, because there are pathological
cases like the one with parsing numbers. But this doesn't mean that
you cannot come up with a parser that is only useful _sometimes_.
And, in practice, languages like XML or JSON do well with streaming,
even though in general it's impossible.

I'm sorry if this comes as a surprise. On one hand, I don't want to
sound condescending; on the other hand, this is something that you'd
typically study in an automata theory class. Well, not exactly in the
very same words, but you should be able to figure this stuff out if
you had that class.
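
A minimal sketch of both points, assuming the input is a single top-level JSON array whose text arrives in arbitrary chunks. The helper name iter_array_items is hypothetical; the only library call used is json.JSONDecoder.raw_decode from the standard library. Elements are emitted as soon as they are complete, but a number is held back until a delimiter follows it, because "12" at the end of a chunk may turn out to be "1234".

```python
import json


def iter_array_items(chunks):
    """Yield elements of one top-level JSON array from an iterable of text chunks.

    Illustrative sketch only: it is lax about commas and re-parses the
    pending element on every new chunk.
    """
    decoder = json.JSONDecoder()
    buf = ""
    started = False  # True once the opening '[' has been consumed
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break  # nothing buffered: wait for the next chunk
            if not started:
                if buf[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                buf, started = buf[1:], True
                continue
            if buf[0] == "]":
                return  # end of the array
            if buf[0] == ",":
                buf = buf[1:]
                continue
            try:
                item, end = decoder.raw_decode(buf)
            except json.JSONDecodeError:
                break  # element is incomplete: wait for more input
            rest = buf[end:].lstrip()
            # Emit only when a delimiter follows the value. Without this
            # check, "12" or "1." at a chunk boundary would be emitted as
            # 12 or 1 even though more digits may still arrive.
            if rest[:1] not in (",", "]"):
                break
            yield item
            buf = rest
    raise ValueError("input ended before the array was closed")


if __name__ == "__main__":
    parts = ['[{"id": 1}, {"id"', ': 2}, 12', '34, 1.', '5e', '1]']
    for element in iter_array_items(parts):
        print(element)
```

Memory use here is bounded by the largest single element rather than by the whole document, which is why this works well for an array of modest records. It degrades to reading everything when one element (for example, one enormous number or string) makes up most of the input.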