<mailman.30.1727920574.3018.python-list@python.org>

Path: ...!fu-berlin.de!uni-berlin.de!not-for-mail
From: Left Right <olegsivokon@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60
 GB) from Kenna API
Date: Thu, 3 Oct 2024 00:48:10 +0200
Lines: 39
Message-ID: <mailman.30.1727920574.3018.python-list@python.org>
References: <CADrxXXmHUwsQbWqNrwzyKWLyTK0J3Hf0z8hAhGwKYoF2PwK7QA@mail.gmail.com>
 <082705B5-7C14-4D33-BF38-73F9CB166293@barrys-emacs.org>
 <9dfcd123-c31d-4207-869c-d5466487cba4@tompassin.net>
 <CAJQBtgkLVyNK+vw4u3bFCFEQDH8T3rpyTL+ERyyYHZJskQR6PQ@mail.gmail.com>
 <CAJQBtgnpNkpg-mF2yFCS4P4GYAYsKQ9nEw3Xygja=SE3-=N2Dw@mail.gmail.com>
 <mailman.19.1727796506.3018.python-list@python.org>
 <lm391bFu38hU1@mid.individual.net>
 <CAJQBtgmZehSeBu0y73ALdVq00LHi-R_KKS893FwJkEjkLnsXtA@mail.gmail.com>
 <CAPTjJmq6QUcBgkNcn50VzyyHoDAEE1JLPgPU+segiEykcieVSw@mail.gmail.com>
 <CAJQBtgkWcDH-7c8xTF84bxfbkvOURTBd80A6JBkEKn-f6Xvnew@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
In-Reply-To: <CAPTjJmq6QUcBgkNcn50VzyyHoDAEE1JLPgPU+segiEykcieVSw@mail.gmail.com>

> You can't validate an IP packet without having all of it. Your notion
> of "streaming" is nonsensical.

Whoa, whoa, hold your horses! "nonsensical" needs a little bit of
justification :)

It seems you don't understand the difference between words and
languages! In my examples, the IP _protocol_ is the language, and
sequences of IP packets are the words of that language. A language is
amenable to streaming if its words are repetitions of fixed-length
sequences of symbols from the alphabet. This is, essentially, like
saying that the words themselves are regular.
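To make the fixed-length case concrete, here is a minimal Python sketch (not from the original post) of a reader that consumes a byte stream one fixed-size record at a time and validates each record as soon as it arrives, so nothing beyond the current record is buffered. The function name `stream_records`, the 4-byte record size, and the validator are hypothetical illustrations; it also assumes a buffered binary stream, where `read(n)` returns fewer than `n` bytes only at end of input.

```python
import io
from typing import BinaryIO, Callable, Iterator


def stream_records(
    src: BinaryIO,
    record_size: int,
    is_valid: Callable[[bytes], bool],
) -> Iterator[bytes]:
    """Yield fixed-length records one at a time, validating each on arrival.

    Assumes a buffered binary stream: read(n) returns fewer than n bytes
    only at end of input.
    """
    while True:
        record = src.read(record_size)
        if not record:
            return  # input ended exactly on a record boundary
        if len(record) < record_size:
            raise ValueError(f"truncated record: {len(record)} of {record_size} bytes")
        if not is_valid(record):
            raise ValueError(f"invalid record: {record!r}")
        yield record  # the caller handles this record before the next is read


if __name__ == "__main__":
    # Hypothetical format: 2 ASCII letters followed by 2 ASCII digits.
    def looks_ok(r: bytes) -> bool:
        return r[:2].isalpha() and r[2:].isdigit()

    print(list(stream_records(io.BytesIO(b"AB01CD02"), 4, looks_ok)))
    # [b'AB01', b'CD02']
```

A raw socket can return short reads mid-stream, so a network reader built on this idea would have to keep reading until a full record is assembled before validating it.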

So, the follow-up question from you to me should be: how come strictly
context-free languages can still be parsed with streaming parsers?
The answer is that context-free languages can be approximated by
regular languages. In fact, this is a very interesting subject, which
unfortunately is usually overlooked in automata classes. It's
interesting in the sense that it's very accessible to students who
have already mastered regular and context-free formalisms.
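A textbook illustration of such an approximation (our choice of example, not one from the thread): the language of balanced parentheses is context-free but not regular, yet once the nesting depth is capped at some N it is recognized by a finite automaton whose states simply record the current depth 0..N. The sketch below builds that automaton's transition table explicitly; `bounded_dyck_dfa`, `accepts`, and `REJECT` are hypothetical names.

```python
from typing import Dict, Tuple

REJECT = -1  # absorbing "dead" state; reaching it means the input is rejected


def bounded_dyck_dfa(max_depth: int) -> Dict[Tuple[int, str], int]:
    """Transition table for balanced '('/')' strings nested at most max_depth deep.

    The states are the nesting depths 0..max_depth plus REJECT:
    max_depth + 2 states in total, so the recognized language is regular.
    """
    delta: Dict[Tuple[int, str], int] = {}
    for depth in range(max_depth + 1):
        delta[(depth, "(")] = depth + 1 if depth < max_depth else REJECT
        delta[(depth, ")")] = depth - 1 if depth > 0 else REJECT
    return delta


def accepts(delta: Dict[Tuple[int, str], int], text: str) -> bool:
    """Run the DFA over text one symbol at a time, with no stack."""
    state = 0
    for ch in text:
        state = delta.get((state, ch), REJECT)  # unknown symbols are rejected too
        if state == REJECT:
            return False
    return state == 0  # accept only if every '(' was closed


if __name__ == "__main__":
    dfa = bounded_dyck_dfa(2)
    print(accepts(dfa, "(())()"))  # True
    print(accepts(dfa, "((()))"))  # False: depth 3 exceeds the bound
```

Raising `max_depth` adds states but never changes the kind of machine; only the unbounded language needs a stack.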

So, streaming parsers (e.g. SAX) are written for a regular language
that approximates XML. This is because in practice we will almost
never encounter more than N nesting levels in an XML document, more
than N characters in an element name, etc. (for some large enough N).
Such a bound is what allows us to derive a regular language from a
context-free one.
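The same kind of bound can be made explicit when streaming XML with Python's standard `xml.sax` module. This is a minimal sketch under our own assumptions: `DepthLimitError`, `BoundedDepthHandler`, `parse_in_chunks`, and the depth limit are hypothetical, and the bound is enforced in user code on top of the incremental parser returned by `xml.sax.make_parser()`. It illustrates the idea of a depth-N approximation, not how any SAX implementation is built internally.

```python
import xml.sax
from typing import Iterable


class DepthLimitError(xml.sax.SAXException):
    """Raised when the document nests deeper than the configured bound."""


class BoundedDepthHandler(xml.sax.ContentHandler):
    """SAX handler that tracks nesting depth and enforces an upper bound N."""

    def __init__(self, max_depth: int) -> None:
        super().__init__()
        self.max_depth = max_depth
        self.depth = 0    # current nesting level
        self.deepest = 0  # deepest level seen so far

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth > self.max_depth:
            raise DepthLimitError(
                f"<{name}> at depth {self.depth} exceeds limit {self.max_depth}"
            )
        self.deepest = max(self.deepest, self.depth)

    def endElement(self, name):
        self.depth -= 1


def parse_in_chunks(chunks: Iterable[bytes], max_depth: int) -> BoundedDepthHandler:
    """Feed a document to an incremental SAX parser one chunk at a time."""
    handler = BoundedDepthHandler(max_depth)
    parser = xml.sax.make_parser()  # expat-based IncrementalParser in CPython
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # SAX events are reported incrementally as input arrives
    parser.close()  # signals end of input; raises if the document is incomplete
    return handler


if __name__ == "__main__":
    h = parse_in_chunks([b"<root><item>", b"<leaf/></item>", b"</root>"], max_depth=3)
    print(h.deepest)  # 3
```

Because the parser is fed through `feed()`, the document never has to be held in memory as a whole, and a document that exceeds the bound is rejected once the parser reports the offending start tag.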

NB. "Nonsensical" has a very precise meaning when it comes to
discussing the truth value of a proposition, which I think you also
somehow didn't know about. You seem to use "nonsensical" as a synonym
for "wrong". But, unbeknownst to you, you said something else: you
actually implied that there's no way to tell whether my notion of
streaming is correct or not.

But, for future reference: my notion of streaming is correct, and
you would do better to study some material on it before jumping to
conclusions.