From: Left Right <olegsivokon@gmail.com>
Newsgroups: comp.lang.python
Subject: Re: Help with Streaming and Chunk Processing for Large JSON Data (60 GB) from Kenna API
Date: Thu, 3 Oct 2024 00:48:10 +0200
Message-ID: <mailman.30.1727920574.3018.python-list@python.org>
In-Reply-To: <CAPTjJmq6QUcBgkNcn50VzyyHoDAEE1JLPgPU+segiEykcieVSw@mail.gmail.com>

> You can't validate an IP packet without having all of it. Your notion
> of "streaming" is nonsensical.

Whoa, whoa, hold your horses! "Nonsensical" needs a little bit of justification :)

It seems you don't understand the difference between words and languages! In my examples, the IP _protocol_ is the language, and sequences of IP packets are the words of that language. A language is amenable to streaming if its words are repetitions of fixed-length sequences of symbols of the alphabet. This is, essentially, like saying that the words themselves are regular.

So, the follow-up question from you to me should be: how come strictly context-free languages can still be parsed with streaming parsers? The answer is that it's possible to approximate context-free languages with regular languages. In fact, this is a very interesting subject, which unfortunately is usually overlooked in automata classes. It's interesting in the sense that it's very accessible to students who have already mastered the regular and context-free formalisms.

So, streaming parsers (e.g. SAX) are written for a regular language that approximates XML. This works because in practice we will almost never encounter more than N nesting levels in an XML document, more than N characters in an element name, etc. (for some large enough N), and those bounds are what allow us to create a regular language from a context-free one.

NB. "Nonsensical" has a very precise meaning when it comes to discussing the truth value of a proposition, which I think you also somehow didn't know about. You seem to use "nonsensical" as a synonym for "wrong". But, unbeknownst to you, you said something else: you actually implied that there's no way to tell whether my notion of streaming is correct or not. But, for future reference: my notion of streaming is correct, and you would do better to study some material on it before jumping to conclusions.
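
To make the approximation argument a bit more concrete, here is a minimal sketch of the bounded-nesting idea. It is only an illustration, not how SAX or any real parser is implemented: the names `balanced_up_to`, `max_depth` and `DepthLimitExceeded` are made up for this example, and plain parentheses stand in for XML tags. Balanced parentheses are the textbook context-free language; once the nesting depth is capped at some N, the only state the checker needs is a counter between 0 and N, which is a finite amount of state, so the input can be consumed chunk by chunk.

```python
from typing import Iterable


class DepthLimitExceeded(ValueError):
    """The input nests deeper than the chosen bound, so it falls
    outside the regular approximation."""


def balanced_up_to(chunks: Iterable[str], max_depth: int) -> bool:
    """Check that '(' and ')' are balanced, reading the input in chunks.

    The only state carried between chunks is an integer in the range
    0..max_depth. For a fixed max_depth that is finitely many states,
    i.e. a regular approximation of the context-free language of
    balanced parentheses.
    """
    depth = 0
    for chunk in chunks:
        for ch in chunk:
            if ch == "(":
                depth += 1
                if depth > max_depth:
                    raise DepthLimitExceeded(
                        f"nesting deeper than {max_depth} levels"
                    )
            elif ch == ")":
                if depth == 0:
                    # A closing parenthesis with nothing open: reject.
                    return False
                depth -= 1
            # Any other character is ignored in this toy example.
    return depth == 0


if __name__ == "__main__":
    # The chunks could just as well come from a socket or a file read
    # in fixed-size blocks; a list is used here to keep it runnable.
    print(balanced_up_to(["(a(b", ")c)"], max_depth=8))
```

The same reasoning would apply to a bounded stack of bounded-length element names, which is closer to the XML case, at the cost of a much larger (but still finite) number of states.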