Path: ...!news.roellig-ltd.de!open-news-network.org!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: comp.misc
Subject: Wrong ideas about chatbots
Date: 8 Jun 2025 00:39:21 GMT
Organization: Stefan Ram
Lines: 241
Expires: 1 Jun 2026 11:59:58 GMT
Message-ID: <parsing-20250608013417@ram.dialup.fu-berlin.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de +2sUOXY+kpLJlgMvFL8n0wcllNctxRrxlvsQ6j/Xzy4wlB
Cancel-Lock: sha1:qxqbt9tJU6/P3n+sEyMaj96QzNA= sha256:6RPW+O4/gr5BxhVevTD1+rPvoIyobrx6Kp8CllxMKRc=
X-Copyright: (C) Copyright 2025 Stefan Ram. All rights reserved.
Distribution through any means other than regular usenet
channels is forbidden. It is forbidden to publish this
article in the Web, to change URIs of this article into links,
and to transfer the body without this notice, but quotations
of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
services to mirror the article in the web. But the article may
be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: en-US
Bytes: 10100
Ben Collver <bencollver@tilde.pink> wrote or quoted:
| you'd start with an enormous quantity of text, then do a lot
|of computationally-intense statistical analysis to map out which
|words and phrases are most likely to appear near to one another.
  I had already explained why that was off, but let me give you
all a recent example from a chat of mine with a chatbot.
I asked the chatbot to write a program for left-associative parsing
of English. He must have mixed up my "left-associative" with the
more common "left-recursive" or figured I just said it wrong.
He clearly did not know about this specific left-associative
parsing method for natural languages. Even after I gave him
the exact source, he still did not get it right.
Then I laid it out for him [1]. After that, he wrote a program [2]
for left-associative parsing of English. Here is what he produced:
[3]. I also asked him for an explanation of the approach for
laymen, so if you want to learn more about it, see [4].
So tell me, how is real understanding like this supposed
to happen if chatbots just work based on "a statistical
analysis of which words often show up together"?
[1] How I explained it to him, after his first program was
not what I wanted
Maybe you're misled by applying standard terms to the special
NEWCAT approach.
It starts with an empty object and then repeatedly appends the
next word until the end of the text is reached. No big recursion
there.
We get: empty + "the"
Now, we need a grammar rule to see if that combination is
possible. For this purpose, each of "empty" and "the" has
a category, which is a data structure with their attributes,
and the grammar rule then checks whether two things with these
categories can be combined; if so, it creates the new
sentence start "empty + 'the'" with a new category given by
the rule.
Then we try to add "cat" to the sentence start. So the sentence
is built left-to-right; that's what's "left-associative" about it.
Finally, we add the "." after "fish" and then we may have
a complete sentence if everything was allowed by the rules.
This parsing is trivial. The crucial thing is to write the
categories and the rules so that all continuations of a
sentence start that are legal in English are allowed by the
rules and all others are rejected.
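The loop described above can be sketched in a few lines. The
rule table and the category names (DET, N, V) here are made-up
stand-ins for illustration, not the actual NEWCAT categories,
which are richer feature structures:

```python
# A minimal sketch of the left-associative loop: start with the
# empty object, always try to combine the state so far with the
# next word, fail as soon as no rule allows the continuation.
# Lexicon and rule table are hypothetical toy stand-ins.
LEX = {'the': 'DET', 'cat': 'N', 'eats': 'V'}
RULES = {
    (None, 'DET'): 'DET',   # empty + "the"     -> sentence start
    ('DET', 'N'): 'NP',     # "the" + "cat"     -> noun phrase
    ('NP', 'V'): 'S',       # "the cat" + "eats" -> sentence
}

def parse(words):
    state = None                # the empty object
    for w in words:             # append the next word each step
        state = RULES.get((state, LEX[w]))
        if state is None:       # no rule allows this continuation
            return False
    return state == 'S'         # complete sentence reached?
```

For example, parse(['the', 'cat', 'eats']) succeeds, while
parse(['cat', 'eats']) fails at the first word because no rule
combines the empty object with a bare noun.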
[2] The parser for a tiny subset of English that he then wrote,
following my explanation above
# Toy lexicon
LEXICON = {
    'the': ('DET', {}),
    'cat': ('N', {'number': 'sg'}),
    'cats': ('N', {'number': 'pl'}),
    'dog': ('N', {'number': 'sg'}),
    'dogs': ('N', {'number': 'pl'}),
    'and': ('CONJ', {}),
    'eats': ('V', {'number': 'sg'}),
    'eat': ('V', {'number': 'pl'}),
}

def get_category(word):
    if word not in LEXICON:
        raise ValueError(f"Unknown word: {word}")
    return LEXICON[word]

def combine(state, next_token):
    cat, features = next_token
    print(f" [combine] State: {state}, Next: ({cat}, {features})")
    # If state is None, we're at the start
    if state is None:
        if cat == 'DET':
            print(" [combine] Start DET")
            return ('subj', {'number': None, 'is_complete': False, 'needs_conj': False, 'pending_det': True})
        elif cat == 'N':
            print(" [combine] Start bare N (fail)")
            return None
        else:
            print(" [combine] Start fail")
            return None
    # If we're building a subject
    if state[0] == 'subj':
        subj = state[1]
        # --- Coordination context: CONJ + DET + N ---
        if subj.get('needs_conj') and subj.get('pending_det') and cat == 'N':
            print(" [combine] CONJ + DET + N -> coordinated NP (plural)")
            return ('subj', {'number': 'pl', 'is_complete': True, 'needs_conj': False, 'pending_det': False})
        # --- Plain DET + N (not in coordination) ---
        if subj.get('pending_det') and cat == 'N':
            print(" [combine] DET + N -> NP")
            return ('subj', {'number': features['number'], 'is_complete': True, 'needs_conj': False, 'pending_det': False})
        # --- NP + CONJ ---
        if subj.get('is_complete') and cat == 'CONJ':
            print(" [combine] NP + CONJ")
            return ('subj', {'number': subj['number'], 'is_complete': False, 'needs_conj': True, 'pending_det': False})
        # --- CONJ + DET ---
        if subj.get('needs_conj') and cat == 'DET':
            print(" [combine] CONJ + DET")
            return ('subj', {'number': None, 'is_complete': False, 'needs_conj': True, 'pending_det': True, 'prev_number': subj['number']})
        # --- NP + V (agreement check) ---
        if subj.get('is_complete') and cat == 'V':
            print(f" [combine] NP + V: subj number {subj['number']}, verb number {features['number']}")
            if subj['number'] == features['number']:
                print(" [combine] Agreement OK")
                return ('sent', {'number': subj['number']})
            else:
                print(" [combine] Agreement FAIL")
                return None
        print(" [combine] No rule applies (fail)")
        return None
    # If we already have a sentence, nothing more allowed
    if state[0] == 'sent':
        print(" [combine] Already sent (fail)")
        return None
    print(" [combine] No rule applies (fail)")
    return None

def parse(sentence):
    words = sentence.lower().split()
    state = None
    print(f"\nParsing: {sentence}")
    for word in words:
        print(f" State: {state}, Next: {word}")
        cat = get_category(word)
        state = combine(state, cat)
        if state is None:
            print(f" Failed at: {word}")
            return False
    print(f" Final state: {state}")
    return state is not None and state[0] == 'sent'

# Test sentences
sentences = [
    "the cat eats",             # ✔️ singular
    "the cats eat",             # ✔️ plural
    "the cat and the dog eat",  # ✔️ coordination (plural)
    "the cat and the dogs eat", # ✔️ coordination (plural)
    "the cat and the dog eats", # ❌ subject-verb disagreement
    "the cats eats",            # ❌ subject-verb disagreement
    "the cat and eat",          # ❌ incomplete NP
    "cat eats",                 # ❌ missing DET
    "the cat and the",          # ❌ incomplete NP
]

for s in sentences:
    result = parse(s)
    print(f"Result: {'ACCEPT' if result else 'REJECT'}")
[3] The abridged output of the parser
Parsing: the cat eats
Result: ACCEPT
Parsing: the cats eat
Result: ACCEPT
Parsing: the cat and the dog eat
Result: ACCEPT
========== REMAINDER OF ARTICLE TRUNCATED ==========