Path: ...!2.eu.feeder.erje.net!feeder.erje.net!newsfeed.bofh.team!paganini.bofh.team!not-for-mail From: Waldek Hebisch Newsgroups: comp.lang.c Subject: Re: Top 10 most common hard skills listed on resumes... Date: Sat, 7 Sep 2024 03:13:46 -0000 (UTC) Organization: To protect and to server Message-ID: References: <871q27weeh.fsf@bsb.me.uk> <20240829083200.195@kylheku.com> <87v7zjuyd8.fsf@bsb.me.uk> <20240829084851.962@kylheku.com> <87mskvuxe9.fsf@bsb.me.uk> <875xrivrg0.fsf@bsb.me.uk> <20240829191404.887@kylheku.com> <86cylqw2f8.fsf@linuxsc.com> <871q2568vl.fsf@nosuchdomain.example.com> <87cylo494u.fsf@nosuchdomain.example.com> Injection-Date: Sat, 7 Sep 2024 03:13:46 -0000 (UTC) Injection-Info: paganini.bofh.team; logging-data="1973506"; posting-host="WwiNTD3IIceGeoS5hCc4+A.user.paganini.bofh.team"; mail-complaints-to="usenet@bofh.team"; posting-account="9dIQLXBM7WM9KzA+yjdR4A"; User-Agent: tin/2.6.2-20221225 ("Pittyvaich") (Linux/6.1.0-9-amd64 (x86_64)) X-Notice: Filtered by postfilter v. 0.9.3 Bytes: 5374 Lines: 76 Bart wrote: > > I can also say that the C grammar is buggy: > > assignment-expression: > conditional-expression > unary-expression asssignment-operator assignment-expression > > When attempting to parse an assignment-expression, do you go for a > conditional-expression or unary-expression? Both. The grammar clearly is not LL. In C90 grammar there was one issue which meant that grammar _alone_ could be not used for parsing: at one (or maybe two) places after recognizing a token parser had to look up in symbol table and possibly transform the token based on previous declarations. If you did that, then classic LR(1) parser would work. I think that changing the grammar to do the transformation by grammar rules alone changed grammar so that it was not longer LR(1). I think that after change it was LR(2), but I may be remembering things incorrectly. For _defining_ language syntax it is enough to have unabigious grammar which is much easier than LR(2). Or standard could give clear rules saying which of parse trees produced by ambigious grammar is the right one (AFAIK that is what C++ is doing). Anyway, any troubles in C grammar are _not_ due to part above, that part is handled fine by LALR(1) tools (LALR(1) is somewhat hard to define, but is subest of LR(1) which is easier to implement). BTW: LL(k) requires ability to decide which rule will apply after seeing k symbols after start of a construct. Clearly for nontrival grammar one needs at least one symbol, so LL(1) is easiet to parse. LR(k) requires ability to decide which rule will apply after seeing the whole construct and at most k following symbols. So, LR parser keeps reading symbols as long as it can not recognize any construct (in other words considers all alteratives as possible). In the process it collects information. Once it has enough information it replaces part of previously read input by left hand side of a rule (this is called reduction). After one reduction there may be several one, several constricts may and in the same place. Parser starts which smallest recognized part, than proceeds to bigger part. Usually LR parsers keep input and effects of reduction on a stack. Reading symbol puts it on the stack (this is called shift), reduction removes bunch of items from the stack and replaces them by a single (different item). At first glance control, that is deciding between shift and reduction and in case of reduction deciding which rule to use, may look complicated. But ignoring control, this is simple and quite efficient. In sixties it was discovered that in pinciple parser control may also be quite simple, it can be done by a finite automaton. In other word, there is state which contains collected information and tells you what to do (shift or reduction), you get new state by looking into table indexed by current state and read symbols (currenly considered symbol and lookahead symbols). At first there was doubt that needed tables may be too big and make this inpractical, but in seventies it was discoverd that for typical grammars of programming languages one can use resonably small tables. And there were tricks with compressing tables. So practically one get few, maybe tens kilobytes of tables. Access to compressed tables needs extra instructions and LR grammars tend to produce more reductions than LL ones, so LR parses tend to be slightly slower than LL ones: people got 500000 lines per minute for LR parsers and 900000 for LL paser on machines about 1000 slower than modern PC. Evan at that time speed difference in paser was not a deal breaker and LR parsing was preffered. BTW2: Before C90 there was doubt if sensible grammar for C is possible (earler grammar had serious prblems). But during standared process is was discovered that quite resonable grammar works. Current grammar is a developement of C90 grammar. And problem like you mention were understood long ago. -- Waldek Hebisch