Code Comments
Programming Forum and web based access to our favorite programming groups.What is the advantage of parsers that LALR that is greater than 1? I have a grammar that requires more than one token of look ahead, is there any way that it could be solved using yacc or Bison? Thanks [Some grammars are easier to express with more than one token of lookahead. You can rewrite gramars to LR(1), but sometimes at the cost of huge and ugly bloat. -John]
Post Follow-up to this message> [Some grammars are easier to express with more than one token of lookahead. > You can rewrite gramars to LR(1), but sometimes at the cost of huge and > ugly bloat. -John] Didn't Knuth prove that any LR(k) grammar can be rewritten to LR(1), albeit at a potential exponetial increase in the parse tables (number of distinct parse items). However, does this extend to an LALR(k) conversion to LALR(1)? > I have a grammar that requires more than one token of look ahead, is > there any way that it could be solved using yacc or Bison? I do have a suggestion here. I typically see compilers/interpreters make some syntactic concessions and offload extra checking to the semantic checker. The following page: http://java.sun.com/docs/books/jls/...tml/19.doc.html contains a neat explination of some of the hoops the designers went through to make the grammar LALR(1). It contains a list of five problems and shows the solution chosen. While the grammar for Java is probably irrelevant, the heuristics to convert an LALR(k) grammar to LALR(1) might prove useful to you. I certainly enjoyed the read. Regards, - Tim
Post Follow-up to this messageJohn Levine, news:comp.compilers moderator said: "Some grammars are easier to express with more than one token of lookahead. You can rewrite gramars to LR(1), [..]" At the risk of being so bold as contradicting some helpful advice from the expert moderator without actually adding anything useful myself I dare allege ... some languages are easier to express with more than one token of lookahead (there is more than one grammar for one language, but if you take a grammar and make alterations which still express the same language then you have a different grammar). Regards, Colin Paul Gloster [You're right, same language, different grammar. -John]
Post Follow-up to this message"Tim Bauer" <tbauer@cadrc.calpoly.edu> writes: > > Didn't Knuth prove that any LR(k) grammar can be rewritten to LR(1), albei t > at a potential exponetial increase in the parse tables (number of distinct > parse items). > However, does this extend to an LALR(k) conversion to LALR(1)? Every language for which a LR(1) grammar LR(1) exists has also an LALR(1) grammar. (Search for the archive of this group, I started a thread on the subject). > > I do have a suggestion here. I typically see compilers/interpreters > make some syntactic concessions and offload extra checking to the > semantic checker. The major raison I do this is that it is easier to give good error message and have some good error recovery. Yours, -- Jean-Marc
Post Follow-up to this messageJean-Marc Bourguet <jm@bourguet.org> wrote in message news:<04-08-073@comp.compilers>...[co lor=darkred] > "Tim Bauer" <tbauer@cadrc.calpoly.edu> writes: > >[/color] You reduce from LR(k) to LR(1), and then the LR(1) grammar is handled by the LALR(1) parser generator. > Every language for which a LR(1) grammar LR(1) exists has also an > LALR(1) grammar. (Search for the archive of this group, I started a > thread on the subject). > The BNF remains the same as in LR(1), but the number of parser states is reduced in an LALR(1) parser. Either Im making a mistake or the text above is misleading to indicate that the grammar needs to be changed when moving from Lr(1) to LALR(1). [The moderator may want to filter out erroneous statements]. By definition -no. Yacc [or its gnu equivalent Bison] does not support more than one lookahead. Feeding such a grammar to it will result in a reduce-reduce conflict. You need to modify your grammar to reduce those conflicts and yet achieve the purpose. regards -kamal
Post Follow-up to this messageOn 2004-08-16, Kamal R. Pra<kamalp@acm.org> wrote: > > By definition -no. Yacc [or its gnu equivalent Bison] does not support > more than one lookahead. Newer versions of Bison have support for GLR parsing, which can parse any context-free grammar. -Clint
Post Follow-up to this messageOr try Elkhound - http://www.cs.berkeley.edu/~smcpeak/elkhound/ Clint Olsen wrote: > > On 2004-08-16, Kamal R. Pra<kamalp@acm.org> wrote: > > Newer versions of Bison have support for GLR parsing, which can parse any > context-free grammar.
Post Follow-up to this messageKamal R. Prawrote: > Jean-Marc Bourguet <jm@bourguet.org> wrote > > > The BNF remains the same as in LR(1), but the number of parser states > is reduced in an LALR(1) parser. Either Im making a mistake or the > text above is misleading to indicate that the grammar needs to be > changed when moving from Lr(1) to LALR(1). > [The moderator may want to filter out erroneous statements]. The class of LALR(1) grammars is a proper subset of the class of LR(1) grammars, so yes, once you have obtained a LR(1) grammar for your language, you might have to modify your grammar a bit further to make it LALR(1). This generally involves introducing new rules to avoid some state merges done by the LR(0) automaton which underlies the LALR(1) parser. Regards, -- Sylvain
Post Follow-up to this messagekamalp@acm.org (Kamal R. Pra) wrote: > The author states that he wrote the GLR parser generator solely to > handle C++ language spec [and someone lapped it up to handle Java]. > What exactly is it about OO languages that an LALR(1) parser cannot > handle? It's nothing to do with OO languages; it's to do with the trainwreck that is C++ syntax. Lots of other OO languages can be parsed with much less trouble. Sean Case -- Sean Case gsc@zip.com.au Code is an illusion. Only assertions are real.
Post Follow-up to this message> The author states that he wrote the GLR parser generator solely to > handle C++ language spec [and someone lapped it up to handle Java]. > > What exactly is it about OO languages that an LALR(1) parser cannot > handle? As the moderator noted, there is nothing about "OO" languages that LALR(1) parsers cannot handle, but C++ itself is problematic. There are LALR(1) and LL grammars for Java. One of the problems with C++, is that expressions and declarations can look exactly the same (technically, any language containing those the or of the two productions is ambiguous) and C++ gets around that by saying, if it looks like a declaration, it is a declaration (forcing the "or" to be resolved in a particular declaration (and resolving the ambiguity). However, that resolution is not expressed gramatically, and one can not take two random context free rules and difference them and expect the result to be a context free language, which is what the C++ ambiguity resolution requires one to do. In contrast, GLR grammars are not required to be unambiguous. Any ambiguity is resolved by producing a resulting parse-forest that represents all the potential mabiguous choices and requiring a later "semantic" pass to choose which parse tree in the forst is the desired one. Thus, with a GLR parser, one can disambiguate the C++ problem by selecting the parse tree that treats all the ambiguous expression/ declaration sub-trees as declarations. The only problem with GLR as a technology is that are no "warnings" from the grammar processing tool that the language is ambiguous. Well, there are warnings that the language is not LR (or LALR) or whatever technology the GLR parser uses as a base. However, some of those grammars will actually not be ambiguous and some of the will be ambiguous. However, in any case, once your GLR generator has given a warning, one either must prove that the language actually isn't ambiguous or write your semantic phase assuming that the language is ambiguous and disambiguate the resulting forest. It is worth mentioning that there are other ways of handling ambiguous grammars. In particular, one can use predicates to resolve ambiguities. Predicates allow one to take the difference of two productions in a controlled manner. In particular, it is possible to write a syntactic rules that says, try to parse this as a declaration and if it isn't parse it as an expression. The difference between the predicated and the GLR solution is that predicated grammars are still deterministic. There are no hidden ambiguities in a predicated grammar. If your predicated parser generator gives you an error, you still have an unresolved ambiguity and if it doesn't the resulting parser will always construct a parse tree (and not a forest). I would be remiss if I also did not point out backtracking parsers, which are another solution to the problem. In fact, all the implementations of predicated parsers that I know of, use some form of backtracking in their implementation. General backtrakcing parsers share the characteristic with GLR parsers that they can parse ambiguous grammars. Backtracking parsers generally also produce a parse tree (although in theory they could also produce a forest). Backtracking parsers have their own deficits though. Many backtracking parsers will loop forever on some ambiguous grammars. (Predicated backtraking parsers do not generally have this problem, although they do not make the same linear time guarantees that pure LL and LR parsers do(see note)--of course, any parser generator that can handle a significant class of ambiguous must be inherently non-linear for some grammars, and GLR parsers have a cubic worst case, same as Earley parsers.) In addition, most backtracking parsers resolve ambiguities by selecting one parse tree out of the forest to return. This is generally done by the order of the rules in the grammar (which determines the order the rules are tried in in ambiguous cases). If one looks closely, this is very similar to using predicates "implicitly" in the grammar. The key difference being that the tool inserts the predicates rather than the user and does so without warning and usually without the run-time termination guarantees. I would like to mention that it is possible to build a predicated parser using GLR technology, although I don't know of anyone attempting to do so right now. From thought-experiments I have done considering whether to implement such a tool, it seems like there would be some advantages to building such a tool. Again, I do not want to imply that these are the only techniques for dealing with ambiguity. For example, Ralph Boland is pursing some generalization of LR technology that I gather will handle a wider class of languages and I don't think his technique is any of the above. Note: Bryan Ford recently published a paper on a "predicated" parsing technique that made extensive use of memoization and lazy evaluation to achieve (if I recall correctly) a linear time guarantee. His technique shares a characterisitic with general backtracking parsers in that the order of rules determines what is matched and the the entire tree is disambiguated that way. He uses an "ordered" or clause to implement this. Hope this helps, -Chris **************************************** ************************************ * Chris Clark Internet : compres@world.std.com Compiler Resources, Inc. Web Site : http://world.std.com/~compres 23 Bailey Rd voice : (508) 435-5016 Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours) ---------------------------------------------------------------------------- --
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.