For Programmers: Free Programming Magazines  


Home > Archive > Compilers > May 2005 > Earley parser









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Earley parser
Vladimir

2005-05-03, 4:01 pm

On 2005.05.01.21.43, arthur wrote:
| I have implemented a Earley's recognizer, now I am looking for any
| references to build a parse tree from it. I looked at the Aho/Ulman
| algorithm and the Grune/Jacob description. Does anybody have any
| others algorithms or references?

Hi Arthur,

Not so long ago I've introduced new algorithm, which was named as
Earley's Algorithm Adapted to build parse trees. The main problem with
Earley's algorithm is it does not correct build parse trees if the
input grammar is ambigous. That's the problem and answer why Earley
didn't provide parsing algorithm, only recognizing one. The approach
to build parse tree is following: we are extending Eraley's item,
which consists of three components [A-->alpha*beta, i, omega] where
A-->alpha*beta - input grammar's rule, i - number of Earley's state
(list) where this item was burn and omega - lookahead string. As it
was expained long before (in 70th) lookahead string is not needed here
because it only improves Predictor operation and only in few cases.
So, drop this and let Earley's item will consists of thow components
[A-->alpha*beta, i]. Now add two extended components to Earley's item:

1. The reference to item, which was the reason the current item was
added to this Earley's state. This is the item, which was the
parameter of Completer operation (I'll explain this later in details).
It is named as low-reference.

2. The reference to item within the same components but where the
point in the rule A-->alpha*beta is one simbol left (if we have item
[A-->ab*c,2], this is item [A-->a*bc,2]). It is clear the reference is
bull if the point is in the beginning of the rule. It is named as
left-reference.

So, here is the example. The grammar is
E-->E+n
E-->n
E - nonterminal, n, + - terminals.
The input string is "n".

The parsing is:
List 0:
1.[E-->*E+n,0,null,null]
2.[E-->*n,0,null,null]

List 1:
1.[E-->n*,0,null,0.2] (0 is the list's number, 2 is the nu,ber of
item).

So, we have parse tree:
E
|
n

But, such the references keeping may be incorrect if the grammar have
the rules with empty right part. I've intended keep the list of
low-references instead the single one. And, if there will be ambigous
parsing the list of low-references will reflect this in correct way.
So, the item now is [A-->alpha*beta,i,<low-ref>,left-ref]. Every time
Completer operation runs on item I, if it adds a new item I1 the
low-ref to I1 is added to the list of low-refs in item I.

This algorithm was a part of my PhD. It is in russian. If you know
this I can send you the text within the algorithm description and
correctness prove as well.

Vladimir.
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com