Home > Archive > Compilers > July 2006 > Edit distance between two unlabeled parse trees
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Edit distance between two unlabeled parse trees
|
|
| alpanad@gmail.com 2006-07-18, 7:03 pm |
| Hi,
Can anyone shed some light on the problem of finding out the distance
between two unlabeled parse trees (these trees are built by parsing a
sentence with two different grammars). Ths problem can be very well
mapped to the problem of finding out the edit distance between the
paranthesis structures imposed by the parse tree on the input
sentence.
For example suppose we have two trees as follows
A A
/ \ / | \
A A A A A
/ \ / \ / /\ \
t1 t2 t3 t4 t1 t2 t3 t4
The parenthesis structures imposed from above two tress are string1:
((t1 t2) (t3 t4)) and
string2: ( t1 (t2 t3) t4). In order to convert the string2 to string1
we need at least three operations; i.e. a deletion of a pair of
parenthesis (t2 t3) and addition of two pairs of parenthesis (t1 t2)
and (t3 t4).
Any good pointer which discusses the computaion of edit distance
between two parenthesis structures to solve the above problem will be
of much help.
Thanks in advance,
Alpana
| |
| Hans Aberg 2006-07-19, 7:02 pm |
| alpanad@gmail.com wrote:
> Can anyone shed some light on the problem of finding out the distance
> between two unlabeled parse trees (these trees are built by parsing a
> sentence with two different grammars). Ths problem can be very well
> mapped to the problem of finding out the edit distance between the
> paranthesis structures imposed by the parse tree on the input
> sentence.
> Any good pointer which discusses the computaion of edit distance
> between two parenthesis structures to solve the above problem will be
> of much help.
It is not clear to me exactly what kind of edit changes you want to admit.
For example, you seem to admit operators of different arity to be
equivalent.
But one idea: rewrite the strings into Lukasiewicz or RPN notation, which
do not require parenthesizes, and then apply some kind of Levenshtein
distance. See
http://en.wikipedia.org/wiki/Jan_Lukasiewicz
http://en.wikipedia.org/wiki/Polish_notation
http://en.wikipedia.org/wiki/Reverse_Polish_notation
http://en.wikipedia.org/wiki/Levenshtein_distance
--
Hans Aberg
|
|
|
|
|