A Compiler for Natural Language (translator that translates from natural language to C
DeltaOne

2005-05-16, 4:00 pm

Hi,

I got a very good response to my C++ intermediate representation
question, and thanks to all the experts. Now I need some help with a
project I am doing as part of my studies. It may sound a bit off track,
but the idea is to design a compiler that learns a language the way
humans do. Let me explain the idea a little. This compiler is a natural
language system that is close to an expert system. The aim of the system
is to convert natural language statements into C++ (or any other
language, but we need to feed that language's structure into the
application). We train the compiler on the destination language to
which it should compile. The input natural language will be a
restricted NL; it will be an algorithmic language. It is not
possible to give the full design here, but the main aim of the
language is to provide a uniform interface to all kinds of languages.
The compiler learns the language to which it should translate and does
the job. For now I am planning to design and build a system that
translates natural language to C++.

Any help on this topic and ideas from the experts out there
will be of great help to me. My first priority is to translate NL to
C++, and I don't want any optimisation for now.

If anyone wants more information about the design, I will provide it.

~Thanks and Regards
Pappun (monstersinc@rediffmail.com)
[This strikes me as one of those initially appealing bad ideas that's
failed many times before, so I would start by looking at previous
efforts. If you restrict your natural language to something simple
enough to be translated mechanically into computer code, that's COBOL,
which isn't as awful as the academic comp sci folklore would have it,
but still isn't a direction that anyone is going now. Beyond that I'd
think that you'd run straight into the tarpit of trying to parse
natural language syntax which has lingered just beyond the state of
the art for the past fifty years. -John]

Nick Maclaren

2005-05-16, 8:59 pm

"DeltaOne" <shakti.misra@wipro.com> writes:
|>
|> I got very good response for the C++ intermediate representation. And
|> thanks to all the experts. Now i need some help for one project I am
|> doing as a part of my study. Well it may sound a bit off track but the
|> idea is to design a compiler that learns language like human learn. I
|> am explaining my idea little bit. This compiler is a natural language
|> system That is near to an expert system. The aim of the system is to
|> convert natural language statements into C++ (or any other
|> language,but we need to feed that language structure into this
|> application). We train the compiler to the destination language to
|> which it should compile. ...
|>
|> [This strikes me as one of those initially appealing bad ideas that's
|> failed many times before, so I would start by looking at previous
|> efforts. If you restrict your natural language to something simple
|> enough to be translated mechanically into computer code, that's COBOL,
|> which isn't as awful as the academic comp sci folklore would have it,
|> but still isn't a direction that anyone is going now. Beyond that I'd
|> think that you'd run straight into the tarpit of trying to parse
|> natural language syntax which has lingered just beyond the state of
|> the art for the past fifty years. -John]

Ah, but just think how much more deeply we are in it now than way
back then :-)

Yes, absolutely. One of my ex-colleagues is a world expert on this,
and my last conversation with her on the matter (a decade back)
implied that she expected neither of us to live long enough to see a
solution. The experts had given up, and retreated to the task of
trying to understand how human language works - which they are still
working on.

Regards,
Nick Maclaren.

Chris F Clark

2005-05-16, 8:59 pm

"DeltaOne" <shakti.misra@wipro.com> writes:
> The input natural language will be a restricted NL. It will be a
> algorithmic language. Well its not possible to give the full design
> here. But the main aim of the language is to get a uniform interface
> for all kind of languages.


You can probably be somewhat successful at getting a nice "human
readable natural language" to C++ translator for some very small
restricted domain. That is doable. However, a uniform interface to a
wide variety of languages (even to a small variety of languages over a
wide domain) is so far beyond the state-of-the-art as to be
inconceivable. The problem is in the subtle differences in semantics.

To make this example concrete, I just recently did a port of the
output of Yacc++ from C++ to C#. That's a very restricted problem and
the languages are in many respects close together and I wasn't coming
from a vague natural language user input specification, but a precise
clear definition (in C++) that I was intimately familiar with having
written much of it myself. The quick "syntactic" conversion took a
matter of days. The harder semantic conversion where I made sure that
all the C++ features worked (not necessarily the same way, but in any
way) in the C# version still has some unattended to tasks.

Thus, if you want your project to succeed, you are going to need to
vastly restrict the input domain to a subset of semantics that are
likely to exist (and be similar or at least compatible) across your
entire language set, perhaps integer arithmetic, fixed size arrays.
Note that in the restricted set (character) strings and I/O are not
likely to make it, as they tend to be highly different in different
languages. Even with such a restricted set, you will find that for
some languages (if you make your language set wide enough) you will not
be able to create the idiomatic program for that language, you will
create some foreign looking program that just happens to run
correctly.
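
To make the "restricted subset" point concrete, here is a minimal sketch, written in Python only for brevity. Everything in it is an assumption made for illustration: the two sentence forms ("declare integer x", "set x to a plus b"), the `TARGETS` and `OPERATORS` tables, and the `emit` function are hypothetical and not part of any design described in this thread. It covers only integer variables and integer arithmetic, and it leaves out strings and I/O for the reason given above.

```python
# Hypothetical per-target templates for a tiny shared subset:
# integer variables, assignment, and binary integer arithmetic.
TARGETS = {
    "cpp": {
        "declare": "int {name};",
        "assign": "{name} = {expr};",
    },
    "pascal": {
        "declare": "var {name}: integer;",
        "assign": "{name} := {expr};",
    },
}

# Division is left out on purpose: integer division is spelled `div`
# in Pascal and `/` in C++, so it does not fit a shared operator table.
OPERATORS = {"plus": "+", "minus": "-", "times": "*"}


def emit(statement: str, target: str) -> str:
    """Translate one restricted-English statement into the target language."""
    templates = TARGETS[target]
    words = statement.split()

    # Form 1: "declare integer <name>"
    if len(words) == 3 and words[:2] == ["declare", "integer"]:
        return templates["declare"].format(name=words[2])

    # Form 2: "set <name> to <operand> <operator-word> <operand>"
    if (len(words) == 6 and words[0] == "set" and words[2] == "to"
            and words[4] in OPERATORS):
        expr = f"{words[3]} {OPERATORS[words[4]]} {words[5]}"
        return templates["assign"].format(name=words[1], expr=expr)

    raise ValueError(f"statement outside the restricted subset: {statement!r}")


for target in ("cpp", "pascal"):
    print(target, "|", emit("declare integer total", target))
    print(target, "|", emit("set total to total plus x", target))
```

Even in this toy, the shared subset shrinks as soon as a second target is added (division already had to go), which is the effect described above.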

Now, if you have some specific idea as to how this can be done, I
don't want to discourage you. Many times great progress is made by
heretics, who simply refuse to listen to common wisdom, because they
know something can be solved (and know an easy way to solve it).
However, if you don't have an approach in mind, pick a smaller
problem. Translating natural language into C++ for some restricted
set of inputs is "hard enough" and will keep you adequately amused for
some time. And, perhaps as you work on that problem, you will
discover some principles that allow you to go much farther than
expected. Then, you will become a heretic naturally.

Hope this helps,
-Chris

*****************************************************************************
Chris Clark                    Internet   :  compres@world.std.com
Compiler Resources, Inc.       Web Site   :  http://world.std.com/~compres
23 Bailey Rd                   voice      :  (508) 435-5016
Berlin, MA 01503  USA          fax        :  (978) 838-0263  (24 hours)
------------------------------------------------------------------------------

Randy

2005-05-16, 8:59 pm

DeltaOne wrote:
> I got very good response for the C++ intermediate representation. And
> thanks to all the experts. Now i need some help for one project I am
> doing as a part of my study. Well it may sound a bit off track but the
> idea is to design a compiler that learns language like human learn. I
> am explaining my idea little bit. This compiler is a natural language
> system That is near to an expert system. The aim of the system is to
> convert natural language statements into C++ (or any other
> language,but we need to feed that language structure into this
> application). We train the compiler to the destination language to
> which it should compile. The input natural language will be a
> restricted NL. It will be a algorithmic language. Well its not
> possible to give the full design here. But the main aim of the
> language is to get a uniform interface for all kind of languages. The
> compiler learns the language to which it should translate and does the
> job. I am as of now planning to create and design a system that
> translates natural language to C++ form.
>
> Any help on this topic and ideas from all of the experts out there
> will be of great help for me. My first priority is to translate NL to
> C++ and I dont want any optimisation now.

....

Unlike your C++ IR question, I think you're not going to get as good a
response to this one. I'm not clear exactly what you're proposing,
but I think it's either pointless or hopeless.

If your target is a programming language, then it seems like your NL
input will be simply a one-to-one mapping of tokens into your C++
output. If so, that's just speech recognition. If your NL input is
much higher level, as in, "sort and collate the data, then print",
then the need to model semantics and plan recognition and plan
generation will open a rat's nest of complexity. You do NOT want to go
there.

IMHO, our fearless moderator is right -- translation from high level
NL abstraction to meaningful C++ (or even native code) is hopeless
given today's state of the art in AI, and the translation from low
level NL to C++ such as "a equals b times c" is a job for a mere FSM
(e.g. Lex).
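
As a rough illustration of how little machinery the low-level case needs, here is a minimal sketch in Python (chosen only for brevity). The `WORD_TO_TOKEN` table and the `translate_statement` function are hypothetical names invented for this example; it is a plain word-for-word lookup with no parsing state at all, which is even less than a Lex-generated scanner would provide.

```python
# Hypothetical vocabulary: each spoken word maps to exactly one C++ token.
WORD_TO_TOKEN = {
    "equals": "=",
    "plus": "+",
    "minus": "-",
    "times": "*",
    "over": "/",
}


def translate_statement(sentence: str) -> str:
    """Map each word of a restricted sentence to a C++ token, one for one."""
    tokens = []
    for word in sentence.split():
        key = word.lower()
        if key in WORD_TO_TOKEN:
            tokens.append(WORD_TO_TOKEN[key])
        elif word.isidentifier() or word.isdigit():
            # Variable names and integer literals pass through unchanged.
            tokens.append(word)
        else:
            raise ValueError(f"word outside the restricted vocabulary: {word!r}")
    return " ".join(tokens) + ";"


print(translate_statement("a equals b times c"))  # a = b * c;
```

Nothing here knows what the statement means; it only respells it, which is why this end of the problem is the easy one.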

The history of a couple of relevant enterprises will illustrate the
difficulties inherent in high level NLP translation: Automatic
Programming and Machine Translation. Decades of effort have gone into
each, with precious little to show for it. MT has achieved some
practical success, but only because humans have laboriously and
manually edited the mapping of the source lexicon into the target in
order to build commercial MT applications (like SYSTRAN).
Unfortunately, the translation of programming directions into
instructions is harder still.

Finally, you will also find that there is no consensus that NL is even
context free, which has dire implications for implementing its
translation using an expert system. Yes, a context sensitive language
can be parsed/translated using a pushdown automaton as a CFG (since
real world grammars and sentences are finite), but the amount of work
involved in representing all possible NL grammatical constructions
using productions is WAY outside the bounds of practicality. Even
abstractions on productions like statistical parsing run into serious
limits, not to mention the inability to resolve ambiguity or achieve
real understanding of NL without employing a rich/deep knowledge base.
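
The ambiguity problem can be seen even with a handful of productions. The sketch below, in Python for brevity, counts parse trees with a CYK-style chart over a toy grammar in Chomsky normal form. The grammar, the `LEXICON` and `RULES` tables, and the `count_parses` function are all assumptions made up for this illustration; the sentence is the familiar textbook "telescope" example, not something taken from this thread.

```python
from collections import defaultdict

# Toy lexicon: word -> set of categories it can have.
LEXICON = {
    "i": {"NP"},
    "saw": {"V"},
    "the": {"Det"},
    "man": {"N"},
    "telescope": {"N"},
    "with": {"P"},
}

# Toy binary productions: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the seeing
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP modifies the man
    ("PP", "P", "NP"),
]


def count_parses(sentence: str, root: str = "S") -> int:
    """Count distinct parse trees for the sentence under the toy grammar."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(i, j)][symbol] = number of trees deriving words[i:j] from symbol
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for symbol in LEXICON.get(word, ()):
            chart[(i, i + 1)][symbol] += 1
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    chart[(i, j)][parent] += (
                        chart[(i, k)][left] * chart[(k, j)][right]
                    )
    return chart[(0, n)][root]


print(count_parses("I saw the man"))                      # 1
print(count_parses("I saw the man with the telescope"))   # 2
```

The grammar reports two trees for the second sentence and has no way to choose between them; picking one takes knowledge about seeing and telescopes that the productions do not contain.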

BTW, years ago I recall suggesting that the translation of natural and
programming languages might share technology. I was roundly disabused
of the notion by folks from both the compiler and NLP communities. It
wouldn't surprise me if never the twain shall meet.

Randy
--
Randy Crawford http://www.ruf.rice.edu/~rand rand AT rice DOT edu
[I thought one of the few points of agreement in the NL community was
that natural languages are context sensitive. They gave up on CFGs,
which they call phrase structure grammars, around 1960. -John]

kleinecke@astound.net

2005-05-18, 4:02 am

DeltaOne wrote:

> the
> idea is to design a compiler that learns language like human learn. I
> am explaining my idea little bit. This compiler is a natural language
> system That is near to an expert system. The aim of the system is to
> convert natural language statements into C++ (or any other
> language,but we need to feed that language structure into this
> application).


You might be interested in what goes on in the newsgroup called
rec.arts.int-fiction or its cousin rec.games.int-fiction. Here is a
mass of work by some very clever people, readily accessible provided
you are computer literate. These people are actually interacting with
the computer in an approximation to natural language. Nothing
academic.

If you must try technical linguistics I suggest Gazdar, Klein, Pullum
and Sag "Generalized Phrase Structure Grammar". It came out in 1985
and is now considered completely obsolete. But it does not dismiss
things like the ones you are interested in out of hand, and it is informed about
computer science. My guess is that if you actually read GPSG you will
change your goals.

Hans-Peter Diettrich

2005-05-18, 4:02 am

Nick Maclaren wrote:

> The experts had given up, and retreated to the task of
> trying to understand how human language works - which they are still
> working on.


[Probably from Uli Stein]
"My husband gave up his research on artificial intelligence, for some
more realistic goal. Now he researches natural stupidity!"

<gd&r>

DoDi