Author Ignore punctuation
Mark

2004-05-12, 9:20 pm

Hi,
I'm trying to write some code, for a college homework assignment, that
reads in an English sentence and converts it to a Prolog list. This bit
I have done successfully; however, I would also like to recognise
punctuation but not add it to the list. The reason is that punctuation
is used to determine the end of a sentence, but I'm also trying to parse
the list with a DCG parser I have, and I want that parser to ignore any
punctuation.

I think I need to modify the readword predicate, but the changes I
have made so far have been unsuccessful. I am fairly new to this, so
any advice would be much appreciated.

Thanks

Code I have so far:

getsent([W|Ws]) :-
    get0(C),
    readword(C, W, C1),
    restsent(W, C1, Ws).

restsent(W, _, []) :-
    lastword(W), !.

restsent(W, C, [W1|Ws]) :-
    readword(C, W1, C1),
    restsent(W1, C1, Ws).

readword(C, W, C1) :-
    punctuation(C), !,
    name(W, [C]),
    get0(C1).

% Character codes treated as punctuation.
punctuation(33).   % !
punctuation(44).   % ,
punctuation(46).   % .
punctuation(58).   % :
punctuation(59).   % ;
punctuation(63).   % ?

Martin Sondergaard

2004-05-12, 9:20 pm

Your rule "getsent" uses "get0" to read in one character.
What do you want it to do next?
Do you want it to read in more characters,
then add them together, and put them in "W" and "Ws" ?

Your rule "readword" takes the character read in using "get0",
and sees if it is a "punctuation" character.
If it is not, then the call to "punctuation" fails,
so "readword" fails, and "getsent" fails with it.

So "getsent" isn't behaving as you want it to.

You are trying to read in the text using the low-level predicate "get0",
and to analyse the text as you read it in.
This is quite difficult to do.
But nowadays, it's not necessary to use this low-level predicate, "get0".
You can use a more high-level predicate instead.

I think the name of the predicate you need
depends on which brand of Prolog you are using.
(What brand is it?)
With SWI-Prolog, you can use "read_line_to_codes/2",
to read in a line of text from the keyboard as a list of character codes.

Use this to read in a line from the keyboard :
?- read_line_to_codes(user, L).
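
To give a rough idea of what comes back, here is a minimal sketch, assuming SWI-Prolog 7 or later. The helper name "line_codes" is made up for this example; it feeds a string through "open_string/2" so that "read_line_to_codes/2" can be tried without typing at the keyboard.

```prolog
:- use_module(library(readutil)).   % provides read_line_to_codes/2

% line_codes(+Text, -Codes)
% Hypothetical helper: reads the first line of Text as a list of
% character codes, exactly as read_line_to_codes/2 would from a stream.
line_codes(Text, Codes) :-
    setup_call_cleanup(
        open_string(Text, Stream),            % assumption: SWI-Prolog 7+
        read_line_to_codes(Stream, Codes),    % newline is not included
        close(Stream)).

% ?- line_codes("Hi, Bob.\n", L).
% L is bound to [72,105,44,32,66,111,98,46]
```

Note that 44 and 46 in that list are the comma and the full stop, the same codes that appear in the "punctuation" facts.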

Next, you need to write a rule to take the list, L,
and remove the punctuation from it. Like this :

my_program :-
    read_line_to_codes(user, L),
    write( 'The list L is ' ), write(L),
    remove_punctuation(L, L1),
    write( 'Now the list is ' ), write( L1 ),
    parse( ... ),
    write( 'Now the parsed list is ' ), write( ... ),
    ... .

Wherever I wrote "...", you can add your own code.
The "write" statements are just to show you what is happening;
you can remove them from the program when it works correctly.
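
A DCG parser usually wants a list of word atoms rather than one long list of character codes, so a step between "remove_punctuation" and "parse" may be needed. The sketch below is only one way to do it, assuming words are separated by spaces (code 32). The names "codes_words" and "word_codes" are invented for this example.

```prolog
% codes_words(+Codes, -Words)
% Split a list of character codes on spaces into a list of atoms.
codes_words([], []).
codes_words([32|Cs], Ws) :- !,            % skip a space between words
    codes_words(Cs, Ws).
codes_words([C|Cs], [W|Ws]) :-
    word_codes([C|Cs], WordCodes, Rest),
    atom_codes(W, WordCodes),
    codes_words(Rest, Ws).

% word_codes(+Codes, -WordCodes, -Rest)
% Take the codes up to the next space (or the end of the list);
% Rest is whatever follows that space.
word_codes([], [], []).
word_codes([32|Cs], [], Cs) :- !.
word_codes([C|Cs], [C|Ws], Rest) :-
    word_codes(Cs, Ws, Rest).

% ?- atom_codes('the cat sat', Cs), codes_words(Cs, Ws).
% Ws = [the, cat, sat]
```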

Now, you will need to write the rule "remove_punctuation".
You need it to behave like this :

?- remove_punctuation( [21, 22, 33, 21, 22], L1 ).
L1 = [21,22,21,22]
Yes

?-

Here's a clue that may help you.
Many rules to deal with lists look like this :

rule( [], [] ).
rule( [H|T], [...] ) :-
    ...
    rule(T, T1).

In the case of "remove_punctuation", you will need
one clause to remove H from the list if H is punctuation,
and one clause that is used if H is not punctuation.
Something like this :

remove_punctuation( [], [] ).
remove_punctuation( [H|T], ... ) :-
    punctuation(H),
    ...
remove_punctuation( [H|T], ... ) :-
    ...
    ....
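
For comparison once you have tried it yourself, here is one possible way to fill in that skeleton. It is a sketch, not the only answer: it uses a cut after the "punctuation" test so that the last clause is only reached for non-punctuation codes (writing "\+ punctuation(H)" in the last clause would be an alternative). The "punctuation" facts are copied from the original post.

```prolog
% Character codes treated as punctuation (from the original post).
punctuation(33).   % !
punctuation(44).   % ,
punctuation(46).   % .
punctuation(58).   % :
punctuation(59).   % ;
punctuation(63).   % ?

% remove_punctuation(+Codes, -Cleaned)
% Cleaned is Codes with every punctuation code dropped.
remove_punctuation([], []).
remove_punctuation([H|T], T1) :-
    punctuation(H), !,              % H is punctuation: leave it out
    remove_punctuation(T, T1).
remove_punctuation([H|T], [H|T1]) :-
    remove_punctuation(T, T1).      % otherwise keep H

% ?- remove_punctuation([21, 22, 33, 21, 22], L1).
% L1 = [21, 22, 21, 22]
```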


I hope this helps.

--
Martin Sondergaard,
London.




P.S. The predicate "name" is old-fashioned.
Nowadays it's more usual to use "atom_codes/2" instead.
For ordinary words this does the same job, but has a better name.
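
A small sketch of the difference, in case it matters for your word list: with a list of digit codes, "name" produces a number, whereas "atom_codes" always produces an atom. The wrapper "word_from_codes" is just a made-up name for illustration.

```prolog
% word_from_codes(+Codes, -Word)
% Hypothetical wrapper: build an atom from a list of character codes.
word_from_codes(Codes, Word) :-
    atom_codes(Word, Codes).

% ?- word_from_codes([99,97,116], W).   % W = cat
% ?- name(X, [52,50]).                  % X = 42    (an integer)
% ?- atom_codes(Y, [52,50]).            % Y = '42'  (an atom)
```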


