For Programmers: Free Programming Magazines  


Home > Archive > Prolog > September 2004 > Make deterministic without using cut?









You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

 

Author Make deterministic without using cut?
Robert Oschler

2004-09-11, 3:56 am

I wrote the following function to split a string into a list of words, using
blanks as a delimiter:

% Tokenize a sentence into a list of words.
strtok([], [], _):-!.
strtok([], CharsAccum, [Word]):-
string_chars(Word, CharsAccum),
!.
strtok([32|T], CharsAccum, [Word | W_T]):-
!,
string_chars(Word, CharsAccum),
strtok(T, [], W_T).
strtok([C|T], CharsAccum, WordList):-
C \= 32,
!,
append(CharsAccum, [C], CharsAccum2),
strtok(T, CharsAccum2, WordList).

str_to_list(S, L):-
string_chars(S, Chars1),
strtok(Chars1, [], L).

What I don't like about what I've done is that I've used the cut frequently
to make the (overall) predicate deterministic, but I only want the predicate
to attempt to return one answer. Is there a more elegant way to do this?

BTW, I know the predicate could be more efficient if I used a difference
list for the CharsAccum list, but I was feeling lazy.

Thanks.


Nameless

2004-09-11, 3:56 pm

"Robert Oschler" wrote in message
news:l4mdnYOnQ8zWwN_cRVn-qA@adelphia.com...
> I wrote the following function to split a string into a list
> of words, using blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
> string_chars(Word, CharsAccum),
> !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
> !,
> string_chars(Word, CharsAccum),
> strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
> C \= 32,
> !,
> append(CharsAccum, [C], CharsAccum2),
> strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
> string_chars(S, Chars1),
> strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the
> cut frequently to make the (overall) predicate deterministic,
> but I only want the predicate to attempt to return one answer.
> Is there a more elegant way to do this?
>
> BTW, I know the predicate could be more efficient if I used a
> difference list for the CharsAccum list, but I was feeling lazy.


I posted the following Win-Prolog program almost a year
back (2003-09-26), which leads me to believe that you are
asking for help with a homework problem. This might not be
the case, however, but since the program is already in the
public domain, I suppose there's no reason to exhibit
scruples by not answering your request. ;) Of course, only
you can decide whether it is elegant enough for you.

/* Substrings extraction program */

% substrings_all(+String, -Substrings) is true if
% Substrings is a list of all substrings in String.

substrings_all(String, Substrings) :-
Delimiter = 32, % could specify a list of delimiters
substrings(String,Delimiter,Substrings).


% substrings(+String, +Delimiter, -Substrings)

substrings(InString, Delimiter, OutStrings) :-
(InString = []
-> OutStrings = []
; frontstring(InString,Delimiter,Substring
,Rest),
string_chars(S,Substring), % conv. chars. to string
OutStrings = [S|Substrings],
substrings(Rest,Delimiter,Substrings)
).


% frontstring(+String, +Delimiter, -FrontString, -Rest)

frontstring(InString, Delimiter, OutString, Rest) :-
(InString = [H|T]
-> (H == Delimiter % replace if more than one delimiter
-> OutString = [],
Rest = T
; OutString = [H|FrontString],
frontstring(T,Delimiter,FrontString,Rest
)
)
; OutString = [],
Rest = []
).

/* End of substrings extraction program */

Example goal:

| ?- substrings_all("Say hello, Molly", Substrings).
Substrings = ['Say','hello,','Molly']

Note that the comma in the input string is part of the
second word, not a substring (or token) or delimiter per
se. I have placed comments in substrings_all/2 and
frontstring/4 to indicate where to (start to) make changes
to enable a 'true' substrings extractor (or tokenizer).

Consider this program a framework for further development.
Good luck!

--
Mail sent to this email address is automatically deleted
(unread) on the server. Send replies to the newsgroup.


Salvador Fandiño

2004-09-11, 3:56 pm

Robert Oschler wrote:
> I wrote the following function to split a string into a list of words, using
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
> string_chars(Word, CharsAccum),
> !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
> !,
> string_chars(Word, CharsAccum),
> strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
> C \= 32,
> !,
> append(CharsAccum, [C], CharsAccum2),
> strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
> string_chars(S, Chars1),
> strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the cut frequently
> to make the (overall) predicate deterministic, but I only want the predicate
> to attempt to return one answer. Is there a more elegant way to do this?


and yet another way to do it:

str_to_lists(A, L) :-
atom_codes(A, AL),
strtok(AL, Acu, L1),
join_sublists([Acu|L1], L).

strtok([], [], []) :- !.
strtok([32|M], [], [Acu|L]) :-
!,
strtok(M, Acu, L).
strtok([C|M], [C|Acu], L) :-
C \= 32,
strtok(M, Acu, L).

join_sublists([], []) :- !.
join_sublists([H|T], [H1|T1]) :-
atom_codes(H1, H),
join_sublists(T, T1).


bye

- Salva
Torbjorn Lager

2004-09-11, 3:56 pm

DCG may sometimes be rather elegant (still using cuts, however):

words([Word|Words]) --> word(Word), !, words(Words).
words(Words) --> blank, words(Words).
words([]) --> "".

word(Word) --> char(Char), chars(Chars), {atom_chars(Word,[Char|Chars])}.

chars([Char|Chars]) --> char(Char), !, chars(Chars).
chars([]) --> "".

char(Char) --> [Char], {Char >= 65, Char =< 122}.

blank --> " ".

Use like so:

| ?- words(L,"this is a test","").
L = [this,is,a,test]

Cheers,
Torbjörn

"Robert Oschler" <no-mail-please@nospam.com> wrote in message
news:<l4mdnYOnQ8zWwN_cRVn-qA@adelphia.com>...
> I wrote the following function to split a string into a list of words, using
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
> string_chars(Word, CharsAccum),
> !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
> !,
> string_chars(Word, CharsAccum),
> strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
> C \= 32,
> !,
> append(CharsAccum, [C], CharsAccum2),
> strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
> string_chars(S, Chars1),
> strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the cut frequently
> to make the (overall) predicate deterministic, but I only want the predicate
> to attempt to return one answer. Is there a more elegant way to do this?
>
> BTW, I know the predicate could be more efficient if I used a difference
> list for the CharsAccum list, but I was feeling lazy.
>
> Thanks.

Dr Andrew R. Verden

2004-09-12, 8:56 am

Overall, I think using cut in this predicate is fine, it simply
expresses when a predicate commits to the clause. Prolog is always
a mixture of deterministic and non-deterministic predicates.

Personally, I use a predicate mode descriptor:

% strtok(+TokenList,+CharsAccum,-WordList)

to express input and output arguments.

But as I personally dont like to have to call strtok/3 from other clauses
with a [] as the start value of the accumulator, which somehow is information
that should be local to strtok/3, not to the calling environment, I would
add a clause:

strtok(TokenList,WordList) :-
strtok(TokenList,[],WordList).

and use strtok/2 from all calling locations. Export only strtok/2 from
the module.

> I wrote the following function to split a string into a list of words, using
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
> string_chars(Word, CharsAccum),
> !.


Here the !/0 should be placed before the call to string_chars/2.
Where you have it, it is a "red" cut, pruning possible alternatives
to string_chars/2 from inside strtok/3. Placing it before will make
it a "green" cut.

> strtok([32|T], CharsAccum, [Word | W_T]):-
> !,
> string_chars(Word, CharsAccum),
> strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
> C \= 32,
> !,
> append(CharsAccum, [C], CharsAccum2),
> strtok(T, CharsAccum2, WordList).


In this clause the !/0 is a "blue" cut as it prunes \=/2
which is a builtin. For me builtins are all ISO predicates
not library predicates like: string_chars/2

For a description of red green and blue cuts please see Richard
O'Keefe s book "The Craft of Prolog" it is excellent on this respect.

Of course, in IF/Prolog we just write:

atom_split(Atom,' ',SplitAtomList)

to split a string into a list of sub-strings....
it avoids all the unnecessary list creation in the
tokenisation.
reader

2004-09-16, 11:12 am

Robert Oschler wrote:

> I wrote the following function to split a string into a list of words, using
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
> string_chars(Word, CharsAccum),
> !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
> !,
> string_chars(Word, CharsAccum),
> strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
> C \= 32,
> !,
> append(CharsAccum, [C], CharsAccum2),
> strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
> string_chars(S, Chars1),
> strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the cut frequently
> to make the (overall) predicate deterministic, but I only want the predicate
> to attempt to return one answer. Is there a more elegant way to do this?
>
> BTW, I know the predicate could be more efficient if I used a difference
> list for the CharsAccum list, but I was feeling lazy.
>
> Thanks.
>
>


There is another way of looking at the question.

Suppose I had a predicate next_token(Str,Token) which
returned successive tokens contained in string Str, much in the way
member(X,L) returns successive members of list L.

Then, to tokenize any given string, I could just say

findall(Token,next_token(Str,Token),Toke
nList)

However, if I use member(X,L) as my model:

next_token(Str,Token) :-
scan_token(Str,_,Token).
next_token(Str,Token) :-
scan_token(Str,Str0,_),
next_token(Str,Token).

each token is scanned twice, first to get it, then to get past it.

This is where a neat trick I learned in this n.g. comes in handy:

next_token(Str,Token) :-
s_token(Str,Str0,CurrToken),
dispatch(Str0,CurrToken,Token).

dispatch(_Str0,CurrToken,Token) :-
Token = CurrToken.
dispatch(Str0,_,Token) :-
next_token(Str0,Token).

s_token([X|Str],Str0,Token) :-
X = 32,
!,
s_token(Str,Str0,Token).
s_token([X|Str],Str0,[X|TokenTail]) :-
\+ X = 32,
s_tokentail(Str,Str0,TokenTail).

s_tokentail([],[],[]).
s_tokentail([X|Str],Str,TokenTail) :-
X = 32,
!,
TokenTail=[].
s_tokentail([X|Str],Str0,[X|TokenTail]) :-
\+ X = 32,
s_tokentail(Str,Str0,TokenTail).

Now the scan resumes at the point where it left off after each
successive token is scanned, and

?- findall(Tok,next_token(" how now brown cow ",Tok),Tokl).

Tok = _G527
Tokl = [[104, 111, 119], [110, 111, 119], [98, 114, 111, 119, 110], [99,
111, 119]] ;

No
?-

Done in haste.

Bill
student

2004-09-16, 8:39 pm

reader wrote:

> There is another way of looking at the question.
>
> Suppose I had a predicate next_token(Str,Token) which
> returned successive tokens contained in string Str, much in the way
> member(X,L) returns successive members of list L.
>
> Then, to tokenize any given string, I could just say
>
> findall(Token,next_token(Str,Token),Toke
nList)
>
> However, if I use member(X,L) as my model:
>
> next_token(Str,Token) :-
> scan_token(Str,_,Token).
> next_token(Str,Token) :-
> scan_token(Str,Str0,_),
> next_token(Str,Token). <--------------- error
>
> each token is scanned twice, first to get it, then to get past it.
>


That should be

next_token(Str,Token) :-
scan_token(Str,_,Token).
next_token(Str,Token) :-
scan_token(Str,Str0,_),
next_token(Str0,Token).

--
Sponsored Links







Also available: Server administration forum archive | Web Design forum archive | Software forum archive | Hardware reviews archive

Copyright 2008 codecomments.com