Home > Archive > Prolog > September 2004 > Make deterministic without using cut?
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Make deterministic without using cut?
|
|
| Robert Oschler 2004-09-11, 3:56 am |
| I wrote the following function to split a string into a list of words, using
blanks as a delimiter:
% Tokenize a sentence into a list of words.
strtok([], [], _):-!.
strtok([], CharsAccum, [Word]):-
string_chars(Word, CharsAccum),
!.
strtok([32|T], CharsAccum, [Word | W_T]):-
!,
string_chars(Word, CharsAccum),
strtok(T, [], W_T).
strtok([C|T], CharsAccum, WordList):-
C \= 32,
!,
append(CharsAccum, [C], CharsAccum2),
strtok(T, CharsAccum2, WordList).
str_to_list(S, L):-
string_chars(S, Chars1),
strtok(Chars1, [], L).
What I don't like about what I've done is that I've used the cut frequently
to make the (overall) predicate deterministic, but I only want the predicate
to attempt to return one answer. Is there a more elegant way to do this?
BTW, I know the predicate could be more efficient if I used a difference
list for the CharsAccum list, but I was feeling lazy.
Thanks.
| |
| Nameless 2004-09-11, 3:56 pm |
| "Robert Oschler" wrote in message
news:l4mdnYOnQ8zWwN_cRVn-qA@adelphia.com...
> I wrote the following function to split a string into a list
> of words, using blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
> string_chars(Word, CharsAccum),
> !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
> !,
> string_chars(Word, CharsAccum),
> strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
> C \= 32,
> !,
> append(CharsAccum, [C], CharsAccum2),
> strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
> string_chars(S, Chars1),
> strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the
> cut frequently to make the (overall) predicate deterministic,
> but I only want the predicate to attempt to return one answer.
> Is there a more elegant way to do this?
>
> BTW, I know the predicate could be more efficient if I used a
> difference list for the CharsAccum list, but I was feeling lazy.
I posted the following Win-Prolog program almost a year
back (2003-09-26), which leads me to believe that you are
asking for help with a homework problem. This might not be
the case, however, but since the program is already in the
public domain, I suppose there's no reason to exhibit
scruples by not answering your request. ;) Of course, only
you can decide whether it is elegant enough for you.
/* Substrings extraction program */
% substrings_all(+String, -Substrings) is true if
% Substrings is a list of all substrings in String.
substrings_all(String, Substrings) :-
Delimiter = 32, % could specify a list of delimiters
substrings(String,Delimiter,Substrings).
% substrings(+String, +Delimiter, -Substrings)
substrings(InString, Delimiter, OutStrings) :-
(InString = []
-> OutStrings = []
; frontstring(InString,Delimiter,Substring
,Rest),
string_chars(S,Substring), % conv. chars. to string
OutStrings = [S|Substrings],
substrings(Rest,Delimiter,Substrings)
).
% frontstring(+String, +Delimiter, -FrontString, -Rest)
frontstring(InString, Delimiter, OutString, Rest) :-
(InString = [H|T]
-> (H == Delimiter % replace if more than one delimiter
-> OutString = [],
Rest = T
; OutString = [H|FrontString],
frontstring(T,Delimiter,FrontString,Rest
)
)
; OutString = [],
Rest = []
).
/* End of substrings extraction program */
Example goal:
| ?- substrings_all("Say hello, Molly", Substrings).
Substrings = ['Say','hello,','Molly']
Note that the comma in the input string is part of the
second word, not a substring (or token) or delimiter per
se. I have placed comments in substrings_all/2 and
frontstring/4 to indicate where to (start to) make changes
to enable a 'true' substrings extractor (or tokenizer).
Consider this program a framework for further development.
Good luck!
--
Mail sent to this email address is automatically deleted
(unread) on the server. Send replies to the newsgroup.
| |
| Salvador Fandiño 2004-09-11, 3:56 pm |
| Robert Oschler wrote:
> I wrote the following function to split a string into a list of words, using
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
> string_chars(Word, CharsAccum),
> !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
> !,
> string_chars(Word, CharsAccum),
> strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
> C \= 32,
> !,
> append(CharsAccum, [C], CharsAccum2),
> strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
> string_chars(S, Chars1),
> strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the cut frequently
> to make the (overall) predicate deterministic, but I only want the predicate
> to attempt to return one answer. Is there a more elegant way to do this?
and yet another way to do it:
str_to_lists(A, L) :-
atom_codes(A, AL),
strtok(AL, Acu, L1),
join_sublists([Acu|L1], L).
strtok([], [], []) :- !.
strtok([32|M], [], [Acu|L]) :-
!,
strtok(M, Acu, L).
strtok([C|M], [C|Acu], L) :-
C \= 32,
strtok(M, Acu, L).
join_sublists([], []) :- !.
join_sublists([H|T], [H1|T1]) :-
atom_codes(H1, H),
join_sublists(T, T1).
bye
- Salva
| |
| Torbjorn Lager 2004-09-11, 3:56 pm |
| DCG may sometimes be rather elegant (still using cuts, however):
words([Word|Words]) --> word(Word), !, words(Words).
words(Words) --> blank, words(Words).
words([]) --> "".
word(Word) --> char(Char), chars(Chars), {atom_chars(Word,[Char|Chars])}.
chars([Char|Chars]) --> char(Char), !, chars(Chars).
chars([]) --> "".
char(Char) --> [Char], {Char >= 65, Char =< 122}.
blank --> " ".
Use like so:
| ?- words(L,"this is a test","").
L = [this,is,a,test]
Cheers,
Torbjörn
"Robert Oschler" <no-mail-please@nospam.com> wrote in message
news:<l4mdnYOnQ8zWwN_cRVn-qA@adelphia.com>...
> I wrote the following function to split a string into a list of words, using
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
> string_chars(Word, CharsAccum),
> !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
> !,
> string_chars(Word, CharsAccum),
> strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
> C \= 32,
> !,
> append(CharsAccum, [C], CharsAccum2),
> strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
> string_chars(S, Chars1),
> strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the cut frequently
> to make the (overall) predicate deterministic, but I only want the predicate
> to attempt to return one answer. Is there a more elegant way to do this?
>
> BTW, I know the predicate could be more efficient if I used a difference
> list for the CharsAccum list, but I was feeling lazy.
>
> Thanks.
| |
| Dr Andrew R. Verden 2004-09-12, 8:56 am |
| Overall, I think using cut in this predicate is fine, it simply
expresses when a predicate commits to the clause. Prolog is always
a mixture of deterministic and non-deterministic predicates.
Personally, I use a predicate mode descriptor:
% strtok(+TokenList,+CharsAccum,-WordList)
to express input and output arguments.
But as I personally dont like to have to call strtok/3 from other clauses
with a [] as the start value of the accumulator, which somehow is information
that should be local to strtok/3, not to the calling environment, I would
add a clause:
strtok(TokenList,WordList) :-
strtok(TokenList,[],WordList).
and use strtok/2 from all calling locations. Export only strtok/2 from
the module.
> I wrote the following function to split a string into a list of words, using
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
> string_chars(Word, CharsAccum),
> !.
Here the !/0 should be placed before the call to string_chars/2.
Where you have it, it is a "red" cut, pruning possible alternatives
to string_chars/2 from inside strtok/3. Placing it before will make
it a "green" cut.
> strtok([32|T], CharsAccum, [Word | W_T]):-
> !,
> string_chars(Word, CharsAccum),
> strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
> C \= 32,
> !,
> append(CharsAccum, [C], CharsAccum2),
> strtok(T, CharsAccum2, WordList).
In this clause the !/0 is a "blue" cut as it prunes \=/2
which is a builtin. For me builtins are all ISO predicates
not library predicates like: string_chars/2
For a description of red green and blue cuts please see Richard
O'Keefe s book "The Craft of Prolog" it is excellent on this respect.
Of course, in IF/Prolog we just write:
atom_split(Atom,' ',SplitAtomList)
to split a string into a list of sub-strings....
it avoids all the unnecessary list creation in the
tokenisation.
| |
| reader 2004-09-16, 11:12 am |
| Robert Oschler wrote:
> I wrote the following function to split a string into a list of words, using
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
> string_chars(Word, CharsAccum),
> !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
> !,
> string_chars(Word, CharsAccum),
> strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
> C \= 32,
> !,
> append(CharsAccum, [C], CharsAccum2),
> strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
> string_chars(S, Chars1),
> strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the cut frequently
> to make the (overall) predicate deterministic, but I only want the predicate
> to attempt to return one answer. Is there a more elegant way to do this?
>
> BTW, I know the predicate could be more efficient if I used a difference
> list for the CharsAccum list, but I was feeling lazy.
>
> Thanks.
>
>
There is another way of looking at the question.
Suppose I had a predicate next_token(Str,Token) which
returned successive tokens contained in string Str, much in the way
member(X,L) returns successive members of list L.
Then, to tokenize any given string, I could just say
findall(Token,next_token(Str,Token),Toke
nList)
However, if I use member(X,L) as my model:
next_token(Str,Token) :-
scan_token(Str,_,Token).
next_token(Str,Token) :-
scan_token(Str,Str0,_),
next_token(Str,Token).
each token is scanned twice, first to get it, then to get past it.
This is where a neat trick I learned in this n.g. comes in handy:
next_token(Str,Token) :-
s_token(Str,Str0,CurrToken),
dispatch(Str0,CurrToken,Token).
dispatch(_Str0,CurrToken,Token) :-
Token = CurrToken.
dispatch(Str0,_,Token) :-
next_token(Str0,Token).
s_token([X|Str],Str0,Token) :-
X = 32,
!,
s_token(Str,Str0,Token).
s_token([X|Str],Str0,[X|TokenTail]) :-
\+ X = 32,
s_tokentail(Str,Str0,TokenTail).
s_tokentail([],[],[]).
s_tokentail([X|Str],Str,TokenTail) :-
X = 32,
!,
TokenTail=[].
s_tokentail([X|Str],Str0,[X|TokenTail]) :-
\+ X = 32,
s_tokentail(Str,Str0,TokenTail).
Now the scan resumes at the point where it left off after each
successive token is scanned, and
?- findall(Tok,next_token(" how now brown cow ",Tok),Tokl).
Tok = _G527
Tokl = [[104, 111, 119], [110, 111, 119], [98, 114, 111, 119, 110], [99,
111, 119]] ;
No
?-
Done in haste.
Bill
| |
| student 2004-09-16, 8:39 pm |
| reader wrote:
> There is another way of looking at the question.
>
> Suppose I had a predicate next_token(Str,Token) which
> returned successive tokens contained in string Str, much in the way
> member(X,L) returns successive members of list L.
>
> Then, to tokenize any given string, I could just say
>
> findall(Token,next_token(Str,Token),Toke
nList)
>
> However, if I use member(X,L) as my model:
>
> next_token(Str,Token) :-
> scan_token(Str,_,Token).
> next_token(Str,Token) :-
> scan_token(Str,Str0,_),
> next_token(Str,Token). <--------------- error
>
> each token is scanned twice, first to get it, then to get past it.
>
That should be
next_token(Str,Token) :-
scan_token(Str,_,Token).
next_token(Str,Token) :-
scan_token(Str,Str0,_),
next_token(Str0,Token).
--
|
|
|
|
|