Code Comments

Programming Forum and web based access to our favorite programming groups.
For Programmers: Free Programming Magazines | New: Database administration forum
Registration is free! Edit your profileCalendarFind other membersFrequently Asked QuestionsSearch -> 
Post New Thread











Thread
Author

Make deterministic without using cut?
I wrote the following function to split a string into a list of words, using
blanks as a delimiter:

% Tokenize a sentence into a list of words.
strtok([], [], _):-!.
strtok([], CharsAccum, [Word]):-
string_chars(Word, CharsAccum),
!.
strtok([32|T], CharsAccum, [Word | W_T]):-
!,
string_chars(Word, CharsAccum),
strtok(T, [], W_T).
strtok([C|T], CharsAccum, WordList):-
C \= 32,
!,
append(CharsAccum, [C], CharsAccum2),
strtok(T, CharsAccum2, WordList).

str_to_list(S, L):-
string_chars(S, Chars1),
strtok(Chars1, [], L).

What I don't like about what I've done is that I've used the cut frequently
to make the (overall) predicate deterministic, but I only want the predicate
to attempt to return one answer.  Is there a more elegant way to do this?

BTW, I know the predicate could be more efficient if I used a difference
list for the CharsAccum list, but I was feeling lazy.

Thanks.



Report this thread to moderator Post Follow-up to this message
Old Post
Robert Oschler
09-11-04 08:56 AM


Re: Make deterministic without using cut?
"Robert Oschler" wrote in message
news:l4mdnYOnQ8zWwN_cRVn-qA@adelphia.com...
> I wrote the following function to split a string into a list
> of words, using blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
>  string_chars(Word, CharsAccum),
>  !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
>  !,
>  string_chars(Word, CharsAccum),
>  strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
>  C \= 32,
>  !,
>  append(CharsAccum, [C], CharsAccum2),
>  strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
>  string_chars(S, Chars1),
>  strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the
> cut frequently to make the (overall) predicate deterministic,
> but I only want the predicate to attempt to return one answer.
> Is there a more elegant way to do this?
>
> BTW, I know the predicate could be more efficient if I used a
> difference list for the CharsAccum list, but I was feeling lazy.

I posted the following Win-Prolog program almost a year
back (2003-09-26), which leads me to believe that you are
asking for help with a homework problem. This might not be
the case, however, but since the program is already in the
public domain, I suppose there's no reason to exhibit
scruples by not answering your request. ;)  Of course, only
you can decide whether it is elegant enough for you.

/* Substrings extraction program */

% substrings_all(+String, -Substrings) is true if
% Substrings is a list of all substrings in String.

substrings_all(String, Substrings) :-
Delimiter = 32,  % could specify a list of delimiters
substrings(String,Delimiter,Substrings).


% substrings(+String, +Delimiter, -Substrings)

substrings(InString, Delimiter, OutStrings) :-
(InString = []
-> OutStrings = []
;   frontstring(InString,Delimiter,Substring
,Rest),
string_chars(S,Substring),  % conv. chars. to string
OutStrings = [S|Substrings],
substrings(Rest,Delimiter,Substrings)
).


% frontstring(+String, +Delimiter, -FrontString, -Rest)

frontstring(InString, Delimiter, OutString, Rest) :-
(InString = [H|T]
-> (H == Delimiter  % replace if more than one delimiter
-> OutString = [],
Rest = T
;  OutString = [H|FrontString],
 frontstring(T,Delimiter,FrontString,Rest
)
)
;  OutString = [],
Rest = []
).

/* End of substrings extraction program */

Example goal:

| ?- substrings_all("Say hello, Molly", Substrings).
Substrings = ['Say','hello,','Molly']

Note that the comma in the input string is part of the
second word, not a substring (or token) or delimiter per
se. I have placed comments in substrings_all/2 and
frontstring/4 to indicate where to (start to) make changes
to enable a 'true' substrings extractor (or tokenizer).

Consider this program a framework for further development.
Good luck!

--
Mail sent to this email address is automatically deleted
(unread) on the server. Send replies to the newsgroup.



Report this thread to moderator Post Follow-up to this message
Old Post
Nameless
09-11-04 08:56 PM


Re: Make deterministic without using cut?
Robert Oschler wrote:
> I wrote the following function to split a string into a list of words, usi
ng
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
>  string_chars(Word, CharsAccum),
>  !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
>  !,
>  string_chars(Word, CharsAccum),
>  strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
>  C \= 32,
>  !,
>  append(CharsAccum, [C], CharsAccum2),
>  strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
>  string_chars(S, Chars1),
>  strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the cut frequentl
y
> to make the (overall) predicate deterministic, but I only want the predica
te
> to attempt to return one answer.  Is there a more elegant way to do this?

and yet another way to do it:

str_to_lists(A, L) :-
atom_codes(A, AL),
strtok(AL, Acu, L1),
join_sublists([Acu|L1], L).

strtok([], [], []) :- !.
strtok([32|M], [], [Acu|L]) :-
!,
strtok(M, Acu, L).
strtok([C|M], [C|Acu], L) :-
C \= 32,
strtok(M, Acu, L).

join_sublists([], []) :- !.
join_sublists([H|T], [H1|T1]) :-
atom_codes(H1, H),
join_sublists(T, T1).


bye

- Salva

Report this thread to moderator Post Follow-up to this message
Old Post
Salvador Fandiño
09-11-04 08:56 PM


Re: Make deterministic without using cut?
DCG may sometimes be rather elegant (still using cuts, however):

words([Word|Words]) --> word(Word), !, words(Words).
words(Words) --> blank, words(Words).
words([]) --> "".

word(Word) --> char(Char), chars(Chars), {atom_chars(Word,[Char|Chars])}.

chars([Char|Chars]) --> char(Char), !, chars(Chars).
chars([]) --> "".

char(Char) --> [Char], {Char >= 65, Char =< 122}.

blank --> " ".

Use like so:

| ?- words(L,"this is  a test","").
L = [this,is,a,test]

Cheers,
Torbjörn

"Robert Oschler" <no-mail-please@nospam.com> wrote in message
news:<l4mdnYOnQ8zWwN_cRVn-qA@adelphia.com>...
> I wrote the following function to split a string into a list of words, usi
ng
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
>  string_chars(Word, CharsAccum),
>  !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
>  !,
>  string_chars(Word, CharsAccum),
>  strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
>  C \= 32,
>  !,
>  append(CharsAccum, [C], CharsAccum2),
>  strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
>  string_chars(S, Chars1),
>  strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the cut frequentl
y
> to make the (overall) predicate deterministic, but I only want the predica
te
> to attempt to return one answer.  Is there a more elegant way to do this?
>
> BTW, I know the predicate could be more efficient if I used a difference
> list for the CharsAccum list, but I was feeling lazy.
>
> Thanks.

Report this thread to moderator Post Follow-up to this message
Old Post
Torbjorn Lager
09-11-04 08:56 PM


Re: Make deterministic without using cut?
Overall, I think using cut in this predicate is fine, it simply
expresses when a predicate commits to the clause. Prolog is always
a mixture of deterministic and non-deterministic predicates.

Personally, I use a predicate mode descriptor:

% strtok(+TokenList,+CharsAccum,-WordList)

to express input and output arguments.

But as I personally dont like to have to call strtok/3 from other clauses
with a [] as the start value of the accumulator, which somehow is informatio
n
that should be local to strtok/3, not to the calling environment, I would
add a clause:

strtok(TokenList,WordList) :-
strtok(TokenList,[],WordList).

and use strtok/2 from all calling locations. Export only strtok/2 from
the module.

> I wrote the following function to split a string into a list of words, usi
ng
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
>  string_chars(Word, CharsAccum),
>  !.

Here the !/0 should be placed before the call to string_chars/2.
Where you have it, it is a "red" cut, pruning possible alternatives
to string_chars/2 from inside strtok/3. Placing it before will make
it a "green" cut.

> strtok([32|T], CharsAccum, [Word | W_T]):-
>  !,
>  string_chars(Word, CharsAccum),
>  strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
>  C \= 32,
>  !,
>  append(CharsAccum, [C], CharsAccum2),
>  strtok(T, CharsAccum2, WordList).

In this clause the !/0 is a "blue" cut as it prunes \=/2
which is a builtin.  For me builtins are all ISO predicates
not library predicates like: string_chars/2

For a description of red green and blue cuts please see Richard
O'Keefe s book "The Craft of Prolog" it is excellent on this respect.

Of course, in IF/Prolog we just write:

atom_split(Atom,' ',SplitAtomList)

to split a string into a list of sub-strings....
it avoids all the unnecessary list creation in the
tokenisation.

Report this thread to moderator Post Follow-up to this message
Old Post
Dr Andrew R. Verden
09-12-04 01:56 PM


Re: Make deterministic without using cut?
Robert Oschler wrote:

> I wrote the following function to split a string into a list of words, usi
ng
> blanks as a delimiter:
>
> % Tokenize a sentence into a list of words.
> strtok([], [], _):-!.
> strtok([], CharsAccum, [Word]):-
>  string_chars(Word, CharsAccum),
>  !.
> strtok([32|T], CharsAccum, [Word | W_T]):-
>  !,
>  string_chars(Word, CharsAccum),
>  strtok(T, [], W_T).
> strtok([C|T], CharsAccum, WordList):-
>  C \= 32,
>  !,
>  append(CharsAccum, [C], CharsAccum2),
>  strtok(T, CharsAccum2, WordList).
>
> str_to_list(S, L):-
>  string_chars(S, Chars1),
>  strtok(Chars1, [], L).
>
> What I don't like about what I've done is that I've used the cut frequentl
y
> to make the (overall) predicate deterministic, but I only want the predica
te
> to attempt to return one answer.  Is there a more elegant way to do this?
>
> BTW, I know the predicate could be more efficient if I used a difference
> list for the CharsAccum list, but I was feeling lazy.
>
> Thanks.
>
>

There is another way of looking at the question.

Suppose I had a predicate next_token(Str,Token) which
returned successive tokens contained in string Str, much in the way
member(X,L) returns successive members of list L.

Then, to tokenize any given string, I could just say

 findall(Token,next_token(Str,Token),Toke
nList)

However, if I use member(X,L) as my model:

next_token(Str,Token) :-
scan_token(Str,_,Token).
next_token(Str,Token) :-
scan_token(Str,Str0,_),
next_token(Str,Token).

each token is scanned twice, first to get it, then to get past it.

This is where a neat trick I learned in this n.g. comes in handy:

next_token(Str,Token) :-
s_token(Str,Str0,CurrToken),
dispatch(Str0,CurrToken,Token).

dispatch(_Str0,CurrToken,Token) :-
Token = CurrToken.
dispatch(Str0,_,Token) :-
next_token(Str0,Token).

s_token([X|Str],Str0,Token) :-
X = 32,
!,
s_token(Str,Str0,Token).
s_token([X|Str],Str0,[X|TokenTail]) :-
\+ X = 32,
s_tokentail(Str,Str0,TokenTail).

s_tokentail([],[],[]).
s_tokentail([X|Str],Str,TokenTail) :-
X = 32,
!,
TokenTail=[].
s_tokentail([X|Str],Str0,[X|TokenTail]) :-
\+ X = 32,
s_tokentail(Str,Str0,TokenTail).

Now the scan resumes at the point where it left off after each
successive token is scanned, and

?- findall(Tok,next_token(" how now brown cow ",Tok),Tokl).

Tok = _G527
Tokl = [[104, 111, 119], [110, 111, 119], [98, 114, 111, 119, 110], [99,
111, 119]] ;

No
?-

Done in haste.

Bill

Report this thread to moderator Post Follow-up to this message
Old Post
reader
09-16-04 04:12 PM


Re: Make deterministic without using cut?
reader wrote:

>   There is another way of looking at the question.
>
>   Suppose I had a predicate next_token(Str,Token) which
> returned successive tokens contained in string Str, much in the way
> member(X,L) returns successive members of list L.
>
>   Then, to tokenize any given string, I could just say
>
>       findall(Token,next_token(Str,Token),Toke
nList)
>
>   However, if I use member(X,L) as my model:
>
>     next_token(Str,Token) :-
>       scan_token(Str,_,Token).
>     next_token(Str,Token) :-
>       scan_token(Str,Str0,_),
>       next_token(Str,Token).   <--------------- error
>
> each token is scanned twice, first to get it, then to get past it.
>

That should be

next_token(Str,Token) :-
scan_token(Str,_,Token).
next_token(Str,Token) :-
scan_token(Str,Str0,_),
next_token(Str0,Token).

--

Report this thread to moderator Post Follow-up to this message
Old Post
student
09-17-04 01:39 AM


Sponsored Links




Last Thread Next Thread Next
Search this forum -> 
Post New Thread

Prolog archive

Show a Printable Version Send to friend Email This Page to Someone! subscribe to this thread Receive updates to this thread
Computer Consultants
Programming Jobs
Visual Basic Controls
SQL Server Programming
Webservices
Java Security
Visual Studio
C# Programming
Visual J++
Software engineering
Open source Software
Perl Programming
PHP Programming
ASP Programming
ASP .NET Programming
Visual Basic Programming
Windows Scripting Host
Java Programming
Java Help
Java Beans
VBScript
Cobol
MAC Applications
Unix Programming
Forum Jump:
All times are GMT. The time now is 05:10 PM.

 
Free MCSE Braindumps | Real Estate Topics

Programming forum archive

Copyrights CodeComments.com 2004 - 2006

Powered by vBulletin Copyright 2000-2006 Jelsoft Enterprises Limited.