Code Comments
Hi,

I have 2 strings (both may have a variable number of words).
First string: Unix Shell programming by Kernighan and Pike
Second string: Unix Shell Programming by Kernighan

Would it be possible, through a Unix bash shell script, to tell that the portion of the second string "Unix Shell Programming by Kernighan" partially matches the first string?

Further, my second string can be:
Unix Shell programming by Pike

The same script (case insensitive) should tell that the portion of the second string "Unix Shell programming by" partially matches the first string.

Can anyone help me?
Thanks in advance,
Anil.
Anil wrote:
> Hi,
>
> I have 2 strings(both may have variable number of words)
> first string is: Unix Shell programming by Kernighan and Pike
> second string is: Unix Shell Programming by Kernighan
>
> Would it possible thru unix bash shell script to tell that the
> portion of the second string "Unix Shell Programming by Kernighan" is
> matching partially with the first string?
>
> Further my second string can be:
> Unix Shell programming by Pike
>
> The same script(case insensitive) shud tell that the portion of the
> second string
> "Unix Shell programming by" matches partially with the first string.
> Can anyone help me.
> Thanks in advance,
> Anil.
>
Here are some hints for you to apply to your own solution. Something like
this (untested):
awk 'BEGIN{
    str1="Unix Shell programming by Kernighan and Pike"
    str2="Unix Shell Programming by Kernighan"
    # index each array by the lower-case word, store the original word
    c = split(str1,tmp," ")
    for (i=1; i<=c; i++) {
        word=tolower(tmp[i])
        words1[word]=tmp[i]
    }
    c = split(str2,tmp," ")
    for (i=1; i<=c; i++) {
        word=tolower(tmp[i])
        words2[word]=tmp[i]
    }
    # report every word of the second string that also occurs in the first
    for (word in words2) {
        if (word in words1) {
            printf "Found %s\n", words2[word]
        }
    }
}'
will create 2 arrays where the indices of each are the lower-case words
in the respective strings and each element is the original case-varying
word, and find whether words from the second string occur in the first one.
Something like this (also untested):
awk 'BEGIN{
    str1="Unix Shell programming by Kernighan and Pike"
    str2="Unix Shell Programming by Kernighan"
    # substr1[i] holds the first i words of str1, lower-cased
    c1 = split(str1,tmp," ")
    substr1[0]=""
    sep=""
    for (i=1; i<=c1; i++) {
        word=tolower(tmp[i])
        substr1[i]=substr1[i-1] sep word
        sep = " "
    }
    # substr2[i] holds the first i words of str2, lower-cased
    c2 = split(str2,tmp," ")
    substr2[0]=""
    sep=""
    for (i=1; i<=c2; i++) {
        word=tolower(tmp[i])
        substr2[i]=substr2[i-1] sep word
        sep = " "
    }
    for (i=1;i<=c1;i++) {
        printf "Found substring %s in first string\n", substr1[i]
    }
    for (i=1;i<=c2;i++) {
        printf "Found substring %s in second string\n", substr2[i]
    }
}'
will print every leading run of words (1 word, 2 words, and so on) of
each string after conversion to lower case.
A mixture of these 2 techniques should give you what you want. Try it
and post a follow-up if you have questions.
Ed.
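One way the two hints above might be mixed, as a rough sketch rather than a tested solution: store every lower-cased leading run of words of the first string as an array index, then check each leading run of the second string against that array. The function name longest_shared_prefix and the output wording are made up for illustration. Note that awk -v interprets backslash escapes in the values, so this assumes the strings contain no backslashes.

```shell
# Hypothetical helper: report how many leading words of $2 match $1, ignoring case.
longest_shared_prefix() {
    awk -v str1="$1" -v str2="$2" 'BEGIN {
        # Record every lower-cased leading run of words of str1 as an array index.
        c1 = split(str1, tmp, " ")
        prefix = ""; sep = ""
        for (i = 1; i <= c1; i++) {
            prefix = prefix sep tolower(tmp[i])
            sep = " "
            seen[prefix] = 1
        }
        # Build the leading runs of str2, keeping the original case alongside,
        # and remember the longest one that is also a leading run of str1.
        c2 = split(str2, tmp, " ")
        lower = ""; orig = ""; sep = ""; best = ""; n = 0
        for (i = 1; i <= c2; i++) {
            lower = lower sep tolower(tmp[i])
            orig = orig sep tmp[i]
            sep = " "
            if (lower in seen) { best = orig; n = i }
        }
        printf "%d of %d words match: %s\n", n, c2, best
    }'
}

longest_shared_prefix "Unix Shell programming by Kernighan and Pike" "Unix Shell Programming by Kernighan"
longest_shared_prefix "Unix Shell programming by Kernighan and Pike" "Unix Shell programming by Pike"
```

The matched portion is printed in the second string's original case, which is what the question quotes.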
BEGIN {
    s1="Unix Shell programming by Kernighan and Pike"
    s2="Unix Shell Programming by Kernighan"
    s3="Unix Shell programming by Pike"
    comp(s1,s1)
    comp(s1,s2)
    comp(s1,s3)
}
# Compare s2 against s1 character by character, ignoring case.
# p is declared as an extra parameter so that it is local to comp().
function comp( s1, s2,    p )
{   s1=toupper(s1); s2=toupper(s2)
    if (s1==s2)
    { print "The strings are the same."; return }
    # advance p to the first position where the strings differ
    for (p=1; p<=length(s2); p++)
        if (substr(s1,p,1)!=substr(s2,p,1) )
            break
    p--
    if ( p==length(s2) )
        print "The second string is an abbreviation of the first."
    else
        printf "%d of %d characters of the second string match the first.\n",
            p, length(s2)
}
Running this awk program produces
The strings are the same.
The second string is an abbreviation of the first.
26 of 30 characters of the second string match the first.
Thanks very much Ed and William.
I wrote a simple one (case sensitive), but it is no better than the ones
posted above:
function compare_strings()
{
    one=$1
    two=$2
    final_str=""
    i_pos=0
    for i in $one
    do
        i_pos=$((i_pos + 1))
        j_pos=0
        for j in $two
        do
            j_pos=$((j_pos + 1))
            # keep a word only if it matches at the same position in both strings
            if [ "$i" = "$j" ] && [ "$i_pos" -eq "$j_pos" ]
            then
                final_str="$final_str $j"
                echo "The final string is $final_str"
                continue 2   # only two loops are nested, so 2 (not 3) resumes the outer one
            fi
        done
    done
}
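Since the original question asked for a case-insensitive match, here is a possible shell-only variant of the function above. It is only a sketch under some assumptions: the name common_prefix_words is invented, words are split on whitespace, and the comparison stops at the first word that differs, which the function above does not do. It lower-cases with tr instead of relying on bash-specific expansions, and its variables are global because plain sh has no local.

```shell
# Hypothetical helper: print the leading words of $2 that match $1 word for word,
# ignoring case.
common_prefix_words() {
    first=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    second=$2
    result=""
    set -f              # no filename globbing while the strings are split into words
    set -- $first       # the lower-cased words of the first string become $1, $2, ...
    for word in $second
    do
        [ $# -gt 0 ] || break                      # first string has run out of words
        lower=$(printf '%s' "$word" | tr '[:upper:]' '[:lower:]')
        [ "$lower" = "$1" ] || break               # first mismatch ends the match
        result="${result:+$result }$word"          # keep the original case of $2
        shift
    done
    set +f
    printf '%s\n' "$result"
}

common_prefix_words "Unix Shell programming by Kernighan and Pike" "Unix Shell programming by Pike"
```

With the strings from the question, the call above should print the portion "Unix Shell programming by".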