XPosting:Search for best matched portion of a string
| Hi,
I have 2 strings (both may have a variable number of words).
The first string is: Unix Shell programming by Kernighan and Pike
The second string is: Unix Shell Programming by Kernighan
Would it be possible, through a Unix bash shell script, to tell that the
portion of the second string "Unix Shell Programming by Kernighan"
partially matches the first string?
Further, my second string can be:
Unix Shell programming by Pike
The same script (case insensitive) should tell that the portion of the
second string
"Unix Shell programming by" partially matches the first string.
Can anyone help me?
Thanks in advance,
Anil.
| |
| Ed Morton 2004-12-29, 3:56 pm |
|
Anil wrote:
> Hi,
>
> I have 2 strings (both may have a variable number of words).
> The first string is: Unix Shell programming by Kernighan and Pike
> The second string is: Unix Shell Programming by Kernighan
>
> Would it be possible, through a Unix bash shell script, to tell that the
> portion of the second string "Unix Shell Programming by Kernighan"
> partially matches the first string?
>
> Further, my second string can be:
> Unix Shell programming by Pike
>
> The same script (case insensitive) should tell that the portion of the
> second string
> "Unix Shell programming by" partially matches the first string.
> Can anyone help me?
> Thanks in advance,
> Anil.
>
Here's some hints for you to apply to your own solution. Something like
this (untested):
awk 'BEGIN{
    str1="Unix Shell programming by Kernighan and Pike"
    str2="Unix Shell Programming by Kernighan"
    # Index each array by lower-cased word; keep the original spelling as the value.
    c = split(str1,tmp," ")
    for (i=1; i<=c; i++) {
        word=tolower(tmp[i])
        words1[word]=tmp[i]
    }
    c = split(str2,tmp," ")
    for (i=1; i<=c; i++) {
        word=tolower(tmp[i])
        words2[word]=tmp[i]
    }
    # Report every word of the second string that also occurs in the first.
    for (word in words2) {
        if (word in words1) {
            printf "Found %s\n", words2[word]
        }
    }
}'
will create 2 arrays where the indices of each are the lower-case words
in the respective strings and each element is the original case-varying
word, and then report which words from the second string occur in the
first one (in no particular order, and regardless of position).
Something like this (also untested):
awk 'BEGIN{
    str1="Unix Shell programming by Kernighan and Pike"
    str2="Unix Shell Programming by Kernighan"
    # substr1[i] holds the first i words of str1, lower-cased.
    c1 = split(str1,tmp," ")
    substr1[0]=""
    sep=""
    for (i=1; i<=c1; i++) {
        word=tolower(tmp[i])
        substr1[i]=substr1[i-1] sep word
        sep = " "
    }
    # substr2[i] holds the first i words of str2, lower-cased.
    c2 = split(str2,tmp," ")
    substr2[0]=""
    sep=""
    for (i=1; i<=c2; i++) {
        word=tolower(tmp[i])
        substr2[i]=substr2[i-1] sep word
        sep = " "
    }
    for (i=1;i<=c1;i++) {
        printf "Found substring %s in first string\n", substr1[i]
    }
    for (i=1;i<=c2;i++) {
        printf "Found substring %s in second string\n", substr2[i]
    }
}'
will print every leading run of words (the first word, the first two
words, and so on) of each string after conversion to lower case.
A mixture of these 2 techniques should give you what you want. Try it
and post a follow-up if you have questions.
Ed.
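
A minimal sketch of one way the two techniques above might be combined, assuming "best matched portion" means the longest run of leading words that both strings share, ignoring case. The function name common_prefix_words and the use of -v to pass the strings in are illustrative choices, not something taken from the posts in this thread.

```shell
# Hypothetical helper: print the longest run of leading words of the
# second string that also starts the first string, ignoring case.
# Note: awk -v interprets backslash escapes in the values it is given.
common_prefix_words() {
    awk -v str1="$1" -v str2="$2" 'BEGIN {
        c1 = split(str1, w1, " ")
        c2 = split(str2, w2, " ")
        n = (c1 < c2) ? c1 : c2
        out = ""; sep = ""
        for (i = 1; i <= n; i++) {
            # Stop at the first position where the words differ.
            if (tolower(w1[i]) != tolower(w2[i]))
                break
            out = out sep w2[i]
            sep = " "
        }
        print out
    }'
}

common_prefix_words "Unix Shell programming by Kernighan and Pike" \
                    "Unix Shell programming by Pike"
```

The call at the end should print the shared leading words in the second string's own spelling; an empty line means the strings do not even share their first word.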
| |
| William James 2004-12-29, 3:56 pm |
|
BEGIN {
    s1="Unix Shell programming by Kernighan and Pike"
    s2="Unix Shell Programming by Kernighan"
    s3="Unix Shell programming by Pike"
    comp(s1,s1)
    comp(s1,s2)
    comp(s1,s3)
}
# p is declared as an extra parameter so that it is local to comp().
function comp( s1, s2,    p )
{
    s1=toupper(s1); s2=toupper(s2)
    if (s1==s2)
    { print "The strings are the same."; return }
    # Count the leading characters that the two strings share.
    for (p=1; p<=length(s2); p++)
        if (substr(s1,p,1)!=substr(s2,p,1) )
            break
    p--
    if ( p==length(s2) )
        print "The second string is an abbreviation of the first."
    else
        printf "%d of %d characters of the second string match the first.\n",
            p, length(s2)
}
Running this awk program produces
The strings are the same.
The second string is an abbreviation of the first.
26 of 30 characters of the second string match the first.
| |
|
| Thanks very much Ed and William.
| |
|
| I wrote a simple one (case sensitive), though it is no better than the
ones posted above:
function compare_strings()
{
    one=$1
    two=$2
    final_str=""
    i_pos=0
    for i in $one
    do
        i_pos=`expr $i_pos + 1`
        j_pos=0
        for j in $two
        do
            j_pos=`expr $j_pos + 1`
            if [ "$i" = "$j" ]
            then
                if [ $i_pos -eq $j_pos ]
                then
                    final_str="$final_str $j"
                    echo "The final string is $final_str"
                    # Move on to the next word of the first string.
                    continue 2
                fi
            fi
        done
    done
}
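
For the case-insensitive behaviour asked for at the top of the thread, a variant along the following lines might work. It is only a sketch: the name compare_strings_nocase is made up here, it lower-cases with tr, and, unlike the function above, it stops at the first position where the words differ and prints the result once. The body runs in a subshell so that `set -f` and the variables do not leak into the caller.

```shell
# Hypothetical case-insensitive variant; the ( ... ) body is a subshell.
compare_strings_nocase() (
    set -f                    # no filename expansion while word-splitting
    two=$2
    lower_one=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]')
    set -- $lower_one         # words of the first string become $1, $2, ...
    final_str=""
    for j in $two
    do
        [ "$#" -gt 0 ] || break           # first string has run out of words
        lower_j=$(printf '%s\n' "$j" | tr '[:upper:]' '[:lower:]')
        [ "$1" = "$lower_j" ] || break    # first mismatch ends the match
        final_str="${final_str:+$final_str }$j"
        shift
    done
    echo "The final string is $final_str"
)

compare_strings_nocase "Unix Shell programming by Kernighan and Pike" \
                       "Unix Shell Programming by Pike"
```

The matched words are reported in the second string's spelling.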
| |