Fuzzy string matching
| Juman 2004-03-26, 11:14 pm |
| I have two strings I want to compare using some kind of fuzzy matching.
Is there a good way to do that in Perl, or could someone help with a
routine that matches word by word and gives a percentage result?
Like
String 1 : This is a ten characters long string is it not
String 2 : This is not so long
String 1 compared to String 2 gives 40% (four words are the same)
String 2 compared to String 1 gives 80% (four words are the same)
/juman
| |
| James Edward Gray II 2004-03-26, 11:14 pm |
| On Mar 24, 2004, at 4:31 AM, juman wrote:
> I have two strings I want to compare doing some kind of fuzzy matching?
> Is there some good way to that in perl or could someone help with a
> routine matching word by word and giving a percental result.
>
> Like
>
> String 1 : This is a ten characters long string is it not
> String 2 : This is not so long
>
> String 1 compared to String 2 gives 40% (four words are the same)
> String 2 compared to String 1 gives 80% (four words are the same)
See if this gives you some ideas:
#!/usr/bin/perl

use strict;
use warnings;

my $string1 = 'This is a ten characters long string is it not';
my $string2 = 'This is not so long';

print compare_words($string1, $string2), "%\n";    # prints 80%
print compare_words($string2, $string1), "%\n";    # prints 36%

# Number of distinct words of $str1 found in $str2, as a percentage
# of the word count of $str2.
sub compare_words {
    my($str1, $str2) = @_;

    my @words = split ' ', $str2;
    my $in_both_count = 0;
    my %seen;
    foreach (split ' ', $str1) {
        next if $seen{$_}++;    # count each distinct word only once
        # \Q...\E quotes any regex metacharacters in the word
        $in_both_count++ if $str2 =~ m/\b\Q$_\E\b/;
    }

    return sprintf '%.0f', $in_both_count / scalar(@words) * 100;
}

__END__
James
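
A variation on the routine above, offered only as a sketch: it looks words up in a hash instead of using a regex match, and it divides by the number of distinct words in the first string, which seems closer to how the original question phrases "String 1 compared to String 2". The name word_overlap_percent, the case-insensitive comparison, and the choice of denominator are assumptions for illustration, not something anyone in the thread specified.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not from the thread): percentage of the distinct
# words in $str1 that also occur in $str2, compared case-insensitively.
sub word_overlap_percent {
    my ($str1, $str2) = @_;

    # Lookup table of the words in the second string.
    my %in_second;
    $in_second{ lc $_ } = 1 for split ' ', $str2;

    # Distinct words of the first string.
    my %first;
    $first{ lc $_ } = 1 for split ' ', $str1;
    return 0 unless %first;    # avoid dividing by zero on an empty string

    # grep in scalar context returns the number of matching words.
    my $shared = grep { $in_second{$_} } keys %first;
    return sprintf '%.0f', $shared / scalar(keys %first) * 100;
}

my $string1 = 'This is a ten characters long string is it not';
my $string2 = 'This is not so long';

print word_overlap_percent($string1, $string2), "%\n";    # prints 44%
print word_overlap_percent($string2, $string1), "%\n";    # prints 80%
```

With the example strings this gives 44% rather than the 40% in the question, because String 1 has nine distinct words ("is" appears twice) and four of them are shared.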
| |
| Juman 2004-03-26, 11:14 pm |
| Great... got it! :) Now my little script is running... Thanks for the
help (again)...
/juman
On Wed, Mar 24, 2004 at 09:34:58AM -0600, James Edward Gray II wrote:
> On Mar 24, 2004, at 4:31 AM, juman wrote:
>
>
> See if this gives you some ideas:
>
| |
|
|
| Chris McMahon 2004-03-26, 11:14 pm |
| Hi Juman...
You might try fooling around with the List::Compare module
http://search.cpan.org/~jkeenan/Lis...0.22/Compare.pm . I've
been using it recently and it's pretty nifty. It won't give you the
percentages you want, but I think it could supply the raw comparison
data, and then you could compute the percentages yourself.
And if that's not quite right, the List::Compare page on CPAN
has references to similar modules at the bottom of the page; maybe one
of the other diff/compare modules will get you there.
Hope that helps.
-Chris
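
As a rough sketch of the "compute the percentages yourself" step, assuming List::Compare is installed from CPAN: get_intersection returns the words found in both lists and get_unique returns the words found only in the first list, so together they give the distinct word count of the first string. The helper name shared_word_percent and the choice to divide by the first string's distinct words are assumptions for illustration, not part of the module.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use List::Compare;    # CPAN module, not part of core Perl

# Hypothetical helper: percentage of the distinct words in $str1
# that also appear in $str2 (case-sensitive).
sub shared_word_percent {
    my ($str1, $str2) = @_;

    my @first  = split ' ', $str1;
    my @second = split ' ', $str2;
    return 0 unless @first && @second;    # nothing to compare

    my $lc = List::Compare->new(\@first, \@second);
    my @shared     = $lc->get_intersection;    # words in both lists
    my @only_first = $lc->get_unique;          # words only in the first list

    # Arrays in numeric context evaluate to their element counts.
    return sprintf '%.0f', @shared / (@shared + @only_first) * 100;
}

my $string1 = 'This is a ten characters long string is it not';
my $string2 = 'This is not so long';

print shared_word_percent($string1, $string2), "%\n";    # prints 44%
print shared_word_percent($string2, $string1), "%\n";    # prints 80%
```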