Code Comments
I have two strings I want to compare using some kind of fuzzy matching.
Is there a good way to do that in Perl, or could someone help with a
routine that matches word by word and gives a percentage result?

Like

String 1 : This is a ten characters long string is it not
String 2 : This is not so long

String 1 compared to String 2 gives 40% (four words are the same)
String 2 compared to String 1 gives 80% (four words are the same)

/juman
On Mar 24, 2004, at 4:31 AM, juman wrote:
> I have two strings I want to compare using some kind of fuzzy matching.
> Is there a good way to do that in Perl, or could someone help with a
> routine that matches word by word and gives a percentage result?
>
> Like
>
> String 1 : This is a ten characters long string is it not
> String 2 : This is not so long
>
> String 1 compared to String 2 gives 40% (four words are the same)
> String 2 compared to String 1 gives 80% (four words are the same)
See if this gives you some ideas:
#!/usr/bin/perl
use strict;
use warnings;

my $string1 = 'This is a ten characters long string is it not';
my $string2 = 'This is not so long';

print compare_words($string1, $string2), "%\n";    # 40%
print compare_words($string2, $string1), "%\n";    # 80%

# Percentage of the words in $str1 that also appear in $str2.
sub compare_words {
    my($str1, $str2) = @_;

    my @words = split ' ', $str1;
    return 0 unless @words;    # avoid dividing by zero on an empty string

    my $in_both_count = 0;
    my %seen;
    foreach (@words) {
        next if $seen{$_}++;    # count each distinct word only once
        # \Q...\E keeps regex metacharacters in a word from being interpreted
        $in_both_count++ if $str2 =~ m/\b\Q$_\E\b/;
    }

    return sprintf '%.0f', $in_both_count / scalar(@words) * 100;
}

__END__
James
Great... got it! :) Now my little script is running... Thanks for the
help (again)...
/juman
On Wed, Mar 24, 2004 at 11:31:15AM +0100, juman wrote:
> String 1 compared to String 2 gives 40% (four words are the same)
> String 2 compared to String 1 gives 80% (four words are the same)

You can find a summary of possibilities here:

http://www.perlmonks.org/index.pl?node_id=162038

Basically, what it comes down to is these modules:

String::Approx
Text::Levenshtein
Algorithm::Diff

My favorite is the Levenshtein distance module.

Good luck,
Damon

-- 
Damon Allen Davison
http://allolex.freeshell.org/

Perl and Linguistics
<http://world.std.com/~swmcd/steven/...inguistics.html>
<http://www.linuxjournal.com/article.php?sid=3394>
<http://www.wall.org/~larry/keynote/keynote.html>
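
For readers who have not met the Levenshtein distance mentioned above: it counts the single-character insertions, deletions and substitutions needed to turn one string into the other. The following is only a minimal pure-Perl sketch of that idea, not the code of Text::Levenshtein itself. The helper similarity_percent and its "distance divided by the longer length" scaling are assumptions made here for illustration; they are one of several reasonable ways to get a percentage.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Edit distance via the classic dynamic-programming table,
# keeping only the previous row in memory.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # Row 0: turning an empty prefix of $s into each prefix of $t.
    my @prev = (0 .. scalar @t);
    for my $i (1 .. @s) {
        my @curr = ($i);
        for my $j (1 .. @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,             # deletion
                $curr[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,     # substitution (or match)
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Hypothetical helper (not part of any module): scale the distance
# by the longer string's length to get a 0-100 similarity score.
sub similarity_percent {
    my ($s, $t) = @_;
    my $longest = length($s) > length($t) ? length($s) : length($t);
    return 100 if $longest == 0;    # two empty strings are identical
    return sprintf '%.0f', (1 - levenshtein($s, $t) / $longest) * 100;
}

print levenshtein('kitten', 'sitting'), "\n";            # 3
print similarity_percent('kitten', 'sitting'), "%\n";    # 57%
```

Note that this compares character by character, so unlike the word-by-word routine earlier in the thread it gives the same score in both directions.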
Hi Juman...

> -----Original Message-----
> From: juman [mailto:juman@chello.se]
> Sent: Wednesday, March 24, 2004 3:31 AM
> To: beginners@perl.org
> Subject: Fuzzy string matching
>
> I have two strings I want to compare using some kind of fuzzy matching.
> Is there a good way to do that in Perl, or could someone help with a
> routine that matches word by word and gives a percentage result?
>
> Like
>
> String 1 : This is a ten characters long string is it not
> String 2 : This is not so long
>
> String 1 compared to String 2 gives 40% (four words are the same)
> String 2 compared to String 1 gives 80% (four words are the same)
>
> /juman

You might try fooling around with the List::Compare module
http://search.cpan.org/~jkeenan/Lis...0.22/Compare.pm . I've been using it
recently and it's pretty nifty. It won't give you the percentages you
want, but I think it could supply the raw comparison data, and then you
could compute the percentages yourself.

And if that's not quite right, the List::Compare page on CPAN has
references to similar modules at the bottom of the page; maybe one of
the other diff/compare modules might get you there.

Hope that helps.

-Chris
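
The "raw comparison data, then compute the percentages yourself" approach could look roughly like the sketch below. To stay self-contained it builds the shared-word set with a plain hash instead of loading List::Compare; that module's get_intersection method should be able to supply the same list. The name word_overlap_percent and the choice of denominator (the total word count of the first string, which reproduces the 40% / 80% figures from the question) are assumptions made here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percentage of the words in $from that also occur in $in.
# Distinct shared words are counted once; the denominator is the
# total word count of $from (an assumed convention, see above).
sub word_overlap_percent {
    my ($from, $in) = @_;

    my @from_words = split ' ', $from;
    return 0 unless @from_words;    # nothing to compare

    # Lookup table of the words in the other string.
    my %in_other = map { $_ => 1 } split ' ', $in;

    # grep in scalar context returns the number of matching elements.
    my %seen;
    my $shared = grep { !$seen{$_}++ && $in_other{$_} } @from_words;

    return sprintf '%.0f', $shared / @from_words * 100;
}

my $string1 = 'This is a ten characters long string is it not';
my $string2 = 'This is not so long';

print word_overlap_percent($string1, $string2), "%\n";    # 40%
print word_overlap_percent($string2, $string1), "%\n";    # 80%
```

The comparison is exact and case-sensitive; lowercasing both strings with lc before splitting would make "This" and "this" count as the same word.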