Code Comments

Fuzzy string matching
I have two strings that I want to compare using some kind of fuzzy matching.
Is there a good way to do that in Perl, or could someone help with a
routine that matches word by word and gives a percentage result?

Like

String 1 : This is a ten characters long string is it not
String 2 : This is not so long

String 1 compared to String 2 gives 40% (four words are the same)
String 2 compared to String 1 gives 80% (four words are the same)

/juman

Juman
03-27-04 04:14 AM


Re: Fuzzy string matching
On Mar 24, 2004, at 4:31 AM, juman wrote:

> I have two strings that I want to compare using some kind of fuzzy matching.
> Is there a good way to do that in Perl, or could someone help with a
> routine that matches word by word and gives a percentage result?
>
> Like
>
> String 1 : This is a ten characters long string is it not
> String 2 : This is not so long
>
> String 1 compared to String 2 gives 40% (four words are the same)
> String 2 compared to String 1 gives 80% (four words are the same)

See if this gives you some ideas:

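#!/usr/bin/perl

use strict;
use warnings;

my $string1 = 'This is a ten characters long string is it not';
my $string2 = 'This is not so long';

print compare_words($string1, $string2), "%\n";    # prints 80%
print compare_words($string2, $string1), "%\n";    # prints 36%

# Count the distinct words of $str1 that appear in $str2, as a
# percentage of the number of words in $str2.
sub compare_words {
    my ($str1, $str2) = @_;

    my @words = split ' ', $str2;
    return 0 unless @words;    # avoid dividing by zero on an empty string

    my $in_both_count = 0;
    my %seen;
    foreach (split ' ', $str1) {
        next if $seen{$_}++;    # count each distinct word only once
        # \Q...\E keeps regex metacharacters in the word literal
        $in_both_count++ if $str2 =~ m/\b\Q$_\E\b/;
    }

    return sprintf '%.0f', $in_both_count / scalar(@words) * 100;
}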

__END__

James


James Edward Gray II
03-27-04 04:14 AM


Re: Fuzzy string matching
Great... got it! :) Now my little script is running... Thanks for the
help (again)...

/juman

Juman
03-27-04 04:14 AM


Re: Fuzzy string matching
On Wed, Mar 24, 2004 at 11:31:15AM +0100, juman wrote:
> String 1 compared to String 2 gives 40% (four words are the same)
> String 2 compared to String 1 gives 80% (four words are the same)

You can find a summary of possibilities here:

http://www.perlmonks.org/index.pl?node_id=162038

Basically, it comes down to these modules:

String::Approx
Text::Levenshtein
Algorithm::Diff

My favorite is the Levenshtein distance module.
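
For a feel of what a Levenshtein distance measures, here is a minimal sketch in core Perl. It is not the code of Text::Levenshtein; the sub names levenshtein and similarity_percent are made up for this illustration, and dividing by the length of the longer string is just one assumed way to turn the distance into a percentage.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max);

# Classic dynamic-programming edit distance: the minimum number of
# single-character insertions, deletions and substitutions needed to
# turn $s into $t.
sub levenshtein {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # @prev holds the distances between the previous prefix of $s
    # and every prefix of $t.
    my @prev = (0 .. scalar @t);
    for my $i (1 .. scalar @s) {
        my @curr = ($i);
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,            # deletion
                $curr[$j - 1] + 1,        # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Hypothetical helper (an assumption, not part of any module):
# scale the distance to a 0-100 similarity score.
sub similarity_percent {
    my ($s, $t) = @_;
    my $longest = max(length $s, length $t);
    return 100 if $longest == 0;    # two empty strings are identical
    return sprintf '%.0f', (1 - levenshtein($s, $t) / $longest) * 100;
}

print levenshtein('kitten', 'sitting'), "\n";            # prints 3
print similarity_percent('kitten', 'sitting'), "%\n";    # prints 57%
```

Note that this compares character by character, so it answers a slightly different question than the word-by-word percentage asked for above.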

Good luck,

Damon

--


Damon Allen Davison

http://allolex.freeshell.org/

Perl and Linguistics
<http://world.std.com/~swmcd/steven/...inguistics.html>
<http://www.linuxjournal.com/article.php?sid=3394>
<http://www.wall.org/~larry/keynote/keynote.html>

Damon Allen Davison
03-27-04 04:14 AM


RE: Fuzzy string matching
Hi Juman...

> -----Original Message-----
> From: juman [mailto:juman@chello.se]
> Sent: Wednesday, March 24, 2004 3:31 AM
> To: beginners@perl.org
> Subject: Fuzzy string matching
>
>
> I have two strings that I want to compare using some kind of fuzzy matching.
> Is there a good way to do that in Perl, or could someone help with a
> routine that matches word by word and gives a percentage result?
>
> Like
>
> String 1 : This is a ten characters long string is it not
> String 2 : This is not so long
>
> String 1 compared to String 2 gives 40% (four words are the same)
> String 2 compared to String 1 gives 80% (four words are the same)
>
> /juman
>

You might try fooling around with the List::Compare module
http://search.cpan.org/~jkeenan/Lis...0.22/Compare.pm . I've
been using it recently and it's pretty nifty. It won't give you the
percentages you want, but I think it could supply the raw comparison
data, and then you could compute the percentages yourself.
And if that's not quite right, the List::Compare page on CPAN
has references to similar modules at the bottom of the page; maybe one
of the other diff/compare modules will get you there.
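
As a rough sketch of "get the raw comparison data, then compute the percentages yourself", the following uses plain hashes from core Perl rather than List::Compare itself, so nothing here reflects that module's interface. The sub names unique_words and word_overlap are hypothetical. Because repeated words are counted once, string 1 comes out at 44% rather than the approximate 40% in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the distinct whitespace-separated words of a string, in order.
sub unique_words {
    my ($str) = @_;
    my %seen;
    return grep { !$seen{$_}++ } split ' ', $str;
}

# Hypothetical helper: the words shared by both strings, plus the
# percentage of each string's distinct words that they make up.
sub word_overlap {
    my ($str1, $str2) = @_;
    my @left  = unique_words($str1);
    my @right = unique_words($str2);

    # The intersection is the "raw comparison data".
    my %in_right = map { $_ => 1 } @right;
    my @common   = grep { $in_right{$_} } @left;

    my $pct = sub {
        my ($total) = @_;
        return $total ? sprintf('%.0f', @common / $total * 100) : 0;
    };
    return (\@common, $pct->(scalar @left), $pct->(scalar @right));
}

my ($common, $pct1, $pct2) = word_overlap(
    'This is a ten characters long string is it not',
    'This is not so long',
);
print "Shared words: @$common\n";        # This is long not
print "Share of string 1: $pct1%\n";     # 44%
print "Share of string 2: $pct2%\n";     # 80%
```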
Hope that helps.
-Chris

Chris McMahon
03-27-04 04:14 AM


PERL Beginners archive