PERL Beginners archive, April 2005

Use Perl to extract keywords
Robert Kerry

2005-04-25, 3:56 am

I want to use Perl to extract keywords from plain text, but I don't know
whether there are any existing packages or algorithms for doing that.
Thank you.

Regards,

Robert.
Ezra Taylor

2005-04-25, 3:56 am

Robert:


An example is below.



#!/usr/bin/perl
use strict;
use warnings;

open(my $fh, '<', '/etc/passwd') or die "Cannot open file: $!";

while (my $line = <$fh>) {
    if ($line =~ /Ezra/) {   # I'm searching for lines containing the string "Ezra"
        print $line;         # Now I'm printing the lines with the name Ezra
    }
}

close($fh);


On 4/24/05, Robert Kerry <kerry.robert@gmail.com> wrote:
> I want to use Perl to extract keywords from plain text, but I don't know
> whether there are any existing packages or algorithms for doing that.
> Thank you.
>
> Regards,
>
> Robert.

John Doe

2005-04-25, 8:56 am

Hi Robert

On Monday, 25 April 2005 01:45, Robert Kerry wrote:
> I want to use Perl to extract keywords from plain text, but I don't know
> whether there are any existing packages or algorithms for doing that.
> Thank you.


If you already know the keywords
(I assume so; otherwise you would be searching for words, right?),
you could use the following strategy:

1. Put your keywords in a hash
2. Split your text into words
3. Handle the words that are found in the keyword hash according to your
"extract rules".

Depending on your "extract rules", other/extended methods may be more
appropriate.
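The three steps above can be sketched as follows. This is a minimal sketch, assuming that a "word" is a run of word characters (so the text is split on \W+), that matching is case-insensitive, and that the "extract rule" is simply to count how often each keyword occurs. The keyword list and the extract_keywords() name are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: put the known keywords in a hash for fast lookups.
# This keyword list is a made-up example.
my %is_keyword = map { lc($_) => 1 } qw(perl hash regex);

# Steps 2 and 3: split the text into words and keep only those found
# in the keyword hash. The assumed "extract rule" is: count how often
# each keyword occurs, ignoring case.
sub extract_keywords {
    my ($text, $keywords) = @_;
    my %found;
    for my $word (split /\W+/, lc $text) {
        next if $word eq '';    # leading separators produce an empty first field
        $found{$word}++ if $keywords->{$word};
    }
    return \%found;
}

my $text  = "Perl makes hash lookups easy; a Perl hash is fast.";
my $found = extract_keywords($text, \%is_keyword);
print "$_: $found->{$_}\n" for sort keys %$found;
```

Run as is, this prints "hash: 2" and "perl: 2". A hash is used instead of a list so that each lookup takes constant time no matter how many keywords you have; a different "extract rule" only changes what happens inside the loop.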

hth, joe
M. Kristall

2005-04-27, 8:56 am

Robert Kerry wrote:
> I want to use Perl to extract keywords from plain text, but I don't know
> whether there are any existing packages or algorithms for doing that.
> Thank you.
>
> Regards,
>
> Robert.

If you are attempting to "extract" keywords as a search engine might
(i.e. find all words of substance), you could split the document on
whitespace (\s), loop through the resulting list while skipping all
non-words, insignificant words, etc., and increment the corresponding
hash element (a rating, i.e. the number of occurrences) for each word
that remains. Then do whatever is appropriate with this information.

Something like this should work if you are only searching local
documents. Splitting on \s treats a token such as "e-mail" as a single
word; if you would rather count its parts as multiple words, change \s to \W.
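my %keywords;
foreach my $word (split /\s+/, $document) {
    # badword() is your own function that returns true if $word is a
    # common or "bad" word; empty fields from leading spaces are skipped too
    next if $word eq '' or badword($word);
    $keywords{$word}++;    # a new word starts at 1 automatically
}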
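A complete, runnable version of the frequency-counting approach could look like the minimal sketch below. The stop-word list and the badword(), keyword_counts() and top_keywords() names are assumptions made for illustration (badword() stands in for the "common or bad word" check described above). The sketch lower-cases the text, lets you choose between splitting on \s+ and \W+, and ranks the remaining words by number of occurrences.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A small, made-up stop-word list standing in for a real one.
my %stopword = map { $_ => 1 } qw(a an and the of to is in it for on);

# Hypothetical badword(): true for stop words and for tokens that
# contain no letters at all (numbers, stray punctuation).
sub badword {
    my ($word) = @_;
    return 1 if $word !~ /[a-z]/;
    return exists $stopword{$word};
}

# Count candidate keywords in $document (lower-cased first).
# With $split_on_nonword false the text is split on whitespace (\s+);
# with it true the text is split on non-word characters (\W+).
sub keyword_counts {
    my ($document, $split_on_nonword) = @_;
    my $sep = $split_on_nonword ? qr/\W+/ : qr/\s+/;
    my %keywords;
    foreach my $word (split $sep, lc $document) {
        next if $word eq '' or badword($word);
        $keywords{$word}++;
    }
    return \%keywords;
}

# Return up to $n words, highest count first, ties broken alphabetically.
sub top_keywords {
    my ($counts, $n) = @_;
    my @ranked = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b } keys %$counts;
    $n = @ranked if $n > @ranked;
    return @ranked[0 .. $n - 1];
}

my $doc = "Send the e-mail to the list, then archive the e-mail";
for my $split_on_nonword (0, 1) {
    my $counts = keyword_counts($doc, $split_on_nonword);
    print join(" | ", map { "$_ => $counts->{$_}" } top_keywords($counts, 3)), "\n";
}
```

With the sample sentence, splitting on whitespace keeps "e-mail" as one token but leaves the comma attached to "list,"; splitting on \W+ removes the punctuation but breaks "e-mail" into "e" and "mail". Which trade-off is better depends on your documents.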

Copyright 2009 codecomments.com