Home > Archive > PERL Beginners > April 2005 > Use Perl to extract keywords
| Author |
Use Perl to extract keywords
| Robert Kerry 2005-04-25, 3:56 am |
| I want to use Perl to extract keywords from plain text. Is there an
existing package or algorithm for doing that?
Thank you.
Regards,
Robert.
| |
| Ezra Taylor 2005-04-25, 3:56 am |
| Robert:
An example is below.
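#!/usr/bin/perl
use strict;
use warnings;

# Print every line of /etc/passwd that contains the string "Ezra".
open(my $fh, '<', '/etc/passwd') or die "Cannot open file: $!";
while (my $line = <$fh>) {
    if ($line =~ /Ezra/) {    # I'm searching for lines containing the word Ezra.
        print $line;          # Now I'm printing lines with the name Ezra.
    }
}
close($fh);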
On 4/24/05, Robert Kerry <kerry.robert@gmail.com> wrote:
> I want to use Perl to extract keywords from plain text. Is there an
> existing package or algorithm for doing that?
> Thank you.
>
> Regards,
>
> Robert.
| |
| John Doe 2005-04-25, 8:56 am |
| Hi Robert
On Monday, 25 April 2005 at 01:45, Robert Kerry wrote:
> I want to use Perl to extract keywords from plain text. Is there an
> existing package or algorithm for doing that?
> Thank you.
If you already know the keywords
(I assume so; otherwise you would be searching for words in general, right?),
you could use the following strategy:
1. Put your keywords in a hash.
2. Split your text into words.
3. Handle the words that are found in the keyword hash according to your
"extract rules".
Depending on your "extract rules", other or more extended methods may be more
appropriate.
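The three steps above can be sketched roughly as follows. This is only an illustration: the keyword list, the sample text, and the helper name count_known_keywords are made up, and the "extract rule" assumed here is simply to count how often each known keyword occurs, ignoring case.

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of known keywords in a text.
sub count_known_keywords {
    my ($text, @keywords) = @_;

    # 1. Put the keywords in a hash (lower-cased for case-insensitive lookup).
    my %is_keyword;
    $is_keyword{ lc $_ } = 1 for @keywords;

    # 2. Split the text into words; runs of non-word characters separate them.
    my @words = grep { length } split /\W+/, lc $text;

    # 3. Assumed "extract rule": count each word found in the keyword hash.
    my %count;
    for my $word (@words) {
        $count{$word}++ if $is_keyword{$word};
    }
    return \%count;
}

# Made-up sample input and keyword list.
my $found = count_known_keywords(
    "Perl makes text processing easy. Perl hashes make lookups fast.",
    qw(perl hash lookups),
);
print "$_: $found->{$_}\n" for sort keys %$found;
```

Note that the match is on whole words only, so "hashes" in the sample text does not count as the keyword "hash". Whether that is what you want depends on your extract rules.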
hth, joe
| |
| M. Kristall 2005-04-27, 8:56 am |
| Robert Kerry wrote:
> I want to use Perl to extract keywords from plain text. Is there an
> existing package or algorithm for doing that?
> Thank you.
>
> Regards,
>
> Robert.
If you are attempting to "extract" keywords as a search engine might
(i.e. find all words of substance), you could split the document on
whitespace (\s), loop through the resulting array while ignoring non-words
and insubstantial words, and increment the corresponding hash element
(the rating / number of occurrences) for each remaining word. You can then
do whatever is appropriate with this information.
Something like this should work if you are only searching local
documents that don't contain single words you might want to treat as
multiple words; otherwise, you could change \s to \W.
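my %keywords;
foreach my $word (split /\s+/, $document) {
    # leading whitespace in $document yields an empty first field
    next unless length $word;
    # badword() is a function you supply; it returns true if $word is a
    # common or "bad" word
    next if badword($word);
    $keywords{$word}++;    # a missing hash element starts out at 0
}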
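For reference, here is a self-contained sketch of the same idea. The stop-word list, the sample document, and the name extract_keywords are assumptions for illustration, and badword is only one possible stand-in, since the post above leaves it undefined. This version splits on \W+ and lower-cases the text so that punctuation and capitalisation do not produce separate entries.

```perl
use strict;
use warnings;

# Assumed stop-word list; a real one would be much longer.
my %stopword;
$stopword{$_} = 1 for qw(a an and the of to in is it that this for);

# Stand-in for badword(): true for empty, non-alphabetic, very short,
# or common words.
sub badword {
    my ($word) = @_;
    return 1 if $word !~ /^[a-z]+$/;    # skips empty strings and numbers
    return 1 if length($word) < 3;
    return 1 if $stopword{$word};
    return 0;
}

# Hypothetical helper: return a hash ref mapping each keyword to its count.
sub extract_keywords {
    my ($document) = @_;
    my %keywords;
    for my $word (split /\W+/, lc $document) {
        next if badword($word);
        $keywords{$word}++;
    }
    return \%keywords;
}

# Made-up sample document.
my $document = "The cat sat on the mat. The cat ate the fish, and the fish was fresh.";
my $keywords = extract_keywords($document);

# Print keywords by descending count, then alphabetically.
for my $word (sort { $keywords->{$b} <=> $keywords->{$a} or $a cmp $b } keys %$keywords) {
    printf "%-8s %d\n", $word, $keywords->{$word};
}
```

Raw counts like these are only a crude rating. A longer stop-word list or a minimum-count threshold would be the obvious next refinements.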