Storing $DIGIT variables in arrays
Jesse Taylor

2005-01-23, 8:56 pm

Below I have posted the source for a program I am attempting to write
that will take a list of URLs, grab the pages, search them for
email addresses and IP addresses, remove duplicate entries, and store
the results in a text file. Everything compiles fine and runs without
any warnings; however, my output is not what I expected. When I run it
with this URL in my list: http://gentoo-solid.no-ip.com/testpage.php ,
the output file does not contain either of the actual IP addresses and
instead picks the only one that is NOT an IP address and just takes the
first four numbers, "45.45.45.45". The emails are working fine, however,
and all show up in the output file. Any help on this would be appreciated.

Thanks,
Jesse Taylor

######--START CODE--######

#!/usr/bin/perl -w

#Given a text file containing URLs, this script will extract any IP
#addresses or email addresses from said URLs

use LWP::Simple;

print "Enter location of URL list file: ";
chomp($infile=<STDIN> );
open INFILE, $infile
or die "Could not open file $infile";

print "Enter location at which to create output file: ";
chomp($outfile=<STDIN> );
open OUTFILE, ">$outfile"
or die "Could not open/create $outfile";

while($url=<INFILE> )
{
chomp($url);
$html=get("$url")
or die "Couldn't open page located at $url";

@ips = $html =~
/(\d{1,3}[0-255]\.\d{1,3}[0-255]\.\d{1,3}[0-255]\.\d{1,3}[0-255])/g;
#find and store IP addresses

@emails = $html =~ /(\w+\@\w+\.\w+)/g; #find and store email addresses

push(@allips, @ips);
push(@allemails, @emails);

}
####remove duplicate array members####

for($i=0; $i<(scalar @allips); $i++)
{
for ($j=0; $j<(scalar @allips); $j++)
{
if ($allips[$i] eq $allips[$j] && $i!=$j)
{
splice(@allips, $j, 0);
}
}
}

####remove duplicate array members####

for($i=0; $i<(scalar @allemails); $i++)
{
for ($j=0; $j<(scalar @allemails); $j++)
{
if ($allemails[$i] eq $allemails[$j] && $i!=$j)
{
splice(@allemails, $j, 1);
}
}
}

####Store data in output file####

print OUTFILE "IP Addresses: \n";
foreach (@allips)
{
print OUTFILE "$_\n";
}

print OUTFILE "\nEmail Addresses: \n";

foreach (@allemails)
{
print OUTFILE "$_\n";
}

close(INFILE);
close(OUTFILE);
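
A side note on the two duplicate-removal loops above: the first one calls splice with a length of 0, which removes nothing, and splicing an array while indexing through it can skip the element that slides into the freed slot. A common Perl idiom avoids both problems by remembering what has already been seen in a hash. The following is only a minimal sketch; the helper name unique and the sample addresses are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns its arguments
# with duplicates removed, keeping the first occurrence of each value.
sub unique {
    my %seen;
    # $seen{$_}++ is 0 (false) the first time a value appears, so the
    # negation lets it through; later copies are filtered out.
    return grep { !$seen{$_}++ } @_;
}

# Sample data, assumed for illustration only.
my @allips = unique('10.0.0.1', '8.8.8.8', '10.0.0.1');
print "$_\n" for @allips;
# prints:
# 10.0.0.1
# 8.8.8.8
```

The same call would work for the email list, for example @allemails = unique(@allemails);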
Graeme St. Clair

2005-01-25, 3:56 pm

I'm no expert on regexes, but IP addresses are not all that easy to match and
validate at the same time. There is a good discussion of precisely this
problem in Chap 4 of "Mastering Regular Expressions" by J E F Friedl, pub
O'Reilly, at pp 123-125. He concludes:

"Sometimes it's better to take some of the work out of the regex. For
example, going back to

^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$

and wrapping each component in parentheses will stuff the numbers into $1,
$2, $3 and $4, which can then be validated by other programming constructs".

The discussion context is Perl specifically.
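
For what it's worth, this also seems to explain the "45.45.45.45" result. Inside square brackets, [0-255] is a character class meaning "one of the characters 0, 1, 2 or 5", not "a number from 0 to 255", so \d{1,3}[0-255] can match "45" but not "168". Below is a rough sketch of the capture-then-validate idea from the quote, adapted for scanning a page of text rather than a single anchored string. The helper name extract_ips, the use of \b in place of the ^ and $ anchors, and the sample text are my own assumptions, not Friedl's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread or the book): returns every
# dotted quad in $text whose four components are all in the range 0..255.
sub extract_ips {
    my ($text) = @_;
    my @found;
    # \b keeps a match from starting or ending in the middle of a longer
    # run of digits; the four pairs of parentheses fill $1 through $4.
    while ($text =~ /\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b/g) {
        my @octets = ($1, $2, $3, $4);
        # Do the range check in ordinary code instead of in the regex.
        next if grep { $_ > 255 } @octets;
        push @found, join('.', @octets);
    }
    return @found;
}

# Sample text, assumed for illustration only.
my $html = "hosts 192.168.0.1 and 10.0.0.256 plus 999.1.1.1 and 8.8.8.8.";
print "$_\n" for extract_ips($html);
# prints:
# 192.168.0.1
# 8.8.8.8
```

In the original script this would stand in for the @ips = $html =~ ... line, for example @ips = extract_ips($html);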

HTH, GStC.


-----Original Message-----
From: Jesse Taylor [mailto:jrtaylor@tulane.edu]
Sent: Sunday, January 23, 2005 5:20 PM
To: beginners@perl.org
Subject: Storing $DIGIT variables in arrays

<snip/>

while($url=<INFILE> )
{
chomp($url);
$html=get("$url")
or die "Couldn't open page located at $url";

@ips = $html =~
/(\d{1,3}[0-255]\.\d{1,3}[0-255]\.\d{1,3}[0-255]\.\d{1,3}[0-255])/g;

Copyright 2008 codecomments.com