Home > Archive > PERL Beginners > August 2006 > Need some help filtering thru results
Need some help filtering thru results
Mike Blezien

2006-08-30, 6:57 pm

Hello,

We need to grab some data from a webpage fetched via the LWP module. The code is
below, followed by a sample of $resultdata. We need to regex out various pieces
of data, indicated by the [ ] brackets; see below for further explanation.
My regex skills are not very strong, and I need some help figuring out the best
way to do this.
===============================================================
#!/usr/bin/perl
BEGIN { open (STDERR, ">./mandy_error.log"); }
use CGI::Carp qw(fatalsToBrowser);
use CGI qw(:standard);
use HTTP::Request;
use LWP::UserAgent;
use strict;

my $agent = "Thunder Rain Scraper";
my $adminemail = 'mickalo@frontiernet.net';
my $urltofetch =
    'http://www.mandy.com/1/jobs2.cfm?terr=usny&skill=crw&paid=no&p=';

my $resultdata = fetch_results($urltofetch);

print header();

if(defined($resultdata))
{
    # process resulting data returned: decode &amp; and &nbsp; entities
    $resultdata =~ s/&amp;/&/ig;
    $resultdata =~ s/&nbsp;/ /ig;

    LOOP:
    for my $lines ( split(/\n/,$resultdata) )
    {
        if($lines =~ /<tr class=\"main\"/i) # THIS IS NOT WORKING.
        {
            # DO STUFF HERE -
        }
    }
}
else
{
    print qq~\nNo Result Data Returned\r\n~;
}
print qq~\nProcess Completed\n~;
exit();

sub fetch_results {
    my $url = shift();

    # MAIN
    my $ua = new LWP::UserAgent; # create a new LWP agent
    $ua->from($adminemail);      # set HTTP From
    $ua->agent($agent);          # set Agent-Name

    # retrieve the file from $url
    my $request = new HTTP::Request GET => $url;
    my $response = $ua->request($request);

    # return content
    if ($response->is_success()) { return $response->content(); }
    else { return undef; }
}

__END__
===============================================================

Now, from the data returned, we need to filter out everything except what lies
between <!-- START GRABBING RESULT HERE --> and <!-- END RESULT HERE -->.
I need to grab the data within the [ ] brackets. I inserted those brackets for
clarification; they're not normally there. We then need to go through each
<tr class="main"> (.*?)</tr> table row up to the end of the </table>.

######################################################################################
# FILTER TO RESULTS
.... A BUNCH OF HEADER STUFF HERE ....

# START TABLE HERE
<table border="0" width="100%" cellpadding="5" cellspacing="0">
<tr class="dbluetoppedbox" bgcolor="#E6EFF8"><td valign="TOP">
<span
class="main">Vacancy</span>          
    </td><td valign="TOP"><span class="main">Employer</span>
       </td><td valign="TOP" nowrap><span
class="main">
Where (Ad posted)</span></td>
<td valign="TOP"><span class="main">Duration</span></td>
<td valign="TOP" nowrap><span class="main">Pay</span></td>
</tr>

<!-- START GRABBING RESULT HERE -->
<tr class="main"><td valign="TOP"><a href="[jobs3.cfm?v=18327933]">
[Camera Operator/ Video Editor]</a></td><td valign="TOP">[BigbreakNy]</td>
<td valign="TOP">[Manhattan and Union ]([30 Aug ])</td>
<td valign="TOP">[ASAP / A few days of shooting]</td><td
valign="TOP">[Lo/no]</td>
</tr>
# NEXT ROW CELL
<tr class="main"><td valign="TOP"><a href="[jobs3.cfm?v=18326674]">
[Video Sub]</a></td><td valign="TOP">[Blue Man Group]</td><td valign="TOP">[New
York (30 Aug)]
</td><td valign="TOP">[ASAP / open ended]</td><td
valign="TOP">[Paid]</td></tr>
# NEXT ROW CELL
.......

<!-- END RESULT GRABBING HERE -->
</table>
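
In the sample above a single row spans several lines, so a line-by-line test for <tr class="main" can find where a row starts but never sees the whole row at once. The following is only a minimal sketch of a regex-based alternative, assuming the markup really looks like the sample (without the [ ] brackets) and that both comment markers are present. The helper name extract_rows and the returned hash layout are hypothetical, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one hash ref per <tr class="main"> row found
# between the START and END comment markers, with the link target (href)
# and the cleaned text of each <td> cell.
sub extract_rows {
    my ($html) = @_;

    # Keep only the region between the two markers. /s lets . match newlines.
    # "END RESULT" is matched loosely because the exact wording may vary.
    my ($region) = $html =~ /<!--\s*START GRABBING RESULT HERE\s*-->(.*?)<!--\s*END RESULT/s
        or return;

    my @rows;
    # Work on the whole region rather than line by line, so a row may
    # span several lines.
    while ($region =~ m{<tr\s+class="main"[^>]*>(.*?)</tr>}gis) {
        my $row = $1;
        my ($href) = $row =~ /<a\s+href="([^"]*)"/i;

        my @cells;
        while ($row =~ m{<td\b[^>]*>(.*?)</td>}gis) {
            my $text = $1;
            $text =~ s/<[^>]+>//g;   # drop nested tags such as <a>
            $text =~ s/\s+/ /g;      # collapse newlines and runs of spaces
            $text =~ s/^ //;         # trim leading space
            $text =~ s/ $//;         # trim trailing space
            push @cells, $text;
        }
        push @rows, { href => $href, cells => \@cells };
    }
    return @rows;
}

# Example usage with a tiny inline sample (made-up data).
my $sample = join "\n",
    '<!-- START GRABBING RESULT HERE -->',
    '<tr class="main"><td><a href="jobs3.cfm?v=1">Video',
    'Sub</a></td><td>Paid</td></tr>',
    '<!-- END RESULT GRABBING HERE -->';

for my $row (extract_rows($sample)) {
    print join(' | ', $row->{href} // '', @{ $row->{cells} }), "\n";
}
```

In the script above, this would be called as extract_rows($resultdata) in place of the split loop. It is brittle by design: any change in attribute order or quoting on the page would break the patterns.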

Mike(mickalo)Blezien
===============================
Thunder Rain Internet Publishing
Providing Internet Solution that Work
http://www.thunder-rain.com
===============================

Charles K. Clarkson

2006-08-30, 9:57 pm

Mike Blezien wrote:

: My regrex is not very strong and need to some help figuring
: out the best way to do this.

I don't use regular expressions to parse HTML. I generally
use HTML::TokeParser. The documentation is pretty good and I
can step through the markup the same way I read it.
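
A rough sketch of what that could look like for the rows in the original post, assuming the markup matches the sample shown there. The helper name parse_rows and the returned structure are hypothetical, and text is taken from the raw tokens without entity decoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: walk the markup token by token and collect the
# first link target plus the text of each cell for every <tr class="main">.
sub parse_rows {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my @rows;
    while (my $tr = $p->get_tag('tr')) {
        # get_tag returns [ $tag, \%attr, \@attrseq, $text ] for start tags.
        next unless lc($tr->[1]{class} // '') eq 'main';

        my ($href, $text, @cells);
        while (my $t = $p->get_token) {
            my ($type, $tag) = @$t;
            if ($type eq 'S' && $tag eq 'td') {
                $text = '';                      # a new cell starts
            }
            elsif ($type eq 'S' && $tag eq 'a') {
                $href = $t->[2]{href} unless defined $href;
            }
            elsif ($type eq 'T') {
                $text .= $t->[1] if defined $text;   # text inside a cell
            }
            elsif ($type eq 'E' && $tag eq 'td') {
                if (defined $text) {
                    $text =~ s/\s+/ /g;          # collapse whitespace
                    $text =~ s/^ //;
                    $text =~ s/ $//;
                    push @cells, $text;
                }
                undef $text;
            }
            elsif ($type eq 'E' && $tag eq 'tr') {
                last;                            # row finished
            }
        }
        push @rows, { href => $href, cells => \@cells };
    }
    return @rows;
}

# Example usage with a tiny inline sample (made-up data).
my $demo = '<table><tr class="main"><td><a href="jobs3.cfm?v=1">Video'
         . "\nSub</a></td><td>Paid</td></tr></table>";
for my $row (parse_rows($demo)) {
    print join(' | ', $row->{href} // '', @{ $row->{cells} }), "\n";
}
```

Because the parser tracks tags rather than lines, rows and cells that wrap across lines need no special handling.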


HTH,

Charles K. Clarkson
--
Mobile Homes Specialist
Free Market Advocate
Web Programmer

254 968-8328

Don't tread on my bandwidth. Trim your posts.
