Code Comments

Trying to scrape data out of HTML
I am trying to scrape a website and pull out some data. The problem is that I'm
getting the data I want but lots of extra data as well, due to my crap
Perl :)

Thanks for any help!
Perl Newbie

Here is the snippet of data I'm trying to get from the HTML.
---------------------------------------------------------------------------
<table bgcolor="#DDDDDD" cellspacing="0" cellpadding="2" style="border:
1px solid black; margin-top: 10px; margin-bottom: 5px;" width="98%">
<tr bgcolor="#B5BBC5"><td style="border-bottom: 1px solid
black;"><b>My Submissions</b></td></tr>
<tr><td class="textsm">
<center><a class="i"
href="challenge_stats.php?action=comments&IMAGE_ID=206144"><img
src="http://images.dpchallenge.com/images_challenge/361/thumb/206144.jpg"
width="120" height="83" style="border: 1px solid black" ><br>When
Chairs Sleep</a><br><a
href="challenge_vote_list.php?CHALLENGE_ID=361">Wooden</a></center><table><tr><td
colspan="2" class="textsm2"></td></tr><tr><td align="right"
class="textsm2">Votes:</td>   <td class="textsm2"><b>129

</b></td></tr><tr><td align="right" class="textsm2">Views:</td><td
class="textsm2"><b>194
</td></tr><tr><td align="right" class="textsm2">Avg Vote:</td><td
class="textsm2"><b>5.9070
</b></td></tr><tr><td align="right" class="textsm2">Comments:</td><td
class="textsm2"><b><a
href="challenge_stats.php?action=comments&IMAGE_ID=206144">14
</a></b></td></tr><tr><td align="right"
class="textsm2">Favorites:</td><td class="textsm2"><b>1
</td></tr><tr><td align="right" class="textsm2">Wish Lists:</td><td
class="textsm2"><b>0
</td></tr><tr><td align="right" class="textsm2">Updated:</td> <td
class="textsm2"><b>07/27/05 09:45 am</b></td></tr><tr><td
class="textsm2"><br></td></tr></table><center><a class="i"
href="challenge_stats.php?action=comments&IMAGE_ID=207836"><img
src="http://images.dpchallenge.com/images_challenge/362/thumb/207836.jpg"
width="76" height="120" style="border: 1px solid black" ><br>Towering
Scaffolds</a><br><a
href="challenge_vote_list.php?CHALLENGE_ID=362">Tools of the
Trade</a></center><table><tr><td colspan="2"
class="textsm2"></td></tr><tr><td align="right"
class="textsm2">Votes:</td>   <td class="textsm2"><b>53
</b></td></tr><tr><td align="right" class="textsm2">Views:</td><td
class="textsm2"><b>60
</td></tr><tr><td align="right" class="textsm2">Avg Vote:</td><td
class="textsm2"><b>4.5472
</b></td></tr><tr><td align="right" class="textsm2">Comments:</td><td
class="textsm2"><b><a
href="challenge_stats.php?action=comments&IMAGE_ID=207836">1
</a></b></td></tr><tr><td align="right"
class="textsm2">Favorites:</td><td class="textsm2"><b>0
</td></tr><tr><td align="right" class="textsm2">Wish Lists:</td><td
class="textsm2"><b>0
</td></tr><tr><td align="right" class="textsm2">Updated:</td> <td
class="textsm2"><b>07/27/05 09:45 am</b></td></tr><tr><td
class="textsm2"><br></td></tr></table><div align="right">[ <a
href="challenge_stats.php?action=update">Update</a> ]</div>		</td></tr>
</table>

----------------------------------------------------------------------------
Here is what I'm getting back. I only want the section from "When Chairs
Sleep" down to "Site News".

When Chairs Sleep Wooden        Votes:     129
Views:  194
Avg Vote:  5.9070
Comments:  14
Favorites:  1
Wish Lists:  0
Updated:   07/27/05 09:45 am         [IMG] Towering Scaffolds Tools
of the Trade        Votes:     53
Views:  60
Avg Vote:  4.5472
Comments:  1
Favorites:  0
Wish Lists:  0
Updated:   07/27/05 09:45 am         [ Update ]


Site News

05.29.05- Small Site Update    A few pages on the site received minor
updates today.  Read more here.
04.20.05- Lenses are Here    Please update your equipment.  More about
the new feature here.
03.13.05- Important Reminders    Please read these important reminders
about problem forum posts, voting and requesting DQ.
01.01.05- Portfolio Limit Increased    Member portfolios have been
increased from 10mb to 25mb.  You may need to log out and back in again
to see the change.  Happy New Year!
Member Challenge Winners :: At the Zoo (Jul. 11 2005 - Jul. 17 2005)
[IMG]
1st PlaceFar From Home scalvert
Print Available!  [IMG]
2nd PlaceFeline love arpita
Print Available!  [IMG]
3rd PlaceSummer Song CeeDeez
Member Challenge Winners :: Independence (Jul. 10 2005 - Jul. 16
2005)    [IMG]
1st PlaceSymbol of... kosmikkreeper
[IMG]
2nd PlaceNature photographer heida
[IMG]
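
One rough way to trim a text dump like the one above is to cut it between two marker strings. This is only a sketch: slice_between is a hypothetical helper, the sample text is a made-up stand-in for the real page, and it assumes the section should stop just before "Site News". Using a photo title as the start marker is fragile, since the title changes with each submission.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): return the part of
# $text from the first $start marker up to, but not including, the next $end
# marker. Returns undef when either marker is missing.
sub slice_between {
    my ($text, $start, $end) = @_;
    my $from = index($text, $start);
    return undef if $from < 0;
    my $to = index($text, $end, $from + length($start));
    return undef if $to < 0;
    return substr($text, $from, $to - $from);
}

# Made-up stand-in for the text dump of the page.
my $page_text = join "\n",
    'Home  Challenges  Forums',
    'When Chairs Sleep Wooden        Votes:     129',
    'Updated:   07/27/05 09:45 am         [ Update ]',
    '',
    'Site News',
    '05.29.05- Small Site Update',
    '';

my $wanted = slice_between($page_text, 'When Chairs Sleep', 'Site News');
print defined $wanted ? $wanted : "markers not found\n";
```

Cutting on plain strings keeps the idea easy to follow, but it works on the already flattened text, so it cannot tell one table cell from another.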

--------------------------------------------------
Here is my script:
#!/usr/bin/perl
use strict;
use warnings;

# Load WWW::Mechanize for browsing and HTML::TokeParser for parsing
use WWW::Mechanize;
use WWW::Mechanize::FormFiller;
use HTML::TokeParser;
use diagnostics;
use Data::Dumper;

my $browser = WWW::Mechanize->new;

my $url  = "http://www.dpchallenge.com/login.php";
my $user = "myusername";
my $pass = "mypassword";
my $resp1 = $browser->get($url);
print "looking at: ", $browser->uri, "\n";

#print $resp1->content;
#print $browser->res->content;

# Select the login form and fill in the credentials
$browser->form("frmLogin");

$browser->set_fields(
    "USERNAME" => $user,
    "PASSWORD" => $pass,
);

# Submit the form
$browser->click();

#print $browser->res->content;

# Parse the HTML of the page returned after login
my $stream = HTML::TokeParser->new(\$browser->res->content);

# For each <br> tag, print its href attribute and the text up to the next </br>
while (my $token = $stream->get_tag("br")) {
    my $challenge = $token->[1]{href};
    my $text = $stream->get_text("/br");
    print "$challenge\t$text\n";
}
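
A possible reason for the extra output is that get_text("/br") keeps reading until it meets a closing </br> tag, which the page most likely never contains, so the first call runs on to the end of the document. The sketch below instead skips ahead to the "My Submissions" heading and collects only the label/value cells of that box. It is an illustration under assumptions, not a tested fix for the live site: submission_stats is a hypothetical helper, the sample HTML is a cut-down copy of the snippet above, and it assumes each photo's stats sit in their own inner table and that the box ends at the first div.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return one hash of label => value per photo found in
# the "My Submissions" box, ignoring the rest of the page.
sub submission_stats {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot parse HTML";

    # Skip ahead to the <b>My Submissions</b> heading.
    my $found = 0;
    while ($p->get_tag('b')) {
        if ($p->get_trimmed_text eq 'My Submissions') { $found = 1; last }
    }
    return [] unless $found;

    my (@photos, $label);
    while (my $tag = $p->get_tag('table', 'td', 'div')) {
        last if $tag->[0] eq 'div';                  # assumed end of the box
        if ($tag->[0] eq 'table') { push @photos, {}; next }   # new photo
        my $class = $tag->[1]{class};
        next unless @photos && defined $class && $class eq 'textsm2';
        my $text = $p->get_trimmed_text('/td');
        if ($text =~ /^(.+):$/) {                    # label cell, e.g. "Votes:"
            $label = $1;
        }
        elsif (defined $label && length $text) {     # value cell after a label
            $photos[-1]{$label} = $text;
            undef $label;
        }
    }
    return \@photos;
}

# Cut-down sample based on the snippet in the post.
my $html = '<table><tr><td><b>My Submissions</b></td></tr><tr><td class="textsm">'
    . '<table><tr><td align="right" class="textsm2">Votes:</td>'
    . '<td class="textsm2"><b>129' . "\n" . '</b></td></tr>'
    . '<tr><td align="right" class="textsm2">Views:</td>'
    . '<td class="textsm2"><b>194' . "\n" . '</td></tr></table>'
    . '<div align="right">[ <a href="challenge_stats.php?action=update">Update</a> ]</div>'
    . '</td></tr></table><p>Site News</p>';

for my $photo (@{ submission_stats($html) }) {
    print "$_: $photo->{$_}\n" for sort keys %$photo;
}
```

In the original script the same helper could be fed $browser->res->content after the login click. Working on tags rather than flattened text is what lets the loop stop at the end of the box instead of continuing into "Site News".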


jseyerle
07-27-05 10:02 PM
PERL Beginners archive