Packages for writing screen scrapers
Siegfried Heintze

2005-08-19, 4:13 pm

I've been using HTML::Parser with MySQL, and I've had a lot of problems with
memory leaks (both RAM and disk) and with multi-threading. I was really
disappointed, for example, to discover that having multiple threads did not
really speed things up at all. I wonder whether HTML::Parser is not
multi-threaded and blocks all my threads whenever a single socket READ is
outstanding?

I specifically chose DBI and MySQL so I could have multiple database
operations going concurrently.

HTML::Parser does not seem to do form submission either.

Well, I'm writing another scraper and thought I would experiment with some
different packages. I looked at WWW::Mechanize: it does form submission, but
otherwise it only appears to follow links. I could not find any functions
for fetching and parsing the HTML.

I get a lot of matches on CPAN when I search for WWW. Can anyone recommend
some alternatives to HTML::Parser I could experiment with?

Thanks,
Siegfried


Scott R. Godin

2005-08-24, 6:56 pm



Look here and you'll see how I first used WWW::Mechanize and then used the
resulting object to further parse the response, while experimenting to teach
myself some OO skills.

http://www.webdragon.net/miscel/tinyurl.htm
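
To illustrate the general idea (this is not the code from the page linked
above, just a minimal sketch): WWW::Mechanize is a subclass of
LWP::UserAgent, so the same object that follows links and submits forms also
fetches the page and exposes the response. The helper names fetch_page and
summarize_page are made up for this example, and the URL is whatever you pass
on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Fetch a page. The returned object keeps the response, so it can be
# queried afterwards without a separate parsing step.
# (fetch_page is a hypothetical helper name, not part of WWW::Mechanize.)
sub fetch_page {
    my ($url) = @_;
    my $mech = WWW::Mechanize->new( autocheck => 1 );    # die on HTTP errors
    $mech->get($url);
    return $mech;
}

# Collect a few things Mechanize extracts from the current page.
# (summarize_page is likewise a hypothetical helper.)
sub summarize_page {
    my ($mech) = @_;
    my @links = map { $_->url_abs->as_string } $mech->links;
    my @forms = $mech->forms;                 # HTML::Form objects
    return {
        title => $mech->title,
        html  => $mech->content,              # raw HTML, if you want to
                                              # hand it to another parser
        links => \@links,
        forms => scalar @forms,
    };
}

# Only run when a URL is given on the command line.
if (@ARGV) {
    my $mech    = fetch_page( $ARGV[0] );
    my $summary = summarize_page($mech);
    my $title   = defined $summary->{title} ? $summary->{title} : '(none)';
    print "Title: $title\n";
    print "Forms: $summary->{forms}\n";
    print "Link:  $_\n" for @{ $summary->{links} };
}
```

From there, `$mech->submit_form( form_number => 1, fields => { ... } )` fills
in and submits a form on the page (the field names depend on the site), and
`$mech->content` can be passed to whichever HTML parser you prefer for
anything Mechanize does not extract itself.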