Packages for writing screen scrapers
| Siegfried Heintze 2005-08-19, 4:13 pm |
| [Siegfried Heintze] I've been using HTML::Parser with MySQL and I've had a
lot of problems with (both RAM and disk) memory leaks and multi-threading. I
was really disappointed, for example, to discover that having multiple threads
did not really speed things up at all. I wonder if HTML::Parser is not
multi-threaded and blocks all my threads when there is a single outstanding
socket READ in progress?
I specifically chose DBI and MySQL so I could have multiple database
operations going concurrently.
HTML::Parser does not seem to do form submission either.
Well, I'm writing another scraper and thought I would experiment with some
different packages. I looked at WWW::Mechanize and it does form submission
but it only appears to follow links. I could not find any functions for
fetching and parsing the HTML.
I get a lot of matches on CPAN when I search for WWW. Can anyone recommend
some alternatives to HTML::Parser I could experiment with?
Thanks,
Siegfried
| Scott R. Godin 2005-08-24, 6:56 pm |
| Siegfried Heintze wrote:
> [Siegfried Heintze] I've been using HTML::Parser with MySQL and I've had a
> lot of problems with (both RAM and disk) memory leaks and multi-threading. I
> was really disappointed, for example, to discover that having multiple threads
> did not really speed things up at all. I wonder if HTML::Parser is not
> multi-threaded and blocks all my threads when there is a single outstanding
> socket READ in progress?
>
> I specifically chose DBI and MySQL so I could have multiple database
> operations going concurrently.
>
> HTML::Parser does not seem to do form submission either.
>
> Well, I'm writing another scraper and thought I would experiment with some
> different packages. I looked at WWW::Mechanize and it does form submission
> but it only appears to follow links. I could not find any functions for
> fetching and parsing the HTML.
>
> I get a lot of matches on CPAN when I search for WWW. Can anyone recommend
> some alternatives to HTML::Parser I could experiment with?
>
> Thanks,
> Siegfried
>
>
Look here, and you'll see how I used WWW::Mechanize to fetch the page and then
used the same object to parse the response further, while experimenting to
teach myself some OO skills.
http://www.webdragon.net/miscel/tinyurl.htm
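
For readers who cannot reach that link, here is a rough sketch of the general idea. It is not the code from the page above. The sub name fetch_title_and_links is made up for illustration, and pairing Mechanize with HTML::TokeParser is just one possible choice. The point is that WWW::Mechanize does fetch the page itself: get() retrieves it, content() hands back the raw HTML for any parser you like, and links() returns the links it has already extracted.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TokeParser;

# Hypothetical helper: fetch a page, return its <title> text and all link URLs.
sub fetch_title_and_links {
    my ($mech, $url) = @_;

    $mech->get($url);             # fetch the page; dies on failure if autocheck is on
    my $html = $mech->content;    # raw HTML of the response

    # Hand the HTML to a separate parser for anything Mechanize does not expose.
    my $parser = HTML::TokeParser->new(\$html);
    my $title  = '';
    if ($parser->get_tag('title')) {
        $title = $parser->get_trimmed_text('/title');
    }

    # Mechanize has already collected the links; url_abs resolves relative hrefs.
    my @links = map { $_->url_abs->as_string } $mech->links;

    return ($title, @links);
}

# Only runs when a URL is given on the command line.
if (@ARGV) {
    my $mech = WWW::Mechanize->new(autocheck => 1);
    my ($title, @links) = fetch_title_and_links($mech, $ARGV[0]);
    print "Title: $title\n";
    print "  $_\n" for @links;
}
```

Save it under any name and pass a URL as the first argument. Form submission goes through the same object, for example with $mech->submit_form(...), after which content() and links() reflect the page that came back.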