
HTML::TokeParser question
Mathew Snyder

2006-12-20, 3:59 am

I have a script which runs WWW::Mechanize to obtain a page so it can be parsed
for email addresses. However, I can't recall how I'm supposed to use
HTML::TokeParser to get what I need. This is the pertinent part of the script:

....
my $data = $agent->content();
my $parse = new HTML::TokeParser($data);
my @emails;
my $token;

while ($data) {
$token = $parse->get_trimmed_text("/small");
push @emails, $token;
}

foreach my $email (@emails){
print $email;
};

This gives me the error Can't call method "get_trimmed_text" on an undefined
value at ./check_delete_users.pl line 40.

I had this working at one point but lost the file. What am I missing?


Mathew
Mumia W.

2006-12-20, 3:59 am


It looks like the invocation of HTML::TokeParser->new() failed.

Read the HTML::TokeParser documentation to find out why. Hint: the "new"
method treats a plain scalar argument as a file name.

Also, you get the content of the request not directly from the
user-agent object, but from the response object returned by the "get"
method.


Rob Dixon

2006-12-20, 9:58 pm

Mumia W. wrote:
> On 12/19/2006 10:58 PM, Mathew Snyder wrote:
>
> It looks like the invocation of HTML::TokeParser->new() failed.
>
> Read the HTML::TokeParser documentation to find out why. Hint: the "new"
> method treats a plain scalar argument as a file name.
>
> Also, you get the content of the request not directly from the user-agent
> object, but from the response object returned by the "get" method.


No. Calling $agent->content is valid: WWW::Mechanize keeps the most recent
response, and content() returns the body of that page.

Rob

Rob Dixon

2006-12-20, 9:58 pm

Mathew Snyder wrote:
>
> I have a script which runs WWW::Mechanize to obtain a page so it can be parsed
> for email addresses. However, I can't recall how I'm supposed to use
> HTML::TokeParser to get what I need. This is the pertinent part of the
> script:
>
> ...
> my $data = $agent->content();
> my $parse = new HTML::TokeParser($data);


If you are supplying an HTML string directly then the 'new' method expects a
scalar reference. A simple scalar is assumed to be a filename, and checking the
return value from the constructor would have shown your problem.

my $parse = HTML::TokeParser->new(\$data) or die "Cannot create parser: $!";
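
To make the difference between the two argument forms concrete, here is a minimal sketch. The sample markup and the variable names $from_name and $from_ref are made up for illustration; in the real script $data would come from $agent->content.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample markup standing in for $agent->content (an assumption for this sketch).
my $data = '<p><small>user@example.com</small></p>';

# A plain scalar is taken as a file name. No file with that name exists,
# so the constructor returns undef instead of a parser object.
my $from_name = HTML::TokeParser->new($data);
print defined $from_name
    ? "plain scalar: parser created\n"
    : "plain scalar: new() returned undef ($!)\n";

# A reference to the scalar is parsed as literal HTML.
my $from_ref = HTML::TokeParser->new(\$data)
    or die "Cannot create parser: $!";
print "scalar ref: parser created\n";
```

Calling a method on the undef from the first form is what produces the "Can't call method ... on an undefined value" error.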

> my @emails;
> my $token;
>
> while ($data) {


$data remains unchanged and will always be true. You need to fetch all the
<small> tags in the HTML and exit the loop when there are no more.

while ($parse->get_tag('small')) {

> $token = $parse->get_trimmed_text("/small");
> push @emails, $token;
> }
>
> foreach my $email (@emails){
> print $email;
> };
>
> This gives me the error Can't call method "get_trimmed_text" on an undefined
> value at ./check_delete_users.pl line 40.
>
> I had this working at one point but lost the file. What am I missing?


The rest should work.
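
Putting the two changes together, a minimal sketch of the whole loop might look like the following. The sample HTML and its addresses are invented so the example runs without a network fetch; substitute the string returned by $agent->content in the real script.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample markup standing in for $agent->content(); the addresses are made up.
my $data = <<'HTML';
<ul>
  <li>Alice <small> alice@example.com </small></li>
  <li>Bob <small>bob@example.com</small></li>
</ul>
HTML

# Pass a reference so the string is parsed as HTML, not opened as a file.
my $parse = HTML::TokeParser->new(\$data)
    or die "Cannot create parser: $!";

my @emails;
# get_tag('small') skips ahead to the next <small> start tag and
# returns undef at the end of the document, which ends the loop.
while ($parse->get_tag('small')) {
    # Collect the text up to the closing </small>, whitespace trimmed.
    push @emails, $parse->get_trimmed_text('/small');
}

print "$_\n" for @emails;
```

With the sample input this should print the two addresses, one per line. Note that it collects the text of every <small> element, so it only yields email addresses if the page uses <small> for nothing else.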

HTH,

Rob
