HTML::TokeParser question
| Mathew Snyder 2006-12-20, 3:59 am |
| I have a script which runs WWW::Mechanize to obtain a page so it can be parsed
for email addresses. However, I can't recall how I'm supposed to use
HTML::TokeParser to get what I need. This is the pertinent part of the script:
....
my $data = $agent->content();
my $parse = new HTML::TokeParser($data);
my @emails;
my $token;
while ($data) {
$token = $parse->get_trimmed_text("/small");
push @emails, $token;
}
foreach my $email (@emails){
print $email;
};
This gives me the error Can't call method "get_trimmed_text" on an undefined
value at ./check_delete_users.pl line 40.
I had this working at one point but lost the file. What am I missing?
Mathew
| |
| Mumia W. 2006-12-20, 3:59 am |
| On 12/19/2006 10:58 PM, Mathew Snyder wrote:
> I have a script which runs WWW::Mechanize to obtain a page so it can be parsed
> for email addresses. However, I can't recall how I'm supposed to use
> HTML::TokeParser to get what I need. This is the pertinent part of the script:
>
> ...
> my $data = $agent->content();
> my $parse = new HTML::TokeParser($data);
> my @emails;
> my $token;
>
> while ($data) {
> $token = $parse->get_trimmed_text("/small");
> push @emails, $token;
> }
>
> foreach my $email (@emails){
> print $email;
> };
>
> This gives me the error Can't call method "get_trimmed_text" on an undefined
> value at ./check_delete_users.pl line 40.
>
> I had this working at one point but lost the file. What am I missing?
>
>
> Mathew
>
It looks like the invocation of HTML::TokeParser->new() failed.
Read the HTML::TokeParser documentation to find out why. Hint: the "new"
method treats a plain scalar as a file name.
Also, you get the content of the request not directly from the
user-agent object, but from the response object returned by the "get"
method.
| |
| Rob Dixon 2006-12-20, 9:58 pm |
| Mumia W. wrote:
> On 12/19/2006 10:58 PM, Mathew Snyder wrote:
>
> It looks like the invocation of HTML::TokeParser->new() failed.
>
> Read the documentation HTML::TokeParser to find out why. Hint: the "new"
> method expects a simple scalar to be a file name.
>
> Also, you get the content of the request not directly from the user-agent
> object, but from the response object returned by the "get" method.
No. A call to $agent->content is valid and does what is expected.
Rob
| |
| Rob Dixon 2006-12-20, 9:58 pm |
| Mathew Snyder wrote:
>
> I have a script which runs WWW::Mechanize to obtain a page so it can be parsed
> for email addresses. However, I can't recall how I'm supposed to use
> HTML::TokeParser to get what I need. This is the pertinent part of the
> script:
>
> ...
> my $data = $agent->content();
> my $parse = new HTML::TokeParser($data);
If you are supplying an HTML string directly then the 'new' method expects a
scalar reference. A simple scalar is assumed to be a filename, and checking the
return value from the constructor would have shown your problem.
my $parse = new HTML::TokeParser(\$data) or die $!;
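To see the difference between the two argument forms, here is a minimal sketch. The HTML string and the variable names $from_name and $from_ref are made up for illustration; only the constructor behaviour described above is being shown.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Invented sample markup standing in for the page content.
my $html = '<p>Contact: <small>user@example.com</small></p>';

# Plain scalar: treated as a file name. No such file exists,
# so the constructor returns undef instead of a parser object.
my $from_name = HTML::TokeParser->new($html);
print "plain scalar: ", (defined $from_name ? "parser created" : "no parser"), "\n";

# Scalar reference: the referenced string itself is parsed.
my $from_ref = HTML::TokeParser->new(\$html);
print "scalar ref:   ", (defined $from_ref ? "parser created" : "no parser"), "\n";
```

Calling a method on the undefined result of the first form is what produces the "Can't call method ... on an undefined value" message.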
> my @emails;
> my $token;
>
> while ($data) {
$data remains unchanged and will always be true. You need to fetch all the
<small> tags in the HTML and exit the loop when there are no more.
while ($parse->get_tag('small')) {
> $token = $parse->get_trimmed_text("/small");
> push @emails, $token;
> }
>
> foreach my $email (@emails){
> print $email;
> };
>
> This gives me the error Can't call method "get_trimmed_text" on an undefined
> value at ./check_delete_users.pl line 40.
>
> I had this working at one point but lost the file. What am I missing?
The rest should work.
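Putting both changes together, a self-contained sketch might look like the following. The here-document is an invented stand-in for what $agent->content() returns, and the addresses are placeholders, so adjust it to the real page.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Stand-in for $agent->content(); this markup is invented for illustration.
my $data = <<'HTML';
<html><body>
<p>Alice <small> alice@example.com </small></p>
<p>Bob <small>bob@example.com</small></p>
</body></html>
HTML

# Pass a reference so the string is parsed rather than opened as a file.
my $parse = HTML::TokeParser->new(\$data)
    or die "Could not create parser";

my @emails;

# get_tag returns undef once there are no more <small> tags, ending the loop.
while ($parse->get_tag('small')) {
    # Collect the text up to the closing </small>, with surrounding whitespace trimmed.
    push @emails, $parse->get_trimmed_text('/small');
}

print "$_\n" for @emails;
```

The print adds a newline after each address, which the original loop did not, so the output is one address per line.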
HTH,
Rob