Home > Archive > PERL Beginners > November 2007 > what's wrong with my http header
what's wrong with my http header
| Francois 2007-11-20, 7:00 pm |
I am trying to fetch data from a site that uses cookies and redirects
the user. I spent a lot of time getting the same result every time, a
connection timeout, until I realised that everything works if I don't
send the header.

Thanks for any explanation!

Francois

Here is my code:

use strict;
use warnings;

use LWP;
use HTML::Parser;
use HTML::FormatText;
use HTML::Tree;
# use DateTime::Duration;
use HTTP::Headers;
use HTTP::Cookies;
use HTTP::Cookies::Netscape;
use CGI qw(header -no_debug);

my $h = HTTP::Headers->new(
    Accept => "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
    Host   => "www.unifr.ch",
);

$h->server("Apache/2.0.46 (Red Hat)");
$h->user_agent("Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9");

my $reflink = "http://linkinghub.elsevier.com/retrieve/pii/S0020138307000095";

my $c = HTTP::Cookies::Netscape->new(file => 'cookies.txt', autosave => 1);
my $ua_short = LWP::UserAgent->new(cookie_jar => $c, timeout => 20);
$ua_short->agent("Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9");

# With this line the header is sent with my request and it does not work:
# my $req = HTTP::Request->new(GET => $reflink, $h);

# With this line it's OK:
my $req = HTTP::Request->new(GET => $reflink);

my $response = $ua_short->request($req);
print header;
print $response->status_line, "\n";
my $formatter = HTML::FormatText->new();

if ($response->is_success) {
    my $tree = HTML::TreeBuilder->new->parse($response->content);
    my $ascii = $formatter->format($tree);
    $tree->delete();
    print $ascii;
}
| Rob Dixon 2007-11-20, 10:04 pm |
Hi Francois.

As a general rule it's polite to reduce code as much as possible before
posting it here to ask for help: there's a lot of junk in here that
isn't relevant to the problem and just needs to be waded through before
we can give you an answer.

What's going wrong is that you have a Host header value of www.unifr.ch
but you are sending the request to linkinghub.elsevier.com, which
doesn't have a host of that name and so doesn't reply.

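To make the mismatch concrete, here is a minimal sketch that is not part of the original post. It assumes the HTTP::Headers and HTTP::Request modules from libwww-perl are installed; host_mismatch is a hypothetical helper written only for this illustration, and the Accept value is shortened. It builds one request with the hard-coded Host header and one without, then checks each against the host in the URL. Nothing is sent over the network.

```perl
use strict;
use warnings;
use HTTP::Headers;
use HTTP::Request;

my $url = 'http://linkinghub.elsevier.com/retrieve/pii/S0020138307000095';

# Hypothetical helper: returns 1 when an explicit Host header names a
# different host than the request URL, 0 otherwise.
sub host_mismatch {
    my ($req) = @_;
    my $explicit = $req->header('Host');
    return 0 unless defined $explicit;          # no explicit Host: nothing to clash
    (my $name = lc $explicit) =~ s/:\d+\z//;    # drop an optional ":port" suffix
    return $name ne lc($req->uri->host) ? 1 : 0;
}

# As in the original post: Host is hard-coded to an unrelated site.
my $bad = HTTP::Request->new(
    GET => $url,
    HTTP::Headers->new(Accept => 'text/html', Host => 'www.unifr.ch'),
);

# Corrected: keep the custom Accept header but leave Host unset, so the
# user agent can derive it from the URL when the request is sent.
my $good = HTTP::Request->new(
    GET => $url,
    HTTP::Headers->new(Accept => 'text/html'),
);

printf "%-4s mismatch=%d\n", 'bad',  host_mismatch($bad);
printf "%-4s mismatch=%d\n", 'good', host_mismatch($good);
```

The point of the sketch is that the other custom headers can stay; only the Host entry needs to go, or to be changed to match the URL.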
But that's a huge amount of code just to fetch a web page! You may need
some of that stuff but I can't see how you would want all of it. How
about just

use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $resp = $ua->get('http://linkinghub.elsevier.com/retrieve/pii/S0020138307000095');

which seems to me to do the same thing.

HTH,

Rob
| Francois 2007-11-21, 10:01 pm |
Hi Rob,

Many thanks for educating me and for the answer. I had also posted to
the libwww forum but have not had an answer there yet. The wrong host
in my header also explains the trouble I had with cookies (which was
the topic of my post there).

Thanks again!

Francois
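
The cookie side of this can be sketched offline as well. The example below is an illustration added to the thread, not code from either poster. It rests on an assumption worth verifying against your installed version: that HTTP::Cookies uses the request's Host header, when one is set, to decide which stored cookies match. The cookie name and value are made up, and the jar lives only in memory.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Headers;
use HTTP::Request;

my $url = 'http://linkinghub.elsevier.com/retrieve/pii/S0020138307000095';

# An in-memory jar holding one made-up cookie for the Elsevier host.
my $jar = HTTP::Cookies->new;
$jar->set_cookie(
    0, 'session', 'abc123',           # version, name, value (hypothetical)
    '/', 'linkinghub.elsevier.com',   # path, domain
    undef, 1, 0, 3600, 0,             # port, path_spec, secure, maxage, discard
);

# Same URL twice: once with no explicit Host, once with the wrong one.
my $good = HTTP::Request->new(GET => $url);
my $bad  = HTTP::Request->new(
    GET => $url,
    HTTP::Headers->new(Host => 'www.unifr.ch'),
);

# Ask the jar to attach any matching cookies to each request.
$jar->add_cookie_header($_) for $good, $bad;

for my $pair ([ good => $good ], [ bad => $bad ]) {
    my ($label, $req) = @$pair;
    my $cookie = $req->header('Cookie');
    print "$label: ", (defined $cookie ? $cookie : '(no Cookie header)'), "\n";
}
```

If the assumption holds, only the request without the stray Host header gets a Cookie header, which would match the symptoms described above.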