

Home > Archive > PERL Beginners > November 2007 > what's wrong with my http header










 

what's wrong with my http header
Francois

2007-11-20, 7:00 pm

I tried to get data from a site which uses cookies and redirects the
user. I spent a lot of time with the same result: connection timed out,
until I realised that all was fine if I didn't send the header...

Thanks for any explanations !!!
Francois

here is my code:

use strict;
use warnings;

use LWP;
use HTML::Parser;
use HTML::FormatText;
use HTML::Tree;
# use DateTime::Duration;
use HTTP::Headers;
use HTTP::Cookies;
use HTTP::Cookies::Netscape;
use CGI qw(header -no_debug);


my $h = HTTP::Headers->new(
    Accept => "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
    Host   => "www.unifr.ch",
);

$h->server("Apache/2.0.46 (Red Hat)");
$h->user_agent("Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9");

my $reflink = "http://linkinghub.elsevier.com/retrieve/pii/S0020138307000095";

my $c = HTTP::Cookies::Netscape->new(file => 'cookies.txt', autosave => "1");
my $ua_short = LWP::UserAgent->new(cookie_jar => $c, timeout => 20);
$ua_short->agent("Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9");

# with this line the header is sent with my request and it does not work
# my $req = HTTP::Request->new(GET => $reflink, $h);

# with this line it's ok ....
my $req = HTTP::Request->new(GET => $reflink);

my $response = $ua_short->request($req);
print header;
print $response->status_line, "\n";
my $formatter = HTML::FormatText->new();

if ($response->is_success) {
    my $tree = HTML::TreeBuilder->new->parse($response->content);
    my $ascii = $formatter->format($tree);
    $tree->delete();
    print $ascii;
}

Rob Dixon

2007-11-20, 10:04 pm



Hi Francois.

As a general rule it's polite to reduce code as much as possible before
posting it here to ask for help: there's a lot of junk in here that
isn't relevant to the problem and just needs to be waded through before
we can give you an answer.

What's going wrong is that you have a Host header value of www.unifr.ch
but you are sending the request to linkinghub.elsevier.com, which
doesn't have a host of that name and so doesn't reply.

But that's a huge amount of code just to fetch a web page! You may need
some of that stuff but I can't see how you would want all of it. How
about just

my $ua = LWP::UserAgent->new;
my $resp = $ua->get('http://linkinghub.elsevier.com/retrieve/pii/S0020138307000095');

which seems to me to do the same thing.
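
To make the Host mismatch described above concrete, here is a minimal sketch that builds the request without sending it and checks any explicit Host header against the host in the URL. The shortened Accept value and the variable names are only illustrative, and the sketch assumes that LWP fills in Host from the URL when the request does not set one.

```perl
use strict;
use warnings;
use HTTP::Headers;
use HTTP::Request;
use URI;

my $url = 'http://linkinghub.elsevier.com/retrieve/pii/S0020138307000095';

# Only headers that describe the client: no Host, no Server.
# (Shortened Accept value, for illustration only.)
my $h = HTTP::Headers->new(
    Accept => 'text/html;q=0.9,text/plain;q=0.8,*/*;q=0.5',
);

# Build the request object; nothing is sent over the network here.
my $req = HTTP::Request->new(GET => $url, $h);

# The host the request would actually be sent to, taken from the URL.
my $target_host = URI->new($url)->host;

# Guard: refuse an explicit Host header that disagrees with the URL.
my $host_header = $req->header('Host');
if (defined $host_header && lc $host_header ne lc $target_host) {
    die "Host header '$host_header' does not match URL host '$target_host'\n";
}

print "Target host: $target_host\n";
print $req->as_string;
```

Adding Host => "www.unifr.ch" to the headers in this sketch would trip the guard, which is the situation in the original script.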

HTH,

Rob
Francois

2007-11-21, 10:01 pm


Hi Rob

Many thanks for educating me and for the answer. I tried posting to the
libwww forum but have not had an answer yet. The wrong host in my
header also explains the trouble I had with cookies (which was the
topic of my post there).
Thanks again !
Francois
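
The link between the wrong Host header and the cookie trouble can be sketched offline. The example below rests on an assumption worth checking against your installed version: that HTTP::Cookies consults the request's Host header, before the URL, when deciding which cookies match. The cookie name and value are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;

my $url = 'http://linkinghub.elsevier.com/retrieve/pii/S0020138307000095';

# In-memory jar holding one made-up cookie for the real host.
# Arguments: version, key, value, path, domain, port, path_spec,
#            secure, maxage, discard.
my $jar = HTTP::Cookies->new;
$jar->set_cookie(0, 'sid', 'abc123', '/', 'linkinghub.elsevier.com',
                 undef, 1, 0, 3600, 0);

# Request with no explicit Host header.
my $good = HTTP::Request->new(GET => $url);
$jar->add_cookie_header($good);

# Same URL, but with the mismatched Host header from the original script.
my $bad = HTTP::Request->new(GET => $url, [ Host => 'www.unifr.ch' ]);
$jar->add_cookie_header($bad);

printf "without Host header: %s\n", $good->header('Cookie') // '(no cookie sent)';
printf "with wrong Host:     %s\n", $bad->header('Cookie')  // '(no cookie sent)';
```

If the assumption holds, the jar treats the second request as going to www.unifr.ch and so does not attach the cookie stored for linkinghub.elsevier.com.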

