
LWP::UserAgent and redirected page responses (PERL Modules archive, October 2005)
Bill

2005-10-21, 6:56 pm

Hello. This concerns LWP::UserAgent. If a request is sent to a certain
web site, the response in the browser comes back from a completely
different domain and site due to redirection. How do I find out, from
the UserAgent module, what the redirected URL is? A uri() method as in
WWW::Mechanize seems like a good candidate, but when I checked, the uri()
method seemed to return the original request URI, not the redirected
one. I need to know exactly what would be in the URL box of a web
browser after the redirected response happens. Does anyone know how to do
this? I will post code if requested.

Paul Lalli

2005-10-21, 6:56 pm

Bill wrote:
> Hello. This concerns LWP::UserAgent. If a request is sent to a certain
> web site, the response in the browser comes back as a completely
> different domain and site due to redirection. How do I find out, from
> the UserAgent module, what the redirected url is? A uri() method as in
> WWW::Mechanize seems like a good candidate but when checked the uri()
> method seems to return the original request uri, not the redirected
> one. I need to know exactly what would be in the url box of a web
> browser after the redirected response happens. Anyone know how to do
> this? I will post code if requested.


[Disclaimer: All of the below is gleaned from reading the relevant
docs. I have not tried any LWP code myself.]

An LWP::UserAgent object sends a request to a server by means of
methods such as get() or post(). Their return value is an
HTTP::Response object. The HTTP::Response man page shows that one of its
methods is request(), which is defined as follows:
$r->request
$r->request( $request )

This is used to get/set the request attribute. The request attribute is a
reference to the request that caused this response. It does not have
to be the same request passed to the $ua->request() method, because there
might have been redirects and authorization retries in between.

To find out what we can get from that object, we look to HTTP::Request,
which has this method:
$r->uri
$r->uri( $val )

This is used to get/set the uri attribute. The $val can be a reference to
a URI object or a plain string. If a string is given, then it should be
parseable as an absolute URI.

Putting it all together, then:

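use LWP::UserAgent;

my $ua        = LWP::UserAgent->new;
my $response  = $ua->get($url);        # by default only GET and HEAD follow redirects
my $request   = $response->request();  # the last request sent, after any redirects
my $found_url = $request->uri();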

Hope this helps,
Paul Lalli
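Paul's recipe can be checked without a network connection. This is a minimal sketch, assuming a libwww-perl recent enough to have add_handler() (handlers were added to LWP after this thread was written): a request_send handler plays a made-up server so that LWP's own redirect logic runs. The host names and the session value are invented. It shows that after get() follows the redirects, $response->request->uri is the final URL, and that the previous() chain records every hop.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Hypothetical site map: URL => [status, Location]. Nothing here is real.
my %fake_site = (
    'http://start.example.com/'                    => [302, 'http://other.example.net/login'],
    'http://other.example.net/login'               => [302, '/home?session=abc123'],
    'http://other.example.net/home?session=abc123' => [200, undef],
);

my $ua = LWP::UserAgent->new;

# Answer every request locally instead of opening a socket.
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    my ($code, $location) = @{ $fake_site{ $request->uri->as_string } || [404] };
    my $res = HTTP::Response->new($code);
    $res->header(Location => $location) if defined $location;
    return $res;
});

# Walk the previous() chain and return every requested URL, oldest first.
sub redirect_chain {
    my ($response) = @_;
    my @uris;
    for (my $r = $response; $r; $r = $r->previous) {
        unshift @uris, $r->request->uri->as_string;
    }
    return @uris;
}

my $response  = $ua->get('http://start.example.com/');
my $found_url = $response->request->uri;    # URL after all redirects
print "final URL: $found_url\n";
print "hops: ", join(' -> ', redirect_chain($response)), "\n";
```

On an older LWP the same $response->request->uri and previous() calls apply; only the fake-server handler would have to go.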

William Herrera

2005-10-21, 9:55 pm

Paul Lalli wrote:

> To find out what we can get from that object, we look to HTTP::Request,
> which has this method:
> $r->uri
> $r->uri( $val )
>
> This is used to get/set the uri attribute. The $val can be a
> reference to
> a URI object or a plain string. If a string is given, then it should
> be
> parseable as an absolute URI.


Sorry that I did not post code the last time.
Here's an excerpt from the method in question:

-----------------------------------------
# Log in to my.t-mobile.com (T-Mobile USA) and return a hashref of
# total charged (non-free) minutes and total charged SMS messages,
# keyed as 'calls' and 'messages'.
# Excerpt: assumes "use Carp;" and a file-scoped $base_uri elsewhere.
sub get_billing {
    my ($self) = @_;
    $self->{start_page} = $base_uri;
    my $mech = $self->{agent} = WWW::Mechanize->new(
        agent => "Mozilla/4.0 (compatible; MSIE 7.0b; Perl $])",
    );
    $mech->get($base_uri);
    $mech->form_name("Form1") or croak $self->content;

    # Even though WWW::Mechanize does most of the work, we have to
    # manually clear readonly on hidden fields. Annoying.
    my $input = $mech->current_form->find_input('__EVENTTARGET')
        or $self->_err("Cannot find hidden field for signin in Form1");
    {
        no warnings;
        $input->readonly(0);
    }
    $mech->set_fields(
        'txtMSISDN'     => $self->{user_number},
        'txtPassword'   => $self->{password},
        '__EVENTTARGET' => 'signin',
    );

    # submit() and get() return an HTTP::Response object, which is true
    # even for an error page, so test success() instead.
    $mech->submit;
    $mech->success or $self->_err("Could not submit Form1 successfully");
    $mech->get("https://my.t-mobile.com/Billing/");
    $mech->success or $self->_err("Cannot get Billing page");
    print "Line uri: ", $mech->uri, "\n";

    # cut for brevity here....
}

The problem is that the second uri printed is NOT the same as the uri
displayed in the url line of the browser doing the same tasks, even
though the CONTENT of the FIRST request's response text is correct. As a
result, the user agent fails to correctly submit the next click, since
the base URL is now incorrect. I cannot just plug in a fixed url there,
since the redirected URL contains some cookie-like values needed by the
host.
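To see why a stale base URL breaks the next click: a relative link resolves against whichever URL is taken as the page's address. In this sketch only the Billing URL comes from Bill's excerpt; the link Details.aspx?view=calls and the host billing.example.com are invented for illustration.

```perl
use strict;
use warnings;
use URI;

# Resolve a relative link the way a browser does: against the URL of the
# page it appears on. The link and the redirect target are hypothetical.
my $link      = 'Details.aspx?view=calls';
my $requested = 'https://my.t-mobile.com/Billing/';            # what the script asked for
my $landed    = 'https://billing.example.com/s123/Billing/';   # made-up redirect target

my $wrong = URI->new_abs($link, $requested);
my $right = URI->new_abs($link, $landed);
print "resolved against requested URL: $wrong\n";
print "resolved against final URL:     $right\n";
```

HTTP::Response's base() method falls back to the request URI for the same purpose, so whichever URL the library records as the request decides where the next relative link points.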

Ideas?

John Bokma

2005-10-21, 9:55 pm

William Herrera
<spamnotwanted-use-first-initial-then-last-name@skylightview.com> wrote:

> The problem is that the second uri printed is NOT the same as the uri
> displayed in the url line of the browser doing the same tasks, even
> though the CONTENT of the FIRST request's response text is correct. As
> a result, the user agent fails to correctly submit the next click,
> since the base URL is now incorrect. I cannot just plug in a fixed url
> there, since the redirected URL contains some cookie-like values
> needed by the host.
>
> Ideas?


It might be the User-Agent header, or some other header, that triggers this
behaviour. If the URI you get back contains the "cookie-like" values, you can
splice them into the URL you already know.
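John's suggestion, taking the session value from the redirected URL and splicing it into a known URL, might look like this if the value travels as a query parameter. That is an assumption: the parameter name session and both URLs are made up, and a site that puts the session ID in the path would need a different rule.

```perl
use strict;
use warnings;
use URI;

# Copy the query parameters of a redirected URL onto a URL we already know.
# Which parameters matter is site-specific; 'session' below is a made-up name.
sub graft_query {
    my ($redirected, $known) = @_;
    my $from = URI->new($redirected);
    my $to   = URI->new($known);
    # query_form returns key/value pairs in order; passing them sets the query.
    # (With no pairs, query_form() is just a getter, so $to is left unchanged.)
    $to->query_form($from->query_form);
    return $to;
}

my $url = graft_query(
    'http://other.example.net/home?session=abc123',
    'https://other.example.net/Billing/',
);
print "$url\n";
```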

--
John
Small Perl scripts: http://johnbokma.com/perl/
Perl programmer available: http://castleamber.com/
I ploink googlegroups.com :-)

William Herrera

2005-10-21, 9:55 pm

John Bokma wrote:
>
>
> Might be the UserAgent or any other header that triggers this behaviour. If
> the uri you get back contains the "cookie-like" values, you can tweak them
> into the URL you know.


Yes, and the "cookie-like" values seem to be a per-session ID that
changes. So I need to know what URI I actually get back, which LWP
does not seem to keep anywhere. Does it keep the original, non-redirected
URI instead?

John Bokma

2005-10-22, 3:55 am

William Herrera
<spamnotwanted-use-first-initial-then-last-name@skylightview.com> wrote:

> John Bokma wrote:
>
> Yes, and the "cookie-like" values seem to be a per-session ID that
> changes. So, I need to know that the uri is that I get back. Which LWP
> does not seem to keep anywhere--it keeps the original, non-redirected
> uri instead?


If there is a redirect, LWP stores this info. IIRC, in debug mode you can
see what is happening. Another trick is to set the maximum number of
redirects (LWP::UserAgent's max_redirect) to 0, then 1, and so on.
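The max_redirect trick can be taken one step further: with redirects switched off, each 3xx response comes back to the caller, and the script can follow the Location header itself, logging every hop. The fake server (a request_send handler, only available in later libwww-perl releases), the host names and the hop limit are assumptions made so the sketch runs offline.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;
use URI;

# Made-up two-hop redirect, answered locally so no network is needed.
my %fake_site = (
    'http://start.example.com/'                    => [302, 'http://other.example.net/login'],
    'http://other.example.net/login'               => [302, '/home?session=abc123'],
    'http://other.example.net/home?session=abc123' => [200, undef],
);

my $ua = LWP::UserAgent->new(max_redirect => 0);   # hand every 3xx back to us
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    my ($code, $location) = @{ $fake_site{ $request->uri->as_string } || [404] };
    my $res = HTTP::Response->new($code);
    $res->header(Location => $location) if defined $location;
    return $res;
});

# Follow redirects by hand, at most $limit hops, recording each URL visited.
sub follow_by_hand {
    my ($ua, $url, $limit) = @_;
    my @seen = ($url);
    my $res  = $ua->get($url);
    while ($res->is_redirect && @seen <= $limit) {
        my $location = $res->header('Location');
        last unless defined $location;
        # Location may be relative, so resolve it against this response's base.
        my $next = URI->new_abs($location, $res->base)->as_string;
        print "hop: $seen[-1] -> ", $res->code, " -> $next\n";
        push @seen, $next;
        $res = $ua->get($next);
    }
    return ($res, @seen);
}

my ($res, @seen) = follow_by_hand($ua, 'http://start.example.com/', 5);
print "landed on $seen[-1] with status ", $res->code, "\n";
```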

I am sure there are little (Perl) proxy programs available that show you
exactly what is being sent out and what comes back.

Also, try a browser with JavaScript turned off, since LWP does not run
JavaScript either.
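Besides JavaScript, a browser also follows <meta http-equiv="refresh"> redirects, which LWP does not follow on its own. Whether Bill's site used one is not known; this sketch only shows how a script might check a page for one. The page body and URLs are invented, and the regex covers the common "N; url=..." form, not every variant.

```perl
use strict;
use warnings;
use HTML::HeadParser;
use URI;

# Return the absolute target of a <meta http-equiv="refresh"> redirect in
# $html, resolved against $base, or undef if the page has none.
sub meta_refresh_target {
    my ($html, $base) = @_;
    my $p = HTML::HeadParser->new;
    $p->parse($html) and $p->eof;    # flush only if the head never closed
    my $refresh = $p->header('Refresh') or return;   # e.g. "0; url=/next"
    my ($target) = $refresh =~ /url\s*=\s*['"]?([^'"\s>]+)/i or return;
    return URI->new_abs($target, $base)->as_string;
}

# Hypothetical page body, as LWP would hand it back unchanged.
my $html = '<html><head>'
         . '<meta http-equiv="Refresh" content="0; url=/home?session=abc123">'
         . '</head><body></body></html>';
my $target = meta_refresh_target($html, 'http://other.example.net/login');
print defined $target ? "a browser would go on to $target\n" : "no meta refresh\n";
```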


William Herrera

2005-10-22, 3:55 am

John Bokma wrote:

> If there is a redirect, LWP stores this info. IIRC in debug mode you can
> see what is happening. Another trick is to set the redirect level to 0, to
> 1, etc.
>


Thanks. Using LWP::DebugFile shows that LWP correctly GETs the URL which
the browser displays, yet the uri() method returns the initial URL, not
the finally redirected one. Weird. I suppose I could check the tail of
the LWP::DebugFile output as the program progresses, but that seems so
clumsy. Isn't there a method or value inside UserAgent that I can use?

William Herrera

2005-10-22, 9:55 pm


>
> If there is a redirect, LWP stores this info. IIRC in debug mode you can
> see what is happening. Another trick is to set the redirect level to 0, to
> 1, etc.
>
> I am sure there are little (Perl) proxy programs available that show you
> exactly what is being send out, and comes back.
>
> Also, try with a browser with JavaScript off, since that is what LWP is
> doing.
>


I wrote an itty bitty module to fix the problem (currently calling it
LWP::LastURI). So now things work okay. Thanks for the suggestion to
look at the LWP debug output.
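Bill's LWP::LastURI code was never posted, so the following is only a hypothetical sketch of one way such a helper could work: a response_done handler (available in later libwww-perl releases) that remembers the URL of the last request the agent actually sent. The function name track_last_uri and the fake one-hop redirect are invented.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# Hypothetical helper, not Bill's LWP::LastURI:
# register a response_done handler that remembers the URL of the last
# request $ua actually sent, and return a closure that reports it.
sub track_last_uri {
    my ($ua) = @_;
    my $last;
    $ua->add_handler(response_done => sub {
        my ($response) = @_;
        $last = $response->request->uri->as_string;
        return;
    });
    return sub { $last };
}

# Usage against a made-up one-hop redirect, answered locally.
my $ua = LWP::UserAgent->new;
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    return HTTP::Response->new(302, 'Found',
        [Location => 'http://other.example.net/home?session=abc123'])
        if $request->uri eq 'http://start.example.com/';
    return HTTP::Response->new(200, 'OK');
});

my $last_uri = track_last_uri($ua);
$ua->get('http://start.example.com/');
print "last URL sent: ", $last_uri->(), "\n";
```

The same value is available from $response->request->uri; a handler like this only helps when the code that needs the URL never sees the response object.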

--Bill

Copyright 2008 codecomments.com