| Author |
A couple of vague LWP questions
|
|
| Franklin H. 2005-04-25, 8:58 am |
| 1) When using LWP::Simple to grab a webpage the GET request
occasionally and irreproducibly appears to hang and does not return.
Any clue as to why this could conceivably occur? There doesn't appear
to be a way to set the request timeout with this particular module but
perhaps someone may know of a workaround?
2) When using LWP::UserAgent to grab the same webpage as above the
webserver somehow seems able to recognize the request as coming from
an "automated tool". Any idea why this might possibly occur with
LWP::UserAgent but not with LWP::Simple?
TYIA,
Fr.
| |
| Franklin H. 2005-04-25, 8:58 am |
> 2) When using LWP::UserAgent to grab the same webpage as above the
> webserver somehow seems able to recognize the request as coming from
> an "automated tool". Any idea why this might possibly occur with
> LWP::UserAgent but not with LWP::Simple?
It would appear that the trick here is to set the User-Agent to something
other than the default "libwww-perl/#.##". Arbitrarily I chose:
$ua->agent('Mozilla/5.001');
| |
| Brian Wakem 2005-04-25, 8:58 am |
| Franklin H. wrote:
>
>
> It would appear that the trick here is to set the User-Agent to something
> other than the default "libwww-perl/#.##". Arbitrarily I chose:
>
> $ua->agent('Mozilla/5.001');
If you are trying to blend in with normal traffic then I suggest using -
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
- which is IE6 on Windows XP.
The answer to your other question is to either use LWP::UserAgent with the
timeout method it provides ( $ua->timeout( $secs ) ), or use alarm.
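eval {
    # The trailing newline keeps Perl from appending "at ... line N" to the message
    local $SIG{ALRM} = sub { die "timeout\n" };
    alarm $secs;            # schedule SIGALRM in $secs seconds
    $response = get($url);  # get() as exported by LWP::Simple
    alarm 0;                # request finished in time: cancel the alarm
};
if ($@ =~ m/timeout/) {
    # timed out
}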
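For the first option, here is a minimal sketch of the LWP::UserAgent timeout route. The helper names build_agent and fetch are made up for illustration, and the URL is taken from the command line as an assumption about how you would call it. Note that the LWP::UserAgent docs describe the timeout as an inactivity limit on the connection, so the whole request can still take longer than $secs in total.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: build a user agent that aborts a request after
# $secs seconds without any activity on the connection.
sub build_agent {
    my ($secs) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->timeout($secs);
    return $ua;
}

# Hypothetical helper: fetch $url and return the decoded body,
# or undef on any failure (timeouts included).
sub fetch {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    return $response->decoded_content if $response->is_success;
    warn "Request failed: ", $response->status_line, "\n";
    return;
}

# Only touch the network when a URL is passed on the command line.
if (@ARGV) {
    my $body = fetch(build_agent(10), $ARGV[0]);
    print length($body), " bytes received\n" if defined $body;
}
```

Unlike the alarm approach, this does not rely on $SIG{ALRM} in your own code.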
--
Brian Wakem
| |
| Franklin H. 2005-04-25, 8:58 am |
| Well, I am trying to make this platform independent and as such would
hate to run into problems with $SIG{ALRM} on XP.
Similarly, mightn't "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
be suspicious if the request came from a Linux OS?
| |
| Mark Clements 2005-04-25, 8:58 am |
| Franklin H. wrote:
> Well, I am trying to make this platform independent and as such would
> hate to run into problems with $SIG{ALRM} on XP.
>
> Similarly, mightn't "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
> be suspicious if the request came from a Linux OS?
>
Nah. The remote server only sees an HTTP request: it has no idea from
what type of system the request originated, other than what is in the
HTTP headers.
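To see this for yourself, the following sketch prepares a request without sending it and prints the headers the user agent has set. The URL is a placeholder and nothing goes over the network here; the protocol layer adds a few more headers, such as Host, when the request is actually sent.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');

# example.com is a placeholder; the request is built but never sent.
my $request = HTTP::Request->new(GET => 'http://example.com/');

# prepare_request copies the user agent's default headers
# (including User-Agent) onto the request object.
$ua->prepare_request($request);

# Print the request line and the headers set so far.
print $request->as_string;
```

Nothing in that output depends on the operating system the script runs on, only on what you put in the headers.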
Mark
| |
| Charles DeRykus 2005-04-25, 8:58 am |
| In article <1114415731.248534.312010@l41g2000cwc.googlegroups.com>,
Franklin H. <franklin28@hushmail.com> wrote:
>1) When using LWP::Simple to grab a webpage the GET request
>occasionally and irreproducibly appears to hang and does not return.
>Any clue as to why this could conceivably occur? There doesn't appear
>to be a way to set the request timeout with this particular module but
>perhaps someone may know of a workaround?
>
LWP::Simple is built on LWP::UserAgent, so you can import
$ua and set a timeout, e.g.:
use LWP::Simple qw($ua get); $ua->timeout(10);
See the LWP::Simple docs for discussion of the above.
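Spelled out as a small script, that might look like the sketch below. Taking the URL from the command line is my assumption for the example; the 10-second value is just the one used above.

```perl
use strict;
use warnings;

# Naming imports replaces the default export list,
# so get has to be requested explicitly alongside $ua.
use LWP::Simple qw($ua get);

# $ua is the LWP::UserAgent object that LWP::Simple uses internally.
$ua->timeout(10);    # seconds

# Only touch the network when a URL is passed on the command line.
if (@ARGV) {
    my $content = get($ARGV[0]);
    print defined $content ? "fetched\n" : "failed or timed out\n";
}
```

get returns undef on any failure, so a timeout is not distinguishable from other errors this way.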
>2) When using LWP::UserAgent to grab the same webpage as above the
>webserver somehow seems able to recognize the request as coming from
>an "automated tool". Any idea why this might possibly occur with
>LWP::UserAgent but not with LWP::Simple?
>
Some servers may be checking the user agent id. No idea why
LWP::Simple would slip by if that's the case. Again, see the
LWP::UserAgent and LWP::Simple docs on how to alter the setting.
hth,
--
Charles DeRykus
| |
| Joe Smith 2005-04-25, 8:58 am |
| Charles DeRykus wrote:
> Some servers may be checking the user agent id. No idea why
> LWP::Simple would slip by if that's the case.
perldoc LWP::UserAgent
the default agent identifier is "libwww-perl/#.##"
Line 43 of LWP/Simple.pm
$ua->agent("LWP::Simple/$LWP::VERSION");
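A quick way to compare the two defaults on your own installation is sketched below; the exact version numbers printed will depend on which LWP you have.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use LWP::Simple qw($ua);

# Identifier of a freshly constructed LWP::UserAgent.
my $default_agent = LWP::UserAgent->new->agent;

# Identifier of the user agent object that LWP::Simple sets up.
my $simple_agent = $ua->agent;

print "LWP::UserAgent default: $default_agent\n";
print "LWP::Simple default:    $simple_agent\n";
```

A server that filters on the literal "libwww-perl" string would therefore treat the two differently.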
|
|
|
|