PoCo::Client::HTTP using decoded content
Daisuke Maki 2007-04-13, 4:23 am
Hi,
I'm trying to fetch pages using PoCo::Client::HTTP, and some pages are
turning out to be in a different encoding than what the content-type
header says. I've traced this to this change:
  2006-10-25 06:55:14 (r294) by rcaputo
  lib/POE/Component/Client/HTTP.pm M; t/14_gzipped_content.t A;
  MANIFEST M; Makefile.PL M; lib/POE/Component/Client/HTTP/Request.pm M

    Apply Rob Bloodgood's patch to transparently decode non-streaming
    content before it's returned. This gives us support for gzip
    compressed content. Resolves long-standing rt.cpan.org ticket 8454.
Can't this feature be optional?
This is how the problem is reproduced:
1) http://d.hatena.ne.jp/lestrrat/ is a page in euc-jp
2) the server supports gzip encoding
3) PoCo::Client::HTTP sends headers claiming it can
handle gzip encoding (which is fine)
4) In the response, content-encoding header is specified
5) HTTP::Response->decoded_content is called
6) in decoded_content, it handles the gzip encoding part
7) then in the next clause it goes on to do the following:
    if ($ct && $ct =~ m,^text/,,) {
        my $charset = $opt{charset} || $ct_param{charset} ||
                      $opt{default_charset} || "ISO-8859-1";
        $charset = lc($charset);
        if ($charset ne "none") {
            require Encode;
            if (do{my $v = $Encode::VERSION; $v =~ s/_//g; $v} < 2.0901 &&
                !$content_ref_iscopy)
            {
                # LEAVE_SRC did not work before Encode-2.0901
                my $copy = $$content_ref;
                $content_ref = \$copy;
                $content_ref_iscopy++;
            }
            $content_ref = \Encode::decode($charset, $$content_ref,
                Encode::FB_CROAK() | Encode::LEAVE_SRC());
        }
    }
At the end, I have a request with content-type = 'text/html;
charset=euc-jp', and yet the content is UTF-8.
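The effect of steps 6 and 7 can be reproduced with core modules alone. What follows is a minimal sketch, not the HTTP::Message code itself: the sample string and the variable names are made up for illustration, and it assumes the decoded characters are eventually written out as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use IO::Compress::Gzip     qw(gzip   $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical page body: three kanji, stored by the server as EUC-JP bytes.
my $euc_bytes = encode('euc-jp', "\x{65E5}\x{672C}\x{8A9E}");

# The server gzips the body because the client advertised gzip support.
gzip(\$euc_bytes => \my $wire) or die "gzip failed: $GzipError";

# Step 6: undo the content-encoding. The result is still EUC-JP bytes.
gunzip(\$wire => \my $unzipped) or die "gunzip failed: $GunzipError";

# Step 7: decode using the charset from the content-type header.
# The result is a Perl character string, no longer EUC-JP bytes.
my $chars = decode('euc-jp', $unzipped,
    Encode::FB_CROAK() | Encode::LEAVE_SRC());

# Writing those characters out as UTF-8 gives bytes that no longer
# match the charset the header still advertises.
my $as_utf8 = encode('UTF-8', $chars);

printf "after gunzip: %d bytes, identical to original: %s\n",
    length($unzipped), ($unzipped eq $euc_bytes ? 'yes' : 'no');
printf "after decode: %d characters\n", length($chars);
printf "as UTF-8:     %d bytes, identical to original: %s\n",
    length($as_utf8), ($as_utf8 eq $euc_bytes ? 'yes' : 'no');
```

Only the gunzip step is needed to undo the transfer compression; the charset decode is what makes the body disagree with its own header.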
I realize it may be a problem in HTTP::Message more so than POE, but I'd
rather be able to turn off this feature by, for example, being able to
NOT send the accept-encoding header.
Can something like that be done?
--d

Rocco Caputo 2007-04-16, 7:16 pm
On Apr 13, 2007, at 02:22, Daisuke Maki wrote:
> At the end, I have a request with content-type = 'text/html;
> charset=euc-jp', and yet the content is UTF-8.
>
> I realize it may be a problem in HTTP::Message more so than POE,
> but I'd
> rather be able to turn off this feature by, for example, being able to
> NOT send the accept-encoding header.
>
> Can something like that be done?
Currently it's entirely dependent on whether the server sends back a
content-encoding header. There's no guarantee that a web server will
refrain from sending a content-encoding response header if we avoid
sending an accept-encoding request header.
So the proper way to do this will be some kind of option, either
associated with the request or with the component. It'll take time
to make this optional in a compatible way, and I'm not available to
do this right now.
If someone would like to take a stab at a patch, I'll be happy to
apply it.
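For anyone attempting such a patch, one possible starting point is that decoded_content already accepts a charset override, and the value "none" skips the charset step shown in the code quoted above. This is only a sketch under assumptions: the response is built by hand here, the sample body is invented, and whether and how the component would expose such a switch is left open.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hand-built stand-in for a server reply: EUC-JP text, gzip-compressed.
my $euc_bytes = encode('euc-jp', "\x{65E5}\x{672C}\x{8A9E}");
gzip(\$euc_bytes => \my $gzipped) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/html; charset=euc-jp',
        'Content-Encoding' => 'gzip',
    ],
    $gzipped,
);

# Default behaviour: gunzip, then decode EUC-JP into Perl characters.
my $chars = $res->decoded_content;

# charset => 'none': gunzip only, leaving the bytes in the charset
# that the content-type header advertises.
my $bytes = $res->decoded_content(charset => 'none');

printf "default:         %d characters\n", length($chars);
printf "charset => none: %d bytes, matches original: %s\n",
    length($bytes), ($bytes eq $euc_bytes ? 'yes' : 'no');
```

With the override, the body and the content-type header stay consistent, and the caller can decode the text later if it wants characters.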
--
Rocco Caputo - rcaputo@pobox.com