Post response embeds weird stuff in html code
| Hello there,
I'm really stumped...
I'm fetching a web page with a script and parsing it.
There is a problem because the response inserts '8 1ff8' in random
places.
For example, I get things like
8< tr1ff8>
or
class=mytabl8 rowclass1ff8
Obviously my parsing doesn't work. I'm able to remove the 1ff8 with a regex
but not the first '8'. The following never matches:
preg_match("/.*8.*1ff8.*/",$page)
The response also prints this at the top of the page:
HTTP/1.1 200
Date: Thu, 24 Feb 2005 19:20:26 GMT
Server: Apache/1.3.23 (Unix) (Red-Hat/Linux) mod_jk/1.2.4
Set-Cookie: JSESSIONID=9BAB933BDC5C23784D65084CF9967645; Path=/portal
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html;charset=ISO-8859-1
1ff8
This is my POST:
$mainpage = getpost(80,"english.montrealplus.ca","/portal/exploreSearch.do","siteId=6&section=79&pageIndex=0&maxLinkPerPage=1000&maxPagePerSection=15&category=sportEventByType&subCategory=");
function getpost($portnb,$host,$path,$data)
{
    $fp = fsockopen($host, $portnb);
    if (!$fp)
    {
        return false;
    }
    else
    {
        $response = "";
        fputs($fp, "POST $path HTTP/1.1\r\n");
        fputs($fp, "Host: $host\r\n");
        fputs($fp, "Content-type: application/x-www-form-urlencoded\r\n");
        fputs($fp, "Content-length: ".strlen($data)."\r\n");
        fputs($fp, "Connection: close\r\n\r\n");
        fputs($fp, $data);
        while (!feof($fp))
            $response .= fgets($fp, 1024);
        fclose($fp);
        return $response;
    }
}
Another page I fetched had no such problem, and the response header
displayed at the top of the page had a different charset:
HTTP/1.1 200
Date: Thu, 24 Feb 2005 19:22:21 GMT
Server: Apache/1.3.23 (Unix) (Red-Hat/Linux) mod_jk/1.2.4
Set-Cookie: JSESSIONID=5C311056FED1528E46126B87D7425533; Path=/portal
Connection: close
Content-Type: text/html;charset=ISO-8859-1
So I tried adding that charset to my POST (";charset=ISO-8859-1"
after "Content-type: application/x-www-form-urlencoded"), but with no
success.
| |
| Andy Hassall 2005-02-24, 8:56 pm |
| On 24 Feb 2005 11:28:09 -0800, myahact@yahoo.ca (zorro) wrote:
>I'm fetching a web page with a script and parsing it.
>There is a problem because the response inserts '8 1ff8' in random
>places.
>
>For example, I get things like
>8< tr1ff8>
>or
>class=mytabl8 rowclass1ff8
>
>Obviously my parsing doesn't work. I'm able to remove 1ff8 with regex
>but not the first '8'. This following is never true:
>preg_match("/.*8.*1ff8.*/",$page)
>
>
>the response also prints this at the top of the page:
>HTTP/1.1 200
OK, clue #1 - this is an HTTP/1.1 response.
>Date: Thu, 24 Feb 2005 19:20:26 GMT
>Server: Apache/1.3.23 (Unix) (Red-Hat/Linux) mod_jk/1.2.4
>Set-Cookie: JSESSIONID=9BAB933BDC5C23784D65084CF9967645; Path=/portal
>Connection: close
>Transfer-Encoding: chunked
Clue #2 - the response uses chunked transfer encoding. HTTP/1.1 clients MUST
be able to accept chunked encoding.
RFC 2616 (HTTP/1.1), section 4.4 "Message Length", a few paragraphs down:
"
All HTTP/1.1 applications that receive entities MUST accept the
"chunked" transfer-coding (section 3.6), thus allowing this mechanism
to be used for messages when the message length cannot be determined
in advance.
"
>function getpost($portnb,$host,$path,$data)
>{
> $fp = fsockopen ($host,$portnb);
> if (!$fp)
> { return false;
> }
> else
> { $response="";
> fputs($fp, "POST $path HTTP/1.1\r\n");
You're claiming you're an HTTP/1.1 client... but you're not...
> while(!feof($fp))
> $response.=fgets($fp, 1024);
>
> fclose($fp);
> return $response;
... because you're not handling Chunked encoding.
> }
>}
>another page i fetched had no such problem and the response header
>displayed at the top of the page had a different charset:
>HTTP/1.1 200
>Date: Thu, 24 Feb 2005 19:22:21 GMT
>Server: Apache/1.3.23 (Unix) (Red-Hat/Linux) mod_jk/1.2.4
>Set-Cookie: JSESSIONID=5C311056FED1528E46126B87D7425533; Path=/portal
>Connection: close
>Content-Type: text/html;charset=ISO-8859-1
The charset is a red herring; it's the transfer encoding that's tripping you
up. I think your options are:
(a) Don't claim you're an HTTP/1.1 client - use HTTP/1.0.
(b) Be an HTTP/1.1 client - implement Chunked transfer-encoding decoding.
(c) Use an HTTP/1.1 client library - cURL is a good bet as PHP has native
support for it.
--
Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
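Options (a) and (c) above can be sketched roughly as follows. This is only an illustration: the function names build_post_request() and curl_post() are made up here, and the cURL variant assumes the cURL extension is compiled into PHP. With an HTTP/1.0 request line the server should not answer with a chunked body, since chunked transfer-coding is an HTTP/1.1 feature.

```php
<?php
// Option (a): hypothetical helper that builds the same POST as getpost(),
// but announces HTTP/1.0 so the server should not use chunked encoding.
function build_post_request($host, $path, $data)
{
    return "POST $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . "Content-Length: " . strlen($data) . "\r\n"
         . "Connection: close\r\n\r\n"
         . $data;
}

// Option (c): hypothetical helper that lets cURL speak HTTP/1.1,
// including the chunked decoding. Returns the body, or false on failure.
function curl_post($url, $data)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);            // send a POST request
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);     // urlencoded form data
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return body, don't print it
    return curl_exec($ch);
}
```

For (a), the string from build_post_request() would be written to the socket with a single fputs() in place of the individual header lines in getpost(). For (c), the call would look something like curl_post("http://english.montrealplus.ca/portal/exploreSearch.do", $data).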
| |
| John Dunlop 2005-02-24, 8:56 pm |
| Andy Hassall wrote:
> (b) Be an HTTP/1.1 client - implement Chunked transfer-encoding decoding.
Pseudo-code for which is given in RFC 2616, appendix 19.4.6.
--
Jock
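A minimal PHP sketch of option (b), loosely following that pseudo-code. The names split_response() and decode_chunked() are invented for this illustration, and the sketch assumes the whole response has already been read into a string, as getpost() does. Chunk extensions are dropped and trailer headers are ignored. The stray '1ff8' and '8' in the fetched page are the hexadecimal chunk-size lines (0x1ff8 = 8184 bytes) that this loop strips out.

```php
<?php
// Hypothetical helper: split a raw HTTP response into header block and body.
function split_response($raw)
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        return array($raw, '');
    }
    return array(substr($raw, 0, $pos), substr($raw, $pos + 4));
}

// Hypothetical helper: undo "Transfer-Encoding: chunked" on a body string.
function decode_chunked($body)
{
    $decoded = '';
    $offset  = 0;
    $length  = strlen($body);

    while ($offset < $length) {
        // Each chunk starts with a line giving its size in hexadecimal,
        // optionally followed by ";extension" data, which is ignored here.
        $eol = strpos($body, "\r\n", $offset);
        if ($eol === false) {
            break; // malformed or truncated input
        }
        $sizeLine = substr($body, $offset, $eol - $offset);
        $semi = strpos($sizeLine, ';');
        if ($semi !== false) {
            $sizeLine = substr($sizeLine, 0, $semi);
        }
        $size = hexdec(trim($sizeLine));
        if ($size == 0) {
            break; // last chunk reached; any trailers are ignored
        }
        // Copy the chunk data, then skip it and the CRLF that follows it.
        $decoded .= substr($body, $eol + 2, $size);
        $offset = $eol + 2 + $size + 2;
    }
    return $decoded;
}
```

In getpost() this could be used as: list($headers, $body) = split_response($response); and then, only if $headers contains "Transfer-Encoding: chunked", $body = decode_chunked($body);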
| |
| Chung Leong 2005-02-26, 3:55 am |
| "Andy Hassall" <andy@andyh.co.uk> wrote in message
news:pqes119itt7ibtvg7lpttpbu87ofq9ji1g@4ax.com...
> The charset is a red herring; it's the transfer encoding that's tripping you
> up. I think your options are:
>
> (a) Don't claim you're an HTTP/1.1 client - use HTTP/1.0.
> (b) Be an HTTP/1.1 client - implement Chunked transfer-encoding decoding.
> (c) Use an HTTP/1.1 client library - cURL is a good bet as PHP has native
> support for it.
I'm on a mission to convert people to using stream context, hence:
http://www.php.net/stream_context_create/.
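A rough sketch of what the stream context approach might look like for this POST. The helper names build_post_context() and context_post() are hypothetical; the option keys ('method', 'header', 'content') are those of PHP's http stream wrapper. It assumes PHP 5 or later and that allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: build a stream context describing a form POST.
function build_post_context($data)
{
    return stream_context_create(array(
        'http' => array(
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => $data,
        ),
    ));
}

// Hypothetical helper: send the POST through the http:// wrapper.
// Returns the response body only, or false on failure.
function context_post($url, $data)
{
    return file_get_contents($url, false, build_post_context($data));
}
```

The call would then be something like context_post("http://english.montrealplus.ca/portal/exploreSearch.do", $data), with no socket handling or header parsing in the script itself.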