
Post response embeds weird stuff in html code
zorro

2005-02-24, 3:56 pm

Hello there,
I'm really stumped...

I'm fetching a web page with a script and parsing it.
There is a problem because the response inserts '8 1ff8' in random
places.

For example, I get things like
8< tr1ff8>
or
class=mytabl8 rowclass1ff8

Obviously my parsing doesn't work. I'm able to remove 1ff8 with a regex
but not the first '8'. The following is never true:
preg_match("/.*8.*1ff8.*/",$page)


the response also prints this at the top of the page:
HTTP/1.1 200 Date: Thu, 24 Feb 2005 19:20:26 GMTServer: Apache/1.3.23
(Unix) (Red-Hat/Linux) mod_jk/1.2.4Set-Cookie:
JSESSIONID=9BAB933BDC5C23784D65084CF9967645; Path=/portalConnection:
closeTransfer-Encoding: chunkedContent-Type:
text/html;charset=ISO-8859-11ff8

This is my post:

$mainpage = getpost(80,"english.montrealplus.ca","/portal/exploreSearch.do","siteId=6&section=79&pageIndex=0&maxLinkPerPage=1000&maxPagePerSection=15&category=sportEventByType&subCategory=");

function getpost($portnb,$host,$path,$data)
{
    $fp = fsockopen($host,$portnb);
    if (!$fp)
    {
        return false;
    }
    else
    {
        $response="";
        // Send the request line and headers, then the urlencoded body
        fputs($fp, "POST $path HTTP/1.1\r\n");
        fputs($fp, "Host: $host\r\n");
        fputs($fp, "Content-type: application/x-www-form-urlencoded\r\n");
        fputs($fp, "Content-length: ".strlen($data)."\r\n");
        fputs($fp, "Connection: close\r\n\r\n");
        fputs($fp, $data);

        // Read the raw response (status line, headers and body) until EOF
        while(!feof($fp))
            $response.=fgets($fp, 1024);

        fclose($fp);
        return $response;
    }
}
Another page I fetched had no such problem, and the response header
displayed at the top of the page had a different charset:
HTTP/1.1 200 Date: Thu, 24 Feb 2005 19:22:21 GMT Server: Apache/1.3.23
(Unix) (Red-Hat/Linux) mod_jk/1.2.4 Set-Cookie:
JSESSIONID=5C311056FED1528E46126B87D7425533; Path=/portal Connection:
close Content-Type: text/html;charset=ISO-8859-1



So I tried adding that charset to my post (";charset=ISO-8859-1"
after "Content-type: application/x-www-form-urlencoded"), but without
success.
Andy Hassall

2005-02-24, 8:56 pm

On 24 Feb 2005 11:28:09 -0800, myahact@yahoo.ca (zorro) wrote:

>I'm fetching a web page with a script and parsing it.
>There is a problem because the response inserts '8 1ff8' in random
>places.
>
>For example, I get things like
>8< tr1ff8>
>or
>class=mytabl8 rowclass1ff8
>
>Obviously my parsing doesn't work. I'm able to remove 1ff8 with regex
>but not the first '8'. This following is never true:
>preg_match("/.*8.*1ff8.*/",$page)
>
>
>the response also prints this at the top of the page:
>HTTP/1.1 200


OK, clue #1 - this is an HTTP/1.1 response.

>Date: Thu, 24 Feb 2005 19:20:26 GMTServer: Apache/1.3.23
>(Unix) (Red-Hat/Linux) mod_jk/1.2.4Set-Cookie:
> JSESSIONID=9BAB933BDC5C23784D65084CF9967645; Path=/portalConnection:
>closeTransfer-Encoding: chunked


Clue #2 - the response uses chunked transfer encoding. HTTP/1.1 clients MUST
be able to accept chunked encoding.

RFC 2616 (HTTP/1.1), section 4.4 "Message Length", a few paragraphs down:
"
All HTTP/1.1 applications that receive entities MUST accept the
"chunked" transfer-coding (section 3.6), thus allowing this mechanism
to be used for messages when the message length cannot be determined
in advance.
"
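
For illustration: in the chunked transfer-coding, each chunk of the body is preceded by a line giving its size in hexadecimal, so the stray '8' and '1ff8' in the fetched page are most likely chunk-size lines rather than part of the HTML. A minimal sketch that converts such a size line to a byte count; chunk_size_bytes is a hypothetical helper name, not something from the original post, and the scalar type declarations assume PHP 7 or later:

```php
<?php
// Hypothetical helper: turn a chunk-size line such as "1ff8\r\n" into a byte count.
// Chunk extensions (";name=value" after the size) are not handled in this sketch.
function chunk_size_bytes(string $sizeLine): int
{
    return (int) hexdec(trim($sizeLine));
}

foreach (['8', '1ff8'] as $token) {
    echo $token, ' => ', chunk_size_bytes($token), " bytes\n";
}
// prints:
// 8 => 8 bytes
// 1ff8 => 8184 bytes
```

Read this way, the response would start with an 8-byte chunk followed by an 8184-byte chunk, which matches where the two values show up in the page.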

>function getpost($portnb,$host,$path,$data)
>{
> $fp = fsockopen ($host,$portnb);
> if (!$fp)
> { return false;
> }
> else
> { $response="";
> fputs($fp, "POST $path HTTP/1.1\r\n");


You're claiming you're an HTTP/1.1 client... but you're not...

> while(!feof($fp))
> $response.=fgets($fp, 1024);
>
> fclose($fp);
> return $response;


... because you're not handling Chunked encoding.

> }
>}
>another page i fetched had no such problem and the response header
>displayed at the top of the page had a different charset:
>HTTP/1.1 200 Date: Thu, 24 Feb 2005 19:22:21 GMT Server: Apache/1.3.23
>(Unix) (Red-Hat/Linux) mod_jk/1.2.4 Set-Cookie:
> JSESSIONID=5C311056FED1528E46126B87D7425533; Path=/portal Connection:
>close Content-Type: text/html;charset=ISO-8859-1


The charset is a red herring; it's the transfer encoding that's tripping you
up. I think your options are:

(a) Don't claim you're an HTTP/1.1 client - use HTTP/1.0.
(b) Be an HTTP/1.1 client - implement Chunked transfer-encoding decoding.
(c) Use an HTTP/1.1 client library - cURL is a good bet as PHP has native
support for it.
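
Options (a) and (c) might look roughly like the sketch below. The function names build_http10_post and curl_post are made up for illustration, the type declarations assume PHP 7 or later, and curl_post assumes the cURL extension is loaded. With (a), the only change to the original request is the protocol version on the request line; with (c), libcurl takes care of the transfer-coding itself.

```php
<?php
// Option (a): build the same POST request, but announce HTTP/1.0 so the
// server should not answer with a chunked body. Hypothetical helper name.
function build_http10_post(string $host, string $path, string $data): string
{
    return "POST $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . "Content-Length: " . strlen($data) . "\r\n"
         . "Connection: close\r\n\r\n"
         . $data;
}

// Option (c): let cURL speak HTTP. Returns the response body as a string,
// or false on failure. Hypothetical helper name.
function curl_post(string $url, string $data)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);            // send a POST request
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);     // urlencoded form data
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    return curl_exec($ch);
}

// Usage, with the host and path from the original post:
// $page = curl_post('http://english.montrealplus.ca/portal/exploreSearch.do', $data);
```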

--
Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
John Dunlop

2005-02-24, 8:56 pm

Andy Hassall wrote:

> (b) Be an HTTP/1.1 client - implement Chunked transfer-encoding decoding.


Pseudo-code for which is given in RFC 2616, appendix 19.4.6.
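
A rough PHP rendering of that idea, as a sketch rather than a complete implementation: it works on a body that has already been read into a string, ignores chunk extensions and stops at the terminating zero-size chunk without reading trailers. decode_chunked is a hypothetical name, the sample response is made up, and the type declarations assume PHP 7 or later.

```php
<?php
// Decode a chunked message body held in a string (hypothetical helper).
function decode_chunked(string $body): string
{
    $decoded = '';
    $pos = 0;
    $len = strlen($body);

    while ($pos < $len) {
        // Each chunk starts with a size line terminated by CRLF.
        $eol = strpos($body, "\r\n", $pos);
        if ($eol === false) {
            break; // malformed input: no size line found
        }
        $sizeLine = substr($body, $pos, $eol - $pos);

        // Drop an optional chunk extension (";name=value").
        $semi = strpos($sizeLine, ';');
        if ($semi !== false) {
            $sizeLine = substr($sizeLine, 0, $semi);
        }
        $hex = trim($sizeLine);
        if (!preg_match('/^[0-9a-fA-F]+$/', $hex)) {
            break; // not a hexadecimal size
        }
        $size = (int) hexdec($hex);
        if ($size === 0) {
            break; // last chunk; trailers, if any, are ignored here
        }

        // Copy the chunk data, then skip the CRLF that follows it.
        $decoded .= substr($body, $eol + 2, $size);
        $pos = $eol + 2 + $size + 2;
    }
    return $decoded;
}

// Usage on a raw response such as the one getpost() returns (made-up sample):
$response = "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n"
          . "8\r\n<table c\r\n5\r\nlass=\r\n0\r\n\r\n";
list($headers, $body) = explode("\r\n\r\n", $response, 2);
if (stripos($headers, 'Transfer-Encoding: chunked') !== false) {
    $body = decode_chunked($body);
}
echo $body, "\n"; // prints: <table class=
```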

--
Jock
Chung Leong

2005-02-26, 3:55 am

"Andy Hassall" <andy@andyh.co.uk> wrote in message
news:pqes119itt7ibtvg7lpttpbu87ofq9ji1g@4ax.com...
> The charset is a red herring; it's the transfer encoding that's tripping you
> up. I think your options are:
>
> (a) Don't claim you're an HTTP/1.1 client - use HTTP/1.0.
> (b) Be an HTTP/1.1 client - implement Chunked transfer-encoding decoding.
> (c) Use an HTTP/1.1 client library - cURL is a good bet as PHP has native
> support for it.


I'm on a mission to convert people to using stream context, hence:

http://www.php.net/stream_context_create/.
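
As a sketch of what that could look like for the POST in this thread: the helper name http_post_context is made up, the type declaration assumes PHP 7 or later, and whether the http:// wrapper decodes chunked responses depends on the PHP version in use, so treat this as a starting point rather than a guaranteed fix.

```php
<?php
// Build a stream context that turns file_get_contents() into an HTTP POST.
// Hypothetical helper name; the 'http' option keys are the documented ones.
function http_post_context(string $data)
{
    return stream_context_create([
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => $data,
        ],
    ]);
}

// Usage, with the host and path from the original post:
// $page = file_get_contents(
//     'http://english.montrealplus.ca/portal/exploreSearch.do',
//     false,
//     http_post_context($data)
// );
```

The wrapper returns only the response body; the response headers are made available separately, so there is no status line to strip before parsing.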


