
Author html source
yoko

2007-02-14, 9:58 pm

Is there any way to capture the HTML source code of a page and grab only the
content inside the body tags without using fsockopen?

For example, let's say the URL is $url="http://ca3.php.net/manual/en/faq.obtaining.php";

Thanks to everyone that helps.


Dennis Kehrig

2007-02-15, 8:00 am

yoko wrote:
> Is there any way to capture the HTML source code of a page and grab only
> the content inside the body tags without using fsockopen?
> For example, let's say the URL is
> $url="http://ca3.php.net/manual/en/faq.obtaining.php";
>
> Thanks to everyone that helps.


Try this (it needs allow_url_fopen to be enabled, which is probably a bad idea):

// Get the HTML file
$html = file_get_contents($url);
// Reduce it to the contents of the <body> tag
$body = preg_replace("#^.*<body[^>]*>(.*)</body>.*$#si", "\\1", $html);
// Strip off whitespace at the beginning and the end
$body = trim($body);

Best regards,

Dennis Kehrig
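
One caveat with the snippet above: preg_replace returns its input unchanged when the pattern does not match, so a page without a <body>...</body> pair would come back whole. The following is a minimal sketch of the same idea using preg_match, so that a missing body can be detected. The function name extract_body and the sample markup are illustrative assumptions, not part of the original post, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the thread): returns the trimmed contents of
// the <body> element, or null when no <body>...</body> pair is found.
function extract_body(string $html): ?string
{
    // \b stops tags such as <bodyfoo> from matching; the "s" flag lets "."
    // span newlines and "i" makes the tag names case-insensitive.
    if (preg_match('#<body\b[^>]*>(.*)</body>#si', $html, $matches) !== 1) {
        return null;
    }
    return trim($matches[1]);
}

// Inline sample markup, used here instead of a real download.
$sample = "<html><head><title>t</title></head>\n"
        . "<body class=\"x\">\n <p>Hello</p>\n</body></html>";

var_dump(extract_body($sample));           // string(12) "<p>Hello</p>"
var_dump(extract_body('<p>no body</p>'));  // NULL
```

With a real page, the string returned by file_get_contents($url) would be passed in place of $sample. A regular expression is still only an approximation of HTML parsing, so this remains a sketch rather than a robust extractor.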


yoko

2007-02-16, 3:59 am


That worked with no problems. What about cURL? Is that a good method as well?

Hello Dennis,

> // Get the HTML file
> $html = file_get_contents($url);
> // Reduce it to the contents of the <body> tag
> $body = preg_replace("#^.*<body[^>]*>(.*)</body>.*$#si", "\\1", $html);
> // Strip off whitespace at the beginning and the end
> $body = trim($body);



Rik

2007-02-16, 3:59 am

On Fri, 16 Feb 2007 05:05:41 +0100, yoko <nana@na.ca> wrote:

>
> That worked no problems. What about cURL is that a good method as
> well?


'The body' of the response for cURL is the entire HTML document, just
without the HTTP headers (so _not_ without the HTML head). There is no extra
functionality there to get only the body.

Using cURL is useful when:
- You may be redirected; cURL will follow the redirect if you tell it to.
- You want to send cookies or POST values to get the content.
--

Rik Wasmus
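
The two cases listed above might look roughly like this with PHP's cURL extension. This is a sketch under assumptions: the helper names build_curl_options and fetch_html, the redirect cap, and the timeout are illustrative choices that do not come from the thread, and the cURL extension must be installed. The returned string is the whole HTML document, so the <body> contents still have to be cut out separately.

```php
<?php
// Hypothetical helper: builds the option array for a single request.
function build_curl_options(string $url, array $postFields = [], string $cookies = ''): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_MAXREDIRS      => 5,    // assumed cap on redirects
        CURLOPT_TIMEOUT        => 10,   // assumed timeout in seconds
    ];
    if ($cookies !== '') {
        // Cookie string in the form "name=value; other=value"
        $options[CURLOPT_COOKIE] = $cookies;
    }
    if ($postFields !== []) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields);
    }
    return $options;
}

// Hypothetical helper: returns the full HTML document (without HTTP
// headers), or null if the request failed.
function fetch_html(string $url, array $postFields = [], string $cookies = ''): ?string
{
    $handle = curl_init();
    curl_setopt_array($handle, build_curl_options($url, $postFields, $cookies));
    $html = curl_exec($handle);
    return $html === false ? null : $html;
}
```

A call such as fetch_html($url) would then take the place of file_get_contents($url) from earlier in the thread, with the optional arguments supplying POST values or cookies.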


Lorenzo Bettini

2007-02-23, 3:59 am

yoko wrote:
> Is there any way to capture the HTML source code of a page and grab only
> the content inside the body tags without using fsockopen?
> For example, let's say the URL is
> $url="http://ca3.php.net/manual/en/faq.obtaining.php";
>


here's my version:
http://tronprog.blogspot.com/2007/0...ody-in-php.html

hope this helps

--
Lorenzo Bettini, PhD in Computer Science, DSI, Univ. di Firenze
ICQ# lbetto, 16080134 (GNU/Linux User # 158233)
HOME: http://www.lorenzobettini.it MUSIC: http://www.purplesucker.com
BLOGS: http://tronprog.blogspot.com http://longlivemusic.blogspot.com
http://www.gnu.org/software/src-highlite
http://www.gnu.org/software/gengetopt
http://www.gnu.org/software/gengen http://doublecpp.sourceforge.net

Copyright 2008 codecomments.com