Home > Archive > PHP Language > March 2006 > checking URL
| Hi all!
I have an interesting problem - I'd like to hear your comments on this.
Users can enter URLs for links via a web application. Since users make
mistakes, I was wondering if there is an automated way to check that the URL
that was entered is valid. Basically, I would need some sort of "browser"
that would try to fetch the page and alert the user if it wasn't possible.
Any ideas would be appreciated.
Regards,
Anze
| |
|
| Anze wrote:
[snip]
> The user can enter URLs for links via web application. Since users make
> mistakes I was wondering if there is an automated way to see if the URL
> that was entered is valid? Basically, I would need some sort of "browser"
> that would try to fetch the page and alert the user if it wasn't possible.
[snip]
Maybe get_headers() will work for you. Only works in php 5, though.
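For anyone who wants to try it, here is a minimal sketch of such a check. The helper names header_status_code() and url_responds() are made up for illustration, and counting any 2xx or 3xx answer as "valid" is an assumption you may want to tighten.

```php
<?php
// Hypothetical helper: pull the numeric status code out of a status line
// such as "HTTP/1.1 404 Not Found". Returns 0 if the line does not match.
function header_status_code($line)
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Hypothetical helper: true if the URL answers with a 2xx or 3xx status.
function url_responds($url)
{
    // get_headers() returns false when the request fails (bad host, timeout, ...).
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return false;
    }
    // Element 0 is the status line of the first response.
    $code = header_status_code($headers[0]);
    return $code >= 200 && $code < 400;
}
```

Usage would be something like `if (!url_responds($userUrl)) { /* warn the user */ }`, where $userUrl is whatever the form submitted.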
Zilla
| |
|
|
|
| >> Maybe get_headers() will work for you. Only works in php 5, though.
Nice!
Thank you, didn't know that this function existed. Looks like it's time I
went through PHP documentation again. :)
Best,
Anze
| |
|
| >>> Maybe get_headers() will work for you. Only works in php 5, though.
> Nice!
Oooops, PHP5 only. :(
Most of my pages are hosted on PHP4, so this is not an option. The scripts that
people posted have some limitations though...
Thanks anyway.
Does anyone have a PHP4-compatible solution?
Regards,
Anze
| |
| BearItAll 2006-03-06, 7:55 am |
| Anze wrote:
> Oooops, PHP5 only. :(
> Most of my pages are hosted on PHP4, so this is not an option. The scripts
> that people posted have some limitations though...
> Does anyone have a PHP4-compatible solution?
You don't really want to load the page the way a browser would, only to know
whether the page exists. Oh, file_exists only accepts URLs in PHP 5 (with all
the things we're finding that need PHP 5, it's a wonder we ever managed to do
anything at all with PHP 4).
But you can do a plain old fopen:
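# Needs allow_url_fopen enabled in php.ini; @ hides the warning on failure.
$file = @fopen("http://www.mysite.com", "r");
if (!$file)
{
    # no such file (or the server could not be reached)
}
else
{
    # file exists
    fclose($file);
}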
A post for file_exists() on php.net suggests that files smaller than 2k show
as not existing. fopen wouldn't have that problem.
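If you want the fopen check as a reusable function, a rough sketch could look like this. The name can_open() and the 5 second default timeout are assumptions of mine, not anything from the manual.

```php
<?php
// Hypothetical wrapper around the fopen() check.
// Returns true if $target could be opened for reading, false otherwise.
function can_open($target, $timeout = 5)
{
    // Remote URLs only work when allow_url_fopen is enabled in php.ini.
    if (preg_match('#^https?://#i', $target) && !ini_get('allow_url_fopen')) {
        return false;
    }
    // Assumption: a short timeout is wanted so a dead host does not hang the page.
    $old = ini_set('default_socket_timeout', $timeout);
    $handle = @fopen($target, 'r');   // @ hides the warning on failure
    if ($old !== false) {
        ini_set('default_socket_timeout', $old);   // restore the previous setting
    }
    if (!$handle) {
        return false;
    }
    fclose($handle);
    return true;
}
```

It works for local paths as well as URLs, which makes it easy to try out without touching the network.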
But even with fopen you may not have joy the world over, because of access
problems (some hosts switch off URL access for fopen, for instance). So you
need to socket to them (sorry, can't say I accidentally punned there).
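$splitUrl = parse_url($url);
$port = isset($splitUrl['port']) ? $splitUrl['port'] : 80; # default to the standard non-secure port
$path = isset($splitUrl['path']) ? $splitUrl['path'] : '/'; # a bare host has no path
$mySocks = fsockopen($splitUrl['host'], $port, $errno, $errstr, 10); # 10 second timeout
# $mySocks is false if the connection failed; check it before writing.
# HTTP wants CRLF line endings and a blank line to finish the headers.
$command = 'GET ' . $path . " HTTP/1.0\r\n" . 'Host: ' . $splitUrl['host'] . "\r\n\r\n";
fwrite($mySocks, $command);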
Now all you really need is one small fread, then you can close the socket. But
you will find there was a reason for using a minimum read of 1k (I mean, more
than a few bytes): it was to cover an error page from the host.
You can carry on and check that the page returned isn't a 404 or one of the
other common error pages. That seems wise, because I just looked at the
average file sizes in my current project and a fair few would be reported as
nonexistent if the 1k limit were used (those that have seen my web pages
would argue that is a good thing).
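Rather than guessing from the size of what comes back, you can read just the first line of the response and look at the status code. A rough sketch, assuming plain http:// URLs; build_get_request() and socket_status() are hypothetical names:

```php
<?php
// Hypothetical helper: build the request for a plain http:// URL.
// Returns array(host, port, request) or false if the URL has no host.
function build_get_request($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return false;
    }
    $port = isset($parts['port']) ? $parts['port'] : 80;
    $path = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    // The Host header carries the port only when it is not the default.
    $hostHeader = $parts['host'] . ($port != 80 ? ':' . $port : '');
    $request = "GET $path HTTP/1.0\r\nHost: $hostHeader\r\n\r\n";
    return array($parts['host'], $port, $request);
}

// Hypothetical helper: return the HTTP status code, or 0 if nothing usable came back.
function socket_status($url, $timeout = 10)
{
    $req = build_get_request($url);
    if ($req === false) {
        return 0;
    }
    $sock = @fsockopen($req[0], $req[1], $errno, $errstr, $timeout);
    if (!$sock) {
        return 0;
    }
    fwrite($sock, $req[2]);
    $statusLine = fgets($sock, 1024);   // first line only, e.g. "HTTP/1.1 404 Not Found"
    fclose($sock);
    if ($statusLine !== false && preg_match('#^HTTP/\d\.\d\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return 0;
}
```

With this, a 404 shows up as 404 no matter how large or small the error page is, so the 1k question goes away.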
| |
| Christian Hansel 2006-03-06, 7:55 am |
| BearItAll wrote:
[snip]
> Now all you really need is one small fread, then you can close the socket.
> But you will find there was a reason for using a minimum read of 1k (I mean,
> more than a few bytes): it was to cover an error page from the host.
> You can carry on and check that the page returned isn't a 404 or one of the
> other common error pages. That seems wise, because I just looked at the
> average file sizes in my current project and a fair few would be reported as
> nonexistent if the 1k limit were used (those that have seen my web pages
> would argue that is a good thing).
Not sure there, but it seems to me (w/o testing, though) that a correctly
configured server will send back a valid HTML-formatted error page that is
larger than 1k.
Thus, I suspect, fopen will fail.
| |
|
|
Thank you both for your help!
From what I gathered:
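- file_exists accepts URLs only in PHP5 (and, as far as I can tell from the
  manual, the http:// wrapper does not support stat(), so it is not a reliable
  test for web pages anyway)
- there are problems with fopen and fsockopen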
There is another way - cURL. I used it in another project and it seemed
to be very reliable. But the principle is the same - trying to fetch the
page and complaining if the server or the page is unavailable.
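For reference, a minimal sketch of the cURL route, assuming the curl extension is installed. The helper names curl_status() and describe_status() and the 10 second timeout are my own choices for illustration.

```php
<?php
// Hypothetical helper: turn a status code into a short verdict.
// 0 means no HTTP response at all (DNS failure, refused connection, timeout).
function describe_status($code)
{
    if ($code == 0) {
        return 'unreachable';
    }
    if ($code >= 200 && $code < 400) {
        return 'ok';
    }
    if ($code == 404 || $code == 410) {
        return 'missing';
    }
    return 'error';
}

// Hypothetical helper: ask for the headers only and return the final status code.
function curl_status($url, $timeout = 10)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD-style request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);   // has no effect on PHP 8 and later
    return $code;
}
```

Something like `describe_status(curl_status($userUrl))` then gives a verdict to show the user. Some servers refuse header-only requests, so a fallback to a normal GET may be needed in practice.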
Thanks again, I think I can manage from here. :)
Anze
| |
|
|
By the way, if anyone reading this is planning to copy the approach - be
careful. I only use this on authenticated pages with known (and trusted)
users.
The potential security hole is that a user can make your server send requests
to another server - which could be used to help attack or overload the target
server, among other things.
It's a dangerous world out there. ;)
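One way to narrow the hole is to vet the URL before fetching it. This is only a sketch under my own assumptions: that just public http/https targets are acceptable, and that the filter extension is available (PHP 5.2 or later, so not for PHP4 hosts). is_safe_target() is a hypothetical name, and this is not a complete defence (the name could resolve differently by the time the real request is made).

```php
<?php
// Hypothetical helper: decide whether the server should be allowed to fetch $url.
// Assumption: only public http/https targets are acceptable.
function is_safe_target($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme']) || !isset($parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), array('http', 'https'))) {
        return false;   // rejects file://, ftp:// and friends
    }
    $host = $parts['host'];
    // If the host is not already an IP literal, resolve it.
    // gethostbyname() returns the name unchanged when the lookup fails,
    // and that then fails the IP check below.
    $ip = filter_var($host, FILTER_VALIDATE_IP) ? $host : gethostbyname($host);
    // Refuse private (10.x, 192.168.x, ...) and reserved (127.x, ...) ranges.
    $flags = FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
    return filter_var($ip, FILTER_VALIDATE_IP, $flags) !== false;
}
```

Bracketed IPv6 hosts such as http://[::1]/ are rejected by this sketch, which errs on the side of refusing.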
Anze