

Home > Archive > PHP Language > July 2004 > Open File from URL and get its address due to redirect










 

Open File from URL and get its address due to redirect
Alan Taylor

2004-07-24, 3:56 pm

I'm trying to make something like Google. It opens a page, finds all the
links and then follows those links. The problem is that some pages
redirect you, so when I try to complete a relative link I'm using the
wrong domain name.

e.g.

I open the web page http://www.fake.com/dir1/dir2/file1.htm
This web page redirects to another site, whose URL I don't know, which
contains the links:

/dir5/file.htm
dir6/file.htm
../file.htm

I now don't know how to complete these partial URLs.

How do I find out the URL of the file I have just opened so I can complete
these partial links?

Thanks.
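A minimal sketch of one way to recover the final address, assuming the page is fetched with PHP's file_get_contents() over HTTP (allow_url_fopen enabled). PHP's HTTP stream wrapper follows redirects by default and fills the local variable $http_response_header with the header lines of every response in the chain, so the last Location: header is the address that was actually served. The function name final_url() is hypothetical, and the sketch assumes Location values are absolute URLs, as HTTP/1.1 (RFC 2616) required; a relative Location would still need resolving against the URL that sent it.

```php
<?php
// Hypothetical helper: given the header lines PHP's HTTP wrapper collected
// (the $http_response_header array), return the URL that was finally served.
// Assumes every Location header holds an absolute URL.
function final_url(string $requestUrl, array $headers): string
{
    $url = $requestUrl; // no redirect seen yet: the page is where we asked
    foreach ($headers as $line) {
        // Header names are case-insensitive; each redirect adds a Location line,
        // so the last one seen is the final address.
        if (stripos($line, 'location:') === 0) {
            $url = trim(substr($line, strlen('location:')));
        }
    }
    return $url;
}

// Usage (needs network access):
// $start = 'http://www.fake.com/dir1/dir2/file1.htm';
// $html  = file_get_contents($start);
// $base  = final_url($start, $http_response_header);
```

If cURL is used instead, setting CURLOPT_FOLLOWLOCATION and then reading curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) reports the same final address.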


eclipsboi

2004-07-24, 8:55 pm



There are two types of URLs: a) absolute, meaning that no matter where
in your directory structure you use the link, it points to the same
file (e.g. http://www.example.com/somelink, or /somelink, which always
points to the same file on the current server); and b) relative,
meaning that the link is resolved against the directory of the page it
appears on, so the same link used somewhere else in the directory
structure points to a different file, or to nothing at all (e.g.
somelink or ../somelink). (Just had to get that out of the way, keep
reading.)
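The two categories can be told apart mechanically with substr(), as suggested later in this reply. The sketch below is only an illustration: the function name link_type() and its return labels are hypothetical, and it adds two cases the definitions above gloss over, namely links carrying a scheme of their own and "scheme-relative" links starting with // (same protocol, possibly another server).

```php
<?php
// Hypothetical helper: sort a link into the categories described above.
//   'absolute-url'    : has its own scheme, e.g. http://www.example.com/somelink
//   'scheme-relative' : starts with "//", e.g. //cdn.example.com/x.js
//   'absolute-path'   : starts with "/", e.g. /somelink (same server)
//   'relative'        : anything else, e.g. somelink or ../somelink
function link_type(string $link): string
{
    // A scheme is a letter, then letters, digits, '+', '-' or '.', then ':'.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) {
        return 'absolute-url';
    }
    if (substr($link, 0, 2) === '//') {
        return 'scheme-relative';
    }
    if (substr($link, 0, 1) === '/') {
        return 'absolute-path';
    }
    return 'relative';
}
```

With the links from the original question, /dir5/file.htm is an absolute path, while dir6/file.htm and ../file.htm are relative and depend on the directory of the page they were found on.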

Now, when a URL doesn't have the http://www.domain.tld attached, the
web browser knows automatically that the URL is on whatever server the
previous page is on (e.g. www.fake.com). After a redirect, that is the
server you were redirected to, not the one you originally requested.
What you are left with is coding your script to make this
determination and, if the link is like the three examples you provided
above, to prepend the domain (and, for relative links, the directory of
the page) to the link. One way to do this is to split your URL into
four parts: $protocol (the http://, https://, ftp://, etc.), $domain
(www.fake.com or whatever), $path (either / or the directory path to
the file) and $file (in your case file.htm). This will make it easier
for you to deal with the differing directory structures of different
servers.
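A minimal sketch of that four-way split, assuming PHP's built-in parse_url() does the actual parsing. The function split_url() and its array keys are hypothetical names chosen to match the variables above; the protocol is returned without the "://", and port, query string and fragment are simply dropped.

```php
<?php
// Hypothetical helper: split a URL into protocol, domain, path and file.
function split_url(string $url): array
{
    $parts = parse_url($url);
    $fullPath = isset($parts['path']) ? $parts['path'] : '/';

    // Everything up to and including the last "/" is the directory path;
    // whatever follows it is the file name (empty for a directory URL).
    $slash = strrpos($fullPath, '/');

    return [
        'protocol' => isset($parts['scheme']) ? $parts['scheme'] : '',
        'domain'   => isset($parts['host']) ? $parts['host'] : '',
        'path'     => substr($fullPath, 0, $slash + 1),
        'file'     => substr($fullPath, $slash + 1),
    ];
}
```

For http://www.fake.com/dir1/dir2/file1.htm this yields protocol http, domain www.fake.com, path /dir1/dir2/ and file file1.htm.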

You will also have to keep the definitions of relative and absolute
links in mind, to make sure you build a proper URL for file retrieval.
substr is your friend. Use him wisely, and he can work wonders for you
(e.g. if (substr($file, -1) == "/") // it's a directory).
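Putting those pieces together, here is one possible sketch of a resolver for the three example links. It is a simplified take on the reference-resolution rules of RFC 3986, not a full implementation; the function name resolve_url() is hypothetical, and $base must be the absolute address of the page the links were found on after any redirect. Known gaps: a user/password in the base is dropped, and a link that is only "#fragment" loses the base page's own query string.

```php
<?php
// Hypothetical helper: turn a link found on a page into a full URL.
// $base = absolute URL of the page the link was found on (post-redirect).
function resolve_url(string $base, string $link): string
{
    // Already a full URL with its own scheme (http:, https:, ftp:, ...).
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) {
        return $link;
    }

    $b = parse_url($base);
    $scheme = $b['scheme'];
    $host = $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = isset($b['path']) ? $b['path'] : '/';

    // "//host/path": same protocol, possibly a different server.
    if (substr($link, 0, 2) === '//') {
        return $scheme . ':' . $link;
    }

    // Set any "?query" or "#fragment" aside so ".." inside it is untouched.
    $cut = strcspn($link, '?#');
    $suffix = substr($link, $cut);
    $link = substr($link, 0, $cut);

    if ($link === '') {
        $path = $basePath;                  // "?x=1" or "#top": same page
    } elseif (substr($link, 0, 1) === '/') {
        $path = $link;                      // absolute path on this server
    } else {
        // Relative: start from the directory part of the page's own path.
        $dir = substr($basePath, 0, strrpos($basePath, '/') + 1);
        $path = $dir . $link;
    }

    // Collapse "." and ".." segments, e.g. /dir1/dir2/../file.htm -> /dir1/file.htm
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $segment) {
        if ($segment === '..') {
            if (count($out) > 1) {
                array_pop($out);            // never climb above the root
            }
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    // A path ending in "." or ".." names a directory, so keep a trailing "/".
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = '';
    }

    return $scheme . '://' . $host . implode('/', $out) . $suffix;
}
```

With the base http://www.fake.com/dir1/dir2/file1.htm from the original question, the three links resolve to http://www.fake.com/dir5/file.htm, http://www.fake.com/dir1/dir2/dir6/file.htm and http://www.fake.com/dir1/file.htm.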
Markus G. Klötzer

2004-07-25, 3:55 pm

eclipsboi <eclipsboi@hotmail.com> wrote:

> Now, when a URL doesn't have the http://www.domain.tld attached, the
> web browser knows automatically that the URL is on whatever server the
> previous page is on (e.g. www.fake.com).


Don't forget the possibility that the head of the document might
define an alternative base for relative URIs (a <base href="...">
element).
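A sketch of that check using PHP's DOM extension (DOMDocument::loadHTML() and getElementsByTagName()); the function name base_for_page() is hypothetical. It returns the href of the first <base> element that has one and otherwise falls back to the page's own (post-redirect) URL. A relative href inside <base> would itself need resolving against the page URL, which this sketch does not do.

```php
<?php
// Hypothetical helper: pick the base URL for resolving relative links.
function base_for_page(string $html, string $pageUrl): string
{
    if (trim($html) === '') {
        return $pageUrl;                 // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first <base> element that carries an href attribute wins.
    foreach ($doc->getElementsByTagName('base') as $base) {
        $href = trim($base->getAttribute('href'));
        if ($href !== '') {
            return $href;
        }
    }
    return $pageUrl;                     // no <base>: use the page's own URL
}
```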

hth

mgk
--
"Advertisements contain the only truths to be relied
on in the newspaper." - Thomas Jefferson








Copyright 2008 codecomments.com