How to stop search engines listing 2 URLs
| Hello
I'm really worried, so I wondered if anyone could help.
My site recently outgrew its URL structure, so we've had to change it.
I have some important URLs like www.mysite.com/bluewidgets/. Now, with the
expansion of the site and the URL structure change (which had to be done),
we also have URLs like www.mysite.com/country1/bluewidgets/, which serve up
content identical to the first URL.
Is this bad? There is no way around it, because if I dump my old URLs (I have
kept 50 important ones) I will have to get around 6,000 webmasters to change
my link URL on their pages, which I don't want to have to do.
My programmer says it won't be a problem with Google etc., but I'm worried. I
rely on this site for my income.
Is it possible to stop Google from crawling and, most importantly, listing
the 50 new URLs in the new format, so that it sticks with the old ones?
Everything is done with PHP/mod_rewrite rules, so it's not simple for me to
know.
Or is it possible to have 50 redirects from the new URLs to the old ones?
Will that stop Google listing both?
How do I get round this? The problem is that, because the site is very much
database driven, I have no way of making it use the old-format URLs for the
50 in question. I hope this all makes sense. Thanks for any help,
Chris
| Centurion 2004-10-19, 8:55 pm |
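Put a robots.txt file in your web server's root dir and tell all user agents
to NOT crawl /country1/, /country2/, etc. The original robots.txt standard
has no wildcards in paths (some crawlers accept them as an extension), so
DON'T rely on something like "Disallow: /country*". List each prefix on its
own Disallow line instead; a rule matches every URL whose path starts with
it.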
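As a rough sketch of what that robots.txt could look like, and how to check it before uploading, here is a small Python example using the standard library's urllib.robotparser. The two Disallow prefixes and the "Googlebot" agent name are assumptions taken from the URLs in the question, not a tested configuration for the real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one Disallow line per new-format prefix.
ROBOTS_TXT = """\
User-agent: *
Disallow: /country1/
Disallow: /country2/
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The old-format URL should stay crawlable ...
old_url_allowed = parser.can_fetch(
    "Googlebot", "http://www.mysite.com/bluewidgets/"
)
# ... while the duplicate under /country1/ should be blocked.
new_url_allowed = parser.can_fetch(
    "Googlebot", "http://www.mysite.com/country1/bluewidgets/"
)

print("old URL crawlable:", old_url_allowed)
print("new URL crawlable:", new_url_allowed)
```

Note that these rules block everything under /country1/ and /country2/, not only the 50 duplicate pages. If the rest of those sections should stay crawlable, the Disallow lines would need to name the duplicate paths themselves (for example /country1/bluewidgets/).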
I currently tell robots/spiders to leave a bunch of virtual directories
alone and it works well. By "virtual" I mean they don't really exist in
the file system; they are URLs that get rewritten by Apache rewrite rules.
E.g., http://www.mysite.eg/gallery/foo doesn't exist, but the rewrite
rules get the correct files from the right place in the file system
(<webroot>/content/users/foo/gallery). I've got "Disallow: /gallery/foo"
in my robots.txt file and Google/MSN/Yahoo etc. all honour that.
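The reason this works for rewritten URLs is that robots.txt rules are compared against the URL path as a plain prefix, with no reference to the file system. A minimal Python sketch of that behaviour, again with urllib.robotparser, follows; the extra paths (holiday.jpg, foobar, bar) are made-up examples, not real pages.

```python
from urllib.robotparser import RobotFileParser

# The single rule quoted above, applied to every user agent.
GALLERY_ROBOTS_TXT = "User-agent: *\nDisallow: /gallery/foo\n"

gallery_parser = RobotFileParser()
gallery_parser.parse(GALLERY_ROBOTS_TXT.splitlines())


def crawlable(path):
    """Return True if a generic crawler may fetch the given path."""
    return gallery_parser.can_fetch("*", "http://www.mysite.eg" + path)


# Hypothetical paths, chosen to show prefix matching on the URL.
checks = {
    "/gallery/foo": crawlable("/gallery/foo"),
    "/gallery/foo/holiday.jpg": crawlable("/gallery/foo/holiday.jpg"),
    "/gallery/foobar": crawlable("/gallery/foobar"),
    "/gallery/bar": crawlable("/gallery/bar"),
}

for path, allowed in checks.items():
    print(path, "->", "allowed" if allowed else "blocked")
```

One thing to watch: because the match is a prefix, "Disallow: /gallery/foo" also blocks /gallery/foobar. Ending the rule with a slash (/gallery/foo/) limits it to that one directory, although the bare /gallery/foo URL is then no longer covered.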
There's heaps of info online, plus tools to verify robots.txt files -
just google it ;)
Cheers,
James
--
"In short, _N is Richardian if, and only if, _N is not Richardian."