Re: Subject: Searching remote web sites for content
Author: Neil Smith [MVP, Digital media]
Date: 2005-10-23, 7:55 am

At 06:26 23/10/2005, you wrote:
>Message-ID: <8d9a42800510221021l54d3ba35y111666680ac3b643@mail.gmail.com>
>Date: Sat, 22 Oct 2005 13:21:26 -0400
>From: Joseph Crawford <codebowl@gmail.com>
>To: "[PHP-DB] Mailing List" <php-db@lists.php.net>
>MIME-Version: 1.0
>Content-Type: multipart/alternative;
> boundary="----=_Part_33359_9054580.1130001686839"
>Subject: Re: [PHP-DB] Re: Subject: Searching remote web sites for content
>
>why do all that,


Oh, it's far less work than the method you're proposing - you only
have one site to fopen(), not many dozens. There's no 'all that' to it
- it's the same method we're discussing, but more effective (see point 3).

> if you know the address of the page that the link will
>reside on just curl that page for the results and preg_match that.



Ref the OP : "I ask them to nominate where the link back page is, and
I could check this manually. But is there a way to check whether the
remote page links back using a php script, so that I could get a
report and follow up on exceptions, without having to check all pages
that say they link to my site?"

Three reasons: 1 is that the nomination process might be poorly
understood by the nominee, or they could be inept and place the link
somewhere other than where they specified (or move it about once
nominated). You'd need to be able to crawl their entire site in order
to automate the scan on a regular basis, or you're back to "and I
could check this manually".

2 is that unless you want to write a very, very robust parser, you may
as well rely on google's hard work writing such a parser. You can't
be sure *how* the referring webmaster has set up his links (re: inept),
so they could occur in a wide range of formats. The results from
google come in a regular format, so they're easy to parse - and you
said yourself you're not too certain of the regex you'd need - so why
complicate it by having to cover dozens of eventualities?
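
To give a feel for what covering those eventualities involves if you do
check the remote page yourself, here is a rough sketch only. It assumes
PHP's DOM extension is available and that you have already fetched the
page into a string (curl, fopen(), whatever you prefer). The function
name page_links_to() and its parameters are made up for illustration.
Letting the DOM parser read the markup avoids hand-writing a regex for
upper-case tags, single quotes and the like, but it still can't tell you
whether a link is hidden or whether a search engine will count it.

```php
<?php
// Hypothetical helper (not from the thread): returns true if $html
// contains an <a href="..."> whose host is $targetHost or www.$targetHost.
function page_links_to(string $html, string $targetHost): bool
{
    if (trim($html) === '') {
        return false; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world markup is often malformed; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $targetHost = strtolower($targetHost);

    // The HTML parser lower-cases tag and attribute names, so <A HREF=...>
    // is found here as well.
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        $host = parse_url($href, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // relative link, mailto:, or unparseable URL
        }
        $host = strtolower($host);
        if ($host === $targetHost || $host === 'www.' . $targetHost) {
            return true;
        }
    }

    return false;
}
```

You would call it as page_links_to($fetchedHtml, 'example.com'), where
example.com stands in for your own domain. Note that it only looks at
the one page you hand it, so point 1 still applies.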

3 is that the point of the exercise is to ensure good SE rankings by
having referring links of high relevance. Only google knows how that
relevance ranking results in a search index placement based on link
popularity - and that includes using hidden links to 'spam' the
search engine, which you don't want.

So, relying on google to spider the remote site is a way to ensure
your QA process for the link referrals really does result in a usable
link:mysite index in the search engine - which of course is *the
whole point of the exercise* !

HTH
Cheers - Neil