Hi,
I am trying to run about 1000 searches on a site, using keywords from
a file, and I want to automate these searches. I have copied the site's
search form and modified the post URL in it so that I can make the
modifications needed for the automation.
// structure of form
<form method="post" action="http://www.XXXXXXXX.com/processform.php">
<select> ... </select>  // a select box with 100 options
<input type="text">     // an input box
</form>
The program will be as follows:
While not end of file
{
  Step 1: read a keyword from the file and assign it to $value
  For each option $op on the form
  {
    // fill in the form
    Step 2: select $op
    Step 3: enter $value in the input box
    Step 4: submit
    Step 5: save the result to a file, $result
    Step 6: parse $result and save it to the database
  }
}
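The loop above could be sketched roughly as follows. This is a minimal sketch in Python rather than PHP, and it only builds the form payloads (steps 1 to 3), without sending anything. The field names `city` and `keyword` and the sample option values are assumptions, since the thread does not show the real form's field names.

```python
import io
from itertools import product


def read_keywords(handle):
    """Yield one non-empty keyword per line of an open text file."""
    for line in handle:
        keyword = line.strip()
        if keyword:
            yield keyword


def build_payloads(keywords, options):
    """Yield one form payload per (keyword, option) pair, keyword by keyword.

    'city' and 'keyword' are hypothetical field names; the real ones are
    the name attributes of the select box and input box in the form.
    """
    for keyword, option in product(keywords, options):
        yield {"city": option, "keyword": keyword}


if __name__ == "__main__":
    # Stand-ins for the keyword file and the 100 select-box options.
    keyword_file = io.StringIO("alpha\nbeta\n")
    options = ["1", "2", "3"]
    for payload in build_payloads(read_keywords(keyword_file), options):
        print(payload)
```

Each payload corresponds to one submit in step 4, so 1000 keywords and 100 options would mean up to 100,000 requests.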
There is no problem up to step 5. When I fill in and submit the
form, a page is opened from the remote host that contains a table:
//table
data1 -- data2 -- data3
data1 and data2 are text and data3 is a link,
and I want to save data1, data2 and the data3 link to a database.
But once control has passed to the form processor on the remote host, I
have no idea how to complete step 5. I mean, how can I regain control,
save the result to a file and parse it?
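Once the result page is available as a string, step 6 could look something like the sketch below. It uses only Python's standard library (`html.parser` and `sqlite3`) and assumes that each result row is a `<tr>` with exactly three `<td>` cells, the third of which contains an `<a href>`. The real page's markup is not shown in the thread, so that layout, the table name `results` and its column names are all assumptions.

```python
import sqlite3
from html.parser import HTMLParser


class ResultTableParser(HTMLParser):
    """Collect (data1, data2, link) tuples from three-column table rows."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._cells = None  # cell texts of the current row, or None
        self._text = None   # text pieces of the current cell, or None
        self._href = None   # last link seen in the current row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._href = [], None
        elif tag == "td" and self._cells is not None:
            self._text = []
        elif tag == "a" and self._text is not None:
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._text is not None:
            self._cells.append("".join(self._text).strip())
            self._text = None
        elif tag == "tr" and self._cells is not None:
            # Keep only rows that match the assumed three-cell layout.
            if len(self._cells) == 3 and self._href:
                self.rows.append((self._cells[0], self._cells[1], self._href))
            self._cells, self._text = None, None


def parse_results(html):
    """Return a list of (data1, data2, link) tuples found in the page."""
    parser = ResultTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def save_rows(connection, rows):
    """Store the parsed rows in a hypothetical 'results' table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS results (data1 TEXT, data2 TEXT, link TEXT)"
    )
    connection.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
    connection.commit()


if __name__ == "__main__":
    sample = (
        "<table><tr><td>foo</td><td>bar</td>"
        '<td><a href="http://example.com/1">details</a></td></tr></table>'
    )
    db = sqlite3.connect(":memory:")
    save_rows(db, parse_results(sample))
    print(db.execute("SELECT * FROM results").fetchall())
```

Header rows built from `<th>` cells are skipped, because they never produce three `<td>` cells.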
Also, this is the way I could think of, but not necessarily a feasible
solution. If you can think of other options, I am all ears :)
Thank you very much for your kind response.
raven wrote:
> But once control has passed to the form processor on the remote host, I
> have no idea how to complete step 5. I mean, how can I regain control,
> save the result to a file and parse it?
You will need to use cURL or something similar to submit the form, so
that the response comes back to your script instead of to a browser.
And BTW - do you have permission to do this? If I saw someone doing
this on one of my sites, they'd be blocked immediately - if not sooner.
--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
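What Jerry describes, submitting the form from a script so that the response comes back to the program, might look like the sketch below. It uses Python's standard `urllib` rather than PHP's cURL functions, and the URL and field names are placeholders, not the real site's. Building the request sends nothing; only `submit_form` contacts the server.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post_request(url, fields):
    """Build a POST request carrying the form fields, URL-encoded."""
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body, method="POST")


def submit_form(url, fields, timeout=30):
    """Send the POST and return the response body as text."""
    with urlopen(build_post_request(url, fields), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Placeholder URL and hypothetical field names; nothing is sent here.
    request = build_post_request(
        "http://www.example.com/processform.php",
        {"city": "1", "keyword": "alpha"},
    )
    print(request.get_method(), request.full_url, request.data)
```

The string returned by `submit_form` is the page that would otherwise open in the browser, so it can be written to a file or parsed directly.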
Or you could just make a text file containing all the URLs and pass it to wget. Or use Jerry's approach with cURL. cURL would be my first choice since it's built in, but you could use wget as well.
Thank you Jerry and George for your quick responses. The 100 options in the form correspond to cities, and the form on the site doesn't allow a query without selecting a city first. Since I don't know the city (it is actually what I am searching for), a single query takes about 50 submits on average, and I have over 100 queries. Whether it is the remote site designer's mistake or me being evil :( it is unbearable to go through them one by one by hand.
On Mon, 31 Mar 2008 07:14:54 -0700 (PDT), raven <rvnsnest@gmail.com> wrote:
>Whether it is the remote site designer's mistake or me being evil :( it is
>unbearable to go through them one by one by hand.
My bet is the site will figure out that you're botting them after about the first 50 and shut you down. Ever thought of just asking them for the data?
--
gburnore at DataBasix dot Com
How you look depends on where you go.
Gary L. Burnore
raven wrote:
> Whether it is the remote site designer's mistake or me being evil :( it is
> unbearable to go through them one by one by hand.
Then you have a problem. If their webmaster is paying any attention at all, you'll be in deep trouble with him (and most probably the site owner, if they aren't the same people).
I wouldn't recommend it.
--
Jerry Stuckle
On Mar 31, 6:25 pm, Gary L. Burnore <gburn...@databasix.com> wrote:
> My bet is the site will figure out that you're botting them after
> about the first 50 and shut you down. Ever thought of just asking
> them for the data?
Well, in order to go about this you should do the following things:
1. Choose a user agent to emulate (Microsoft Internet Explorer or Firefox).
2. Choose a random delay between requests, so that you don't send requests all the time.
Both of these options are available in cURL and in wget, so you could use either of them. You might even want to do it from more than one IP.
That's my opinion. If you need any more help on spidering the data, just drop me an email.
Regards,
George Maicovschi.
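The random pause between requests that George mentions could be sketched like this, again in Python rather than with cURL or wget options. The delay bounds are illustrative defaults, not values from the thread, and the `sleep` and `rng` parameters exist only so the pause can be replaced when trying the code out.

```python
import random
import time


def throttled(items, min_delay=5.0, max_delay=15.0, sleep=time.sleep, rng=random):
    """Yield items one by one, pausing a random interval between them.

    No pause is made before the first item. The bounds are in seconds
    and are assumptions chosen for illustration.
    """
    for index, item in enumerate(items):
        if index:
            sleep(rng.uniform(min_delay, max_delay))
        yield item
```

Wrapping the list of pending queries, as in `for query in throttled(queries): ...`, spaces the requests out without changing the rest of the loop.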
Thank you all. I will put a delay between queries, something like 10 seconds, so that it does not consume the remote site's bandwidth. But there is another problem with cURL: the remote site needs credentials for access. Normally I open the site in Firefox, enter my credentials, and then fill in the form from the local copy, and that works. But when I use cURL for the post, the site redirects me to the credentials page. I made a trivial post form of my own and cURL handled it very well, so I don't think my mistake is in how I use cURL.
Things I have tried:
1. I thought maybe the host knows the request isn't coming from an HTML form, so I set the user agent as George suggested:
$userAgent = 'Firefox (WindowsXP) - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6';
curl_setopt($Curl_Session, CURLOPT_USERAGENT, $userAgent);
I also used the Google and Yahoo crawler IDs, as suggested elsewhere. Neither worked.
2. I thought the web page might use cookie-based session management, so that submitting directly with cURL doesn't carry the session variables, but the Web Developer add-on for Mozilla does not show any.
3. I used an intermediate processor and tried to pass the posted fields along with
curl_setopt ($Curl_Session, CURLOPT_POSTFIELDS, $_POST);
with no luck.
I don't get how a local form submit works but a cURL submit does not. I have used cURL for other form-completion tasks and everywhere else it seems OK. Is there a way the form processor can tell that the request is coming from cURL rather than from an HTML form, and reject it? Or alternatively, is there a way I can use the local form and *somehow* gain control of the generated page?
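The thread never establishes why the site sends the cURL request back to the credentials page. One possibility, matching the second guess above, is that the browser carries a login session in a cookie that the script does not send. If that is the case, the script would need to log in itself and reuse the cookie it receives. The sketch below shows that idea with Python's standard `http.cookiejar`. The login URL, search URL and all field names are hypothetical, and a site using HTTP authentication or some other scheme would need a different approach.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_session():
    """Return an opener that stores cookies and sends them on later requests."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def post(opener, url, fields, timeout=30):
    """POST URL-encoded fields through the opener and return the body as text."""
    body = urlencode(fields).encode("ascii")
    request = Request(url, data=body, method="POST")
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def login_then_search(login_url, credentials, search_url, fields):
    """Log in first, then submit the search form with the same opener.

    Any cookie set by the login response is kept in the jar and sent
    along with the search request.
    """
    opener, _jar = make_session()
    post(opener, login_url, credentials)
    return post(opener, search_url, fields)
```

In PHP's cURL extension the corresponding idea is to reuse one handle, or to point `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE` at the same file, so that cookies from the login request are sent with the form post.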
"Jerry Stuckle" <jstucklex@attglobal.net> wrote in message news:U_udnbJBmK8ikmzanZ2dnUVZ_uXinZ2d@comcast.com...
> Then you have a problem. If their webmaster is paying any attention
> at all, you'll be in deep trouble with him (and most probably the
> site owner, if they aren't the same people).
>
> I wouldn't recommend it.
Jerry,
I always thought programmers were supposed to automate repetitive tasks.
I see nothing wrong with this, as long as it is "friendly fire" ;)
R.
Richard wrote:
> Jerry,
> I always thought programmers were supposed to automate repetitive
> tasks.
> I see nothing wrong with this, as long as it is "friendly fire" ;)
>
> R.
What's wrong with it is that he's using someone else's information and bandwidth. For instance, if the information is copyrighted, he could be in serious legal trouble. Even if it isn't copyrighted, the owner may not like the way he's using their website.
Automating repetitive tasks is fine when it's your resources. But when you're using someone else's resources, you need to pay attention to what they allow.
The whole thing could get him in serious legal trouble if he doesn't have permission to do what he wants. And if the owner of the site wanted to press it, it could cost the OP a LOT of money.
--
Jerry Stuckle