Replacing expression in a file from mechanize
Nospam

2006-12-23, 10:00 pm

I have a local HTML file called file1.html. In addition to the HTML
markup, it contains a series of links to one particular domain (they
match the regular expression /on\.fe/). I am trying to follow each of
these links. The page behind each link contains a link to another page,
which I would like to capture (it matches the regular expression
/www\.arax/). I then want to replace each /on\.fe/ link in file1.html
with its corresponding /www\.arax/ link.

This is what I have come up with so far, but I am a little stuck:



#! perl\bin\perl

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

open(FILE, "< file1.html") || print "Unable to open the file file1 \n";

while (<FILE> )
{
    if($_ =~ /on\.fe/)
    {
        my $url = $_;
        print $mech->uri."\n";
        $mech->get($_);
        $mech->content();
        if($mech->content()=~ /www\.arax/)
        {
            my $url2 = $mech->content() =~ /www\.arax/;
            print $mech->uri."\n";
            s/$url/$url2/;
            print;
        }
    }
}

close(FILE);



kens

2006-12-23, 10:00 pm


Nospam wrote:
> [...]
> my $url2 = $mech->content() =~ /www\.arax/;
> [...]


Hi,
I have never used the WWW::Mechanize module, and I am a little confused
by your code (could just be me).

The statement "my $url2 = $mech->content() =~ /www\.arax/;" is not
going to set $url2 to a string, if that was your intent. A match in
scalar context only returns true or false. Since you already know that
the regular expression matches (from the preceding 'if' statement),
$url2 is set to 1 (true), indicating there was a match.
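
To illustrate the point about scalar context, here is a minimal sketch. The page content and the www.arax.example URL are made up for the example (the real page is not shown in the thread), and the capture pattern is only one plausible way to pull out a quoted attribute value:

```perl
use strict;
use warnings;

# Hypothetical page content standing in for $mech->content().
my $content = '<embed src="http://www.arax.example/v/abc123"></embed>';

# Scalar context: the match only reports whether it succeeded.
my $flag = $content =~ /www\.arax/;

# List context with a capture group: the parentheses around $url2
# make the match return the captured text instead.
my ($url2) = $content =~ /"([^"]*www\.arax[^"]*)"/;

print "flag = $flag\n";
print "url2 = $url2\n";
```

The only differences between the two assignments are the parentheses on the left-hand side and the capture group in the pattern.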

Did you just want the following?

my $url2 = $mech->content();

Ken

Mumia W. (on aioe)

2006-12-23, 10:00 pm

On 12/23/2006 11:01 AM, Nospam wrote:
> Basically I have a local html file, called file1.html [...]


Neither your prose nor your program gives me a feel for what you're
trying to do. Can we see some sample data for both file1.html and one of
the "www\.arax" containing files?



--
paduille.4060.mumia.w@earthlink.net
http://home.earthlink.net/~mumia.w.18.spam/

Nospam

2006-12-24, 7:03 pm


"Mumia W. (on aioe)" <paduille.4060.mumia.w@earthlink.net> wrote in message
news:emkmp5$fh0$1@aioe.org...
> On 12/23/2006 11:01 AM, Nospam wrote:
>
> Neither your prose nor your program gives me a feel for what you're
> trying to do. Can we see some sample data for both file1.html and one of
> the "www\.arax" containing files?
>


From file1.html, a sample of the html code:


<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link1/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 2</span></a> </span></li>
<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link2/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 3</span></a> </span></li>
<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link3/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 4</span></a> </span></li>
<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link4/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 5</span></a> </span></li>

For example, the content of the page at http://...online.feeds.com/link1/ is:


<body>
...
</td></tr><tr><td
style="height:81%;width:100%;padding:0;text-align:left;"><embed
src="http://...arax.../v/gomlckZfGYU..." </embed> </td>
</tr>
<tr>
<td style="height:13%;width:100%;padding:0;text-align:left;">
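
Given sample data shaped like the above, one possible approach can be sketched as follows. This is an illustration under assumptions, not a tested solution for the real site: the helper names (extract_target, rewrite_links, mech_fetcher) are invented for the example, the station.feeds.example and www.arax.example URLs are placeholders because the real URLs are elided in the post, and HTML is handled with plain regular expressions rather than a parser. The file is treated as one string instead of being read line by line, since each <a ...> tag in the sample spans several lines and a whole line is not a URL that can be passed to get().

```perl
use strict;
use warnings;

# Patterns are taken from the original post. Adjust them to the real
# URLs: note that /on\.fe/ does not match the literal text "online.feeds".

# Return the first src/href value containing www.arax, or undef.
sub extract_target {
    my ($html) = @_;
    my ($target) = $html =~ /(?:src|href)="([^"]*www\.arax[^"]*)"/i;
    return $target;
}

# Replace every href matching /on\.fe/ with the target found on the
# page it points to. $fetch is a code ref: URL in, page content out.
# Links whose page cannot be fetched or has no target stay unchanged.
sub rewrite_links {
    my ($html, $fetch) = @_;
    $html =~ s{(href=")([^"]*on\.fe[^"]*)(")}{
        my ($pre, $url, $post) = ($1, $2, $3);
        my $target = extract_target($fetch->($url) // '');
        $pre . ($target // $url) . $post;
    }gei;
    return $html;
}

# Fetcher for real use. WWW::Mechanize is loaded only when this is called.
sub mech_fetcher {
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(autocheck => 0);
    return sub {
        my ($url) = @_;
        my $res = $mech->get($url);
        return $res->is_success ? $mech->content : undef;
    };
}

# Demo with a stub fetcher and placeholder data, so no network is needed.
my %fake_pages = (
    'http://station.feeds.example/link1/' =>
        '<embed src="http://www.arax.example/v/abc123"></embed>',
);
my $stub = sub { $fake_pages{ $_[0] } };

my $in = qq{<a href="http://station.feeds.example/link1/"\n target="_blank">Part 2</a>\n}
       . qq{<a href="http://station.feeds.example/link2/">Part 3</a>\n};
my $out = rewrite_links($in, $stub);
print $out;

# Real use (not run here): slurp file1.html, rewrite it, print the result.
#   open my $fh, '<', 'file1.html' or die "Cannot open file1.html: $!";
#   my $html = do { local $/; <$fh> };
#   close $fh;
#   print rewrite_links($html, mech_fetcher());
```

Passing the fetcher in as a code reference keeps the substitution logic testable without network access; the Mechanize-backed fetcher can be swapped in once the patterns are confirmed against the real pages.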


Gunnar Hjalmarsson

2006-12-24, 7:03 pm

Nospam wrote:
> Basically I have a local html file,


The guy is multi-posting.
http://www.thescripts.com/forum/thread580426.html

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Copyright 2008 codecomments.com