Replacing expression in a file from mechanize
| Nospam 2006-12-23, 10:00 pm |
| Basically I have a local HTML file called file1.html. In addition to the
HTML code, it has a series of links with a particular domain name (they
match the regular expression /on\.fe/). I am trying to follow each of these
links. The content of each linked page contains a link to another page,
which I would like to capture with the regular expression /www\.arax/. I
then want to replace each /on\.fe/ link in file1.html with its
corresponding /www\.arax/ link.
So far this is what I have come up with, and I am a little stuck:
#! perl\bin\perl
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
open(FILE, "< file1.html") || print "Unable to open the file file1 \n";
while (<FILE> )
{
    if($_ =~ /on\.fe/)
    {
        my $url = $_;
        print $mech->uri."\n";
        $mech->get($_);
        $mech->content();
        if($mech->content()=~ /www\.arax/)
        {
            my $url2 = $mech->content() =~ /www\.arax/;
            print $mech->uri."\n";
            s/$url/$url2/;
            print;
        }
    }
}
close(FILE);
| |
|
|
Nospam wrote:
> So far this is what I have come up with, and am a little stuck
> [...]
> my $url2 = $mech->content() =~ /www\.arax/;
> [...]
Hi,
I have never used the WWW::Mechanize module, and I am a little confused by
your code (could just be me).
The statement "my $url2 = $mech->content() =~ /www\.arax/;" is not going to
set $url2 to a string, if that was your intent. In scalar context the match
just returns true or false, and since you already know that the regular
expression matches (the preceding 'if' statement), $url2 is set to 1 (true),
indicating there was a match.
Did you just want the following?
my $url2 = $mech->content();
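If the intent was instead to grab the matching URL itself, one possible
approach is to add capturing parentheses and assign the match in list
context. This is only a minimal sketch: the $content string, the
www.arax.example host and the src="..." pattern are made-up assumptions
standing in for whatever $mech->content() really returns.

```perl
use strict;
use warnings;

# Hypothetical page content standing in for $mech->content().
my $content = '<td><embed src="http://www.arax.example/v/abc123"></embed></td>';

# Scalar context: the match returns true/false, not the matched text.
my $matched = $content =~ /www\.arax/;

# List context with capturing parentheses: the first capture goes to $url2.
# Assumption: the wanted URL is the whole value of a src="..." attribute.
my ($url2) = $content =~ /src="([^"]*www\.arax[^"]*)"/;

print "matched: $matched\n";
print "url2: ", (defined $url2 ? $url2 : 'no match'), "\n";
```

With the parentheses around $url2 on the left-hand side, the assignment
puts the match in list context, so $url2 receives the captured text (or
undef if nothing matched) rather than a true/false flag.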
Ken
| |
| Mumia W. (on aioe) 2006-12-23, 10:00 pm |
| On 12/23/2006 11:01 AM, Nospam wrote:
> [...]
Neither your prose nor your program gives me a feel for what you're
trying to do. Can we see some sample data for both file1.html and one of
the files containing "www\.arax"?
--
paduille.4060.mumia.w@earthlink.net
http://home.earthlink.net/~mumia.w.18.spam/
| |
| Nospam 2006-12-24, 7:03 pm |
|
"Mumia W. (on aioe)" <paduille.4060.mumia.w@earthlink.net> wrote in message
news:emkmp5$fh0$1@aioe.org...
> Neither your prose nor your program give me a feel for what you're
> trying to do. Can we see some sample data for both file1.html and one of
> the "www\.arax" containing files?
>
From file1.html, a sample of the html code:
<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link1/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 2</span></a> </span></li>
<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link2/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 3</span></a> </span></li>
<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link3/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 4</span></a> </span></li>
<li class="MsoNormal" style="line-height: 18.0pt; text-autospace:
ideograph-numeric ideograph-other; background: white">
<span style="font-size: 11.0pt; font-family: Tahoma">
<a href="http://...online.feeds.com/link4/" target="_blank" style="color:
blue; text-decoration: underline; text-underline: single">
<span style="color: #336699; text-decoration: none">Links
Part 5</span></a> </span></li>
The content of the link http://...online.feeds.com/link1/, for example, is:
<body>
...
</td></tr><tr><td
style="height:81%;width:100%;padding:0;text-align:left;"><embed
src="http://...arax.../v/gomlckZfGYU..." </embed> </td>
</tr>
<tr>
<td style="height:13%;width:100%;padding:0;text-align:left;">