Home > Archive > PERL Beginners > January 2006 > Processing a web page (or looping over a multi line string)
Processing a web page (or looping over a multi line string)
| The problem I'm trying to solve:
There's a status web page that this program needs to check. Any line
that matches:
^\s<td>some information</td>
is one that I need to process. I just need the 'some information'
part.
I found LWP::Simple, and I'm getting the webpage with 'get $url'. The
problem is I can't figure out how to loop over the lines of HTML to
find what I want.
I've tried:
my $content = get $url;
while ($line = $content) {
and
while ($line = get $url) {
and
my $content = get $url;
open(CONTENT, "$content") or die "Couldn't get url $!";
and
my $content = get $url;
open(CONTENT, $content) or die "Couldn't get url $!";
and
open(CONTENT, get $url) or die "Couldn't get url $!";
and
open(CONTENT, "get $url") or die "Couldn't get url $!";
I'm missing something fundamental. Thanks for any pointers.
| |
| Paul Lalli 2006-01-10, 4:01 am |
| Jason wrote:
> The problem I'm trying to solve:
> There's a status web page that this program needs to check. Any line
> that matches:
> ^\s<td>some information</td>
> is one that I need to process. I just need the 'some information'
> part.
>
> I found LWP::Simple, and I'm getting the webpage with 'get $url'. The
> problem is I can't figure out how to loop over the lines of HTML to
> find what I want.
>
> I've tried:
>
> my $content = get $url;
$content is one big scalar. One single string. If you want to operate
on the "lines" contained in that string, you'll have to tell Perl what
a "line" is, first:
foreach my $line (split /\n/, $content) {
process_line($line);
}
Hope this helps,
Paul Lalli
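
A fuller sketch of the split approach above, for anyone who wants to see it end to end. It assumes a hard-coded sample string in place of the real `get $url` result, and `extract_info` is a hypothetical helper standing in for `process_line`; neither comes from the thread. The pattern is the one from the original post, with a capture group added around the 'some information' part.

```perl
use strict;
use warnings;

# Sample data standing in for: my $content = get $url;
# (an assumption, not the real status page)
my $content = join "\n",
    '<table>',
    ' <td>server1 OK</td>',
    '<td>no leading whitespace, skipped</td>',
    ' <td>server2 DOWN</td>',
    '</table>';

# Hypothetical helper: returns the text between <td> and </td> when the
# line matches the pattern from the original post, otherwise undef.
sub extract_info {
    my ($line) = @_;
    return $line =~ m{^\s<td>(.*?)</td>} ? $1 : undef;
}

# Tell Perl what a "line" is by splitting the one big scalar on newlines.
my @info;
foreach my $line (split /\n/, $content) {
    my $text = extract_info($line);
    push @info, $text if defined $text;
}

print "$_\n" for @info;
```

If no line matches, `@info` simply stays empty and nothing is printed.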
| |
|
| I found a solution. Here's what I ended up doing:
my $content = get $url;
die "Couldn't get $url" unless defined $content;
while ($content =~ m/<td>/g) {
#lots of pos hacking with m/?/g
| |
| Paul Lalli 2006-01-10, 4:01 am |
| Jason wrote:
> I found a solution. Here's what I ended up doing:
>
> my $content = get $url;
> die "Couldn't get $url" unless defined $content;
>
> while ($content =~ m/<td>/g) {
> #lots of pos hacking with m/?/g
That's... a really, really terrible idea.
1) What if the HTML you happen to retrieve doesn't have any tables?
2) Your "pos hacking" is likely difficult to read and therefore
difficult to maintain.
3) WHY would you want to do that, instead of splitting on newlines, as
I said in my response? You said you wanted to process the file line by
line, so just do that.
I'm thoroughly confused by your desire to use this "solution".
Paul Lalli
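
The line-by-line processing recommended above can also be written with a filehandle, which is close to what the open() attempts in the first post were reaching for. Those attempts fail because open() treats the string it is given as a file name. A minimal sketch, assuming Perl 5.8 or later (where open accepts a reference to a scalar) and a made-up sample string in place of the fetched page:

```perl
use strict;
use warnings;

# Sample data standing in for: my $content = get $url;
# (an assumption, not the real status page)
my $content = " <td>up</td>\n<p>ignored</p>\n <td>down</td>\n";

# Open a read-only filehandle on the string itself, not on a file.
open(my $fh, '<', \$content)
    or die "Couldn't open in-memory filehandle: $!";

my @info;
while (my $line = <$fh>) {
    chomp $line;
    # Same pattern as in the original post, capturing the cell text.
    push @info, $1 if $line =~ m{^\s<td>(.*?)</td>};
}
close $fh;

print "$_\n" for @info;
```

A page with no matching lines leaves `@info` empty, so no special case for "no tables" is needed.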