Home > Archive > PERL Beginners > January 2006 > Processing a web page (or looping over a multi line string)

Processing a web page (or looping over a multi line string)
Jason

2006-01-10, 4:01 am

The problem I'm trying to solve:
There's a status web page that this program needs to check. Any line
that matches:
^\s<td>some information</td>
is one that I need to process. I just need the 'some information'
part.

I found LWP::Simple, and I'm getting the webpage with 'get $url'. The
problem is I can't figure out how to loop over the lines of HTML to
find what I want.

I've tried:

my $content = get $url;
while ($line = $content) {

and

while ($line = get $url) {

and

my $content = get $url;
open(CONTENT, "$content") or die "Couldn't get url $!";

and

my $content = get $url;
open(CONTENT, $content) or die "Couldn't get url $!";

and

open(CONTENT, get $url) or die "Couldn't get url $!";

and

open(CONTENT, "get $url") or die "Couldn't get url $!";

I'm missing something fundamental. Thanks for any pointers.
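
The open() attempts above fail because open() treats the string it is given as a file name, not as data to read. As a minimal sketch, assuming Perl 5.8 or later, a reference to the scalar can be opened as an in-memory filehandle and read line by line. The sample markup in $content is made up and stands in for whatever 'get $url' returns.

```perl
use strict;
use warnings;

# Stand-in for the string returned by LWP::Simple's get(); the markup is invented.
my $content = "<table>\n <td>disk ok</td>\n <td>load high</td>\n</table>\n";

# A reference to the scalar makes open() read from the string itself
# instead of looking for a file with that name (Perl 5.8 or later).
open(my $fh, '<', \$content) or die "Couldn't open in-memory handle: $!";

my @lines;
while (my $line = <$fh>) {
    chomp $line;          # drop the trailing newline
    push @lines, $line;
}
close $fh;

print scalar(@lines), " lines read\n";
```

This keeps the familiar while (<$fh>) loop. For a page that is already fully in memory, splitting the string on newlines does the same job with less ceremony.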

Paul Lalli

2006-01-10, 4:01 am

Jason wrote:
> The problem I'm trying to solve:
> There's a status web page that this program needs to check. Any line
> that matches:
> ^\s<td>some information</td>
> is one that I need to process. I just need the 'some information'
> part.
>
> I found LWP::Simple, and I'm getting the webpage with 'get $url'. The
> problem is I can't figure out how to loop over the lines of HTML to
> find what I want.
>
> I've tried:
>
> my $content = get $url;


$content is one big scalar. One single string. If you want to operate
on the "lines" contained in that string, you'll have to tell Perl what
a "line" is, first:

foreach my $line (split /\n/, $content) {
    process_line($line);
}

Hope this helps,
Paul Lalli
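
One way the split loop above might be filled in, as a sketch only: the pattern is the one from the original question with a capture group added around the cell text, and the sample $content is invented for illustration. The body of process_line() is not shown in the thread, so the loop here simply collects the captured text into @found.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the fetched page.
my $content = "<tr>\n <td>disk ok</td>\n<td>ignored: no leading space</td>\n <td>load high</td>\n</tr>\n";

my @found;
foreach my $line (split /\n/, $content) {
    # Pattern from the question: one whitespace character, then the cell.
    # The parentheses capture just the 'some information' part into $1.
    if ($line =~ m{^\s<td>(.*?)</td>}) {
        push @found, $1;
    }
}

print "$_\n" for @found;
```

The m{...} delimiters are used only so the slash in </td> does not need escaping.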

Jason

2006-01-10, 4:01 am

I found a solution. Here's what I ended up doing:

my $content = get $url;
die "Couldn't get $url" unless defined $content;

while ($content =~ m/<td>/g) {
#lots of pos hacking with m/?/g

Paul Lalli

2006-01-10, 4:01 am

Jason wrote:
> I found a solution. Here's what I ended up doing:
>
> my $content = get $url;
> die "Couldn't get $url" unless defined $content;
>
> while ($content =~ m/<td>/g) {
> #lots of pos hacking with m/?/g


That's... a really, really terrible idea.
1) What if the HTML you happen to retrieve doesn't have any tables?
2) Your "pos hacking" is likely difficult to read and therefore
difficult to maintain
3) WHY would you want to do that, instead of splitting on newlines, as
I said in my response? You said you wanted to process the file line by
line, so just do that.

I'm thoroughly confused by your desire to use this "solution".

Paul Lalli
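
For comparison, a global match can also be written without touching pos() at all. The sketch below is not code from either poster: extract_cells() is a hypothetical helper name, and it assumes the target lines look like the pattern in the original question. With /m the ^ anchor matches at the start of every line, and /g in list context returns every captured group at once; an empty list means no line matched, which covers the "no tables" case.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of every line-leading <td> cell.
sub extract_cells {
    my ($content) = @_;
    # /m: ^ matches at each line start; /g in list context: all captures.
    my @cells = $content =~ m{^\s<td>(.*?)</td>}mg;
    return @cells;
}

my @cells = extract_cells(" <td>disk ok</td>\nnoise\n <td>load high</td>\n");
my @none  = extract_cells("<p>no tables here</p>\n");

print @cells ? "Found: @cells\n" : "No matching <td> lines\n";
print @none  ? "Found: @none\n"  : "No matching <td> lines\n";
```

Whether this or the split loop reads better is a matter of taste; both avoid manual position bookkeeping.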
