Recurse into an HTML tree?
johnspeth@yahoo.com

2007-08-25, 7:59 am

Hi Group-

(Please excuse me if this post is a duplicate - free news servers aren't reliable)

I'm trying to figure out how to recursively scan an HTML tree. Through trial and error I've arrived at the solution below, except I'm stumped on how to recurse into the next level. I can't seem to find a way to determine if an item in the content_list is an entry point into the next deeper level (that is, not a leaf node). My progress so far is shown in the code snippet below. Can anyone provide any clues to what code I can sub for "CAN RECURSE DEEPER" in the if() statement?

Thanks, John.

################
sub recurse
{
    my @children = @_;

    my $itemCount = @children;

    for(my $i = 0; $i < $itemCount; $i++)
    {
        my $item = $children[$i];

        my $s = $item->as_text();
        my $d = $item->depth();

        print "Position $i, depth=$d, '$s'\n";

        if("CAN RECURSE DEEPER")
        {
            recurse($item->content_list);
        }
    }
}

################
sub main
{
    # $htmlFile is the HTM file spec string

    # Parse the input file into an HTML tree
    my $tree = HTML::TreeBuilder->new();
    $tree->parse_file($htmlFile);
    recurse($tree->content_list);
    $tree->delete;
}

Chas Owens

2007-08-25, 7:59 am

On 8/24/07, johnspeth@yahoo.com <johnspeth@yahoo.com> wrote:
snip
> I'm trying to figure out how to recursively scan an HTML tree.

snip

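#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

# Parse the input file into an HTML tree
my $tree = HTML::TreeBuilder->new();
$tree->parse_file("t.html");

recurse($tree);

# Free the tree once we are done with it
$tree->delete;

sub recurse {
    # $level defaults to 0 when only the element is passed in
    my ($elt, $level) = (@_, 0);

    print "\t" x $level, "start ", $elt->tag, "\n";
    for my $child ($elt->content_list) {
        if (ref $child) {
            # A reference is an element node, so it can be recursed into
            recurse($child, $level + 1);
        } else {
            # A plain string is a text node (a leaf)
            print "\t" x ($level + 1), $child, "\n";
        }
    }
    print "\t" x $level, "end ", $elt->tag, "\n";
}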
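
The ref test in the reply above works because content_list returns a mix of HTML::Element objects (which are references) and plain strings (text nodes). As a rough sketch only, the same test could stand in for "CAN RECURSE DEEPER" in the original recurse(). The sample HTML string, the use of new_from_content in place of parse_file, and returning the lines instead of printing them directly are all assumptions made here to keep the example self-contained; they are not part of either post.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use HTML::TreeBuilder;

# Walk a list of items as returned by content_list().
# Each item is either an HTML::Element object (a reference)
# or a plain string (a text node), so ref() tells them apart.
sub recurse {
    my @children = @_;
    my @lines;

    for my $i (0 .. $#children) {
        my $item = $children[$i];

        if (ref $item) {
            # Element node: report it, then descend into its children.
            my $d = $item->depth;
            my $t = $item->tag;
            push @lines, "Position $i, depth=$d, <$t>";
            push @lines, recurse($item->content_list);
        }
        else {
            # Text node: a plain string, so no methods can be called on it.
            push @lines, "Position $i, text '$item'";
        }
    }

    return @lines;
}

# Hypothetical sample input, used instead of a file on disk.
my $html = '<html><body><p>Hi <b>there</b></p></body></html>';

my $tree = HTML::TreeBuilder->new_from_content($html);
print "$_\n" for recurse($tree->content_list);
$tree->delete;
```

Note that the original snippet calls as_text() and depth() on every item before checking it; those calls would fail on a text node, which is why this sketch checks ref first and only calls methods inside the element branch.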