grabbing content from tree builder
| Hunter Barrington 2007-08-31, 7:27 pm |
| so i have the following code:
foreach my $node (@tables){
    my @rows = $node->look_down(_tag => 'tr');
    foreach my $row (@rows){ #grabbing data from each row at this point
        my @part_cell = $row->splice_content(3, 1); #part number
        my $cell = $part_cell[0];
        next unless defined($cell);
        my @children = $cell->content_list();
        #my $part_number = @children[-1];
        foreach my $child (@children){
            #if (ref($child ne "HASH")){
            print $child, "\n\n";
        }
        #my $parts = {$part_number => +{'part_number' => $part_number,
        #    'part_image' => '', 'description' => '', 'documents' => '',
        #    'RoHS' => '', 'availability' => '', 'pricing' => '',
        #    'list_price' => ''}};
    }
}
I need to grab just the plain-text part of the 4th td tag in
each row and assign it to $part_number.
Right now $child always gives back two things: the part number I want
and HTML::Element=HASH(0x....).
How do I get what I want? Hopefully that's clear; any help would be
greatly appreciated!
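A minimal sketch of why the loop prints two things: content_list returns a cell's children as a mix of plain strings (text nodes) and HTML::Element objects, and an object printed directly shows up as HTML::Element=HASH(0x....). The sample markup below is an assumption made up for illustration, not the poster's real page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical cell: a link element followed by a bare part number.
my $html = '<table><tr><td><a href="#">datasheet</a>ABC-123</td></tr></table>';

my $tree = HTML::TreeBuilder->new_from_content($html);
my $cell = $tree->look_down(_tag => 'td');   # scalar context: first match

# Text nodes are plain strings, so ref() is false for them;
# child elements are blessed objects, so ref() is true.
my @text_nodes = grep { !ref } $cell->content_list;

print "$_\n" for @text_nodes;

$tree->delete;   # release the tree explicitly
```

Filtering on ref keeps only the text that sits directly inside the td and skips text nested in child elements.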
| |
| Chas Owens 2007-08-31, 7:27 pm |
| On 8/31/07, Hunter Barrington <godsrock37@gmail.com> wrote:
I believe you want the as_text method. You can read about it here
http://search.cpan.org/~petek/HTML-...TML/Element.pm#$h->as_text()
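As a rough illustration of the as_text suggestion, the sketch below calls it on a single cell. The markup is a made-up assumption. Note that as_text returns the text of the element and of everything nested inside it, so any text in child elements comes along with the part number.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical cell with a nested element next to the part number.
my $html = '<table><tr><td><b>Part:</b> ABC-123</td></tr></table>';

my $tree = HTML::TreeBuilder->new_from_content($html);
my $cell = $tree->look_down(_tag => 'td');

# as_text concatenates all text under the element, nested tags included.
my $text = $cell->as_text;
print "$text\n";

$tree->delete;
```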
| |
| Rob Dixon 2007-08-31, 7:27 pm |
| Hunter Barrington wrote:
Calling splice_content will remove the <td> elements from the HTML tree, and
while that probably won't matter in this case it's certainly not necessary.
Something like this will do the trick I think.
foreach my $node (@tables) {
    my @rows = $node->look_down(_tag => 'tr');
    foreach my $row (@rows) {
        my @cells = $row->look_down(_tag => 'td');
        next unless @cells >= 4;
        my $partno = $cells[3]->as_trimmed_text;
        print $partno, "\n";
    }
}
HTH,
Rob
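For anyone who wants to try the approach above end to end, here is a self-contained sketch against a small made-up table. The HTML, the column order and the @part_numbers array are assumptions for illustration only. The header row uses th cells, so the "at least 4 td cells" check skips it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical parts table; the real page layout is not known.
my $html = <<'HTML';
<table>
  <tr><th>Image</th><th>Docs</th><th>Description</th><th>Part</th></tr>
  <tr><td>img</td><td>pdf</td><td>Widget</td><td> ABC-123 </td></tr>
  <tr><td>img</td><td>pdf</td><td>Gadget</td><td> XYZ-789 </td></tr>
</table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @part_numbers;
foreach my $row ($tree->look_down(_tag => 'tr')) {
    my @cells = $row->look_down(_tag => 'td');
    next unless @cells >= 4;    # skips the header row, which has no td cells

    # as_trimmed_text strips leading and trailing whitespace
    # and collapses internal runs of whitespace.
    push @part_numbers, $cells[3]->as_trimmed_text;
}

print "$_\n" for @part_numbers;

$tree->delete;
```

Because look_down only reads the tree, the td elements stay in place and the other columns can still be read from the same row afterwards.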
| |
| Rob Dixon 2007-08-31, 7:27 pm |
| Rob Dixon wrote:
> Hunter Barrington wrote:
>
> Calling splice_content will remove the <td> elements from the HTML tree,
> and
> while that probably won't matter in this case it's certainly not necessary.
> Something like this will do the trick I think.
>
>
> foreach my $node (@tables) {
>
> my @rows = $node->look_down(_tag => 'tr');
>
> foreach my $row (@rows) {
>
> my @cells = $row->look_down(_tag => 'td');
> next unless @cells >= 4;
> my $partno = $cells[3]->as_trimmed_text;
> print $partno, "\n";
> } }
My apologies for the layout of that code. It seems my email client isn't as
WYSIWYG as I would like :(
Rob