Hi All,

I have a list of URL source files... I need to get a certain "<img src=" from each file. The one thing that separates it from the other <img src> tags is that it is preceded by <center>, for example:

<center><img src="/rcp/ObjectServer?table=Images&id=381"

but the sequence of img tags is different in each of the files. Is there a way to get the img 'src' attribute only when the img tag is preceded by <center>? Maybe I could write a regex to do this? Pointers?

I've broken my script down to try and get the <center> <img src= from just one source file.

Below is one attempt where I thought I was getting close... maybe not... :~). Any suggestions would be greatly appreciated.

#!/usr/bin/perl

use strict;
use warnings;
use HTML::TokeParser::Simple;
use LWP::Simple;

my $url = "http://www.rcpworksmarter.com/rcp/p...jsp?rcpNum=1013";
my $page = get($url)
    or die "Could not load URL\n";

my $parser = HTML::TokeParser::Simple->new(\$page)
    or die "Could not parse page";

$parser->get_tag ("img") || die;
my $token = $parser->get_token;
if ($token->[0] eq "center");
print;

# ---end ---

Brian Volk
HP Products
317.298.9950 x1245
bvolk@hpproducts.com
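Since the question asks whether a regex would do: a minimal core-Perl sketch of that idea, with no modules needed. It assumes the <img> follows the <center> tag with nothing but whitespace in between and that src is double-quoted; the helper name centered_img_srcs and the sample HTML are hypothetical. Regexes are brittle on real-world HTML (single-quoted attributes, comments, unusual markup), which is why the replies lean on HTML::TokeParser::Simple instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the src of every <img> that directly follows a <center> tag.
# Assumes double-quoted src values; a regex is only a rough stand-in for a real HTML parser.
sub centered_img_srcs {
    my ($html) = @_;
    my @srcs;
    # <center ...>, optional whitespace, then an <img ...> tag; capture its src value.
    while ( $html =~ m{<center\b[^>]*>\s*<img\b[^>]*?\bsrc\s*=\s*"([^"]*)"}gi ) {
        push @srcs, $1;
    }
    return @srcs;
}

# Hypothetical sample page: only the second image sits inside <center>.
my $html = <<'HTML';
<img src="/logo.gif">
<center><img src="/rcp/ObjectServer?table=Images&id=381" border="0"></center>
<img src="/footer.gif">
HTML

print "$_\n" for centered_img_srcs($html);
```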
----- Original Message -----
From: Brian Volk <BVolk@HPProducts.com>
Date: Wednesday, December 22, 2004 12:59 pm
Subject: LWP get only <center> img

> Hi All,

Hello

> I have a list of url source files... I need to get a certain "<img
> src=" from each file. The one thing that separates it from the
> other <img src tags is it is preceded by <center> for example: <center><img
> src="/rcp/ObjectServer?table=Images&id=381" but the sequence of
> img tags is different in each of the files. Is there a way to get the img
> 'src' tag if the img tag is eq to <center>? Maybe I could write a regex to
> do this? pointers?

The module you are trying to use already has everything you need for the task.

> I've broken my script down to try and get the <center> <img src=
> from just one source file.
>
> Below is one attempt where I thought I was getting close ... maybe not...
> :~). Any suggestions would be greatly appreciated.

You are really close; you just need to use a few other functions from the module.

> #!/usr/bin/perl
>
> use strict;
> use warnings;
> use HTML::TokeParser::Simple;
> use LWP::Simple;
>
> my $url = "http://www.rcpworksmarter.com/rcp/p...jsp?rcpNum=1013";
> my $page = get($url)
>     or die "Could not load URL\n";

You can avoid all of that if you download the latest release of HTML::TokeParser::Simple from CPAN.

> my $parser = HTML::TokeParser::Simple->new(\$page)
>     or die "Could not parse page";
>
> $parser->get_tag ("img") || die;
> my $token = $parser->get_token;
> if ($token->[0] eq "center");
> print;
>
> # ---end ---

Here is one way to do it. It's not a complete solution, but it will work for the test page you have supplied, and it makes a good learning base.
#!/usr/bin/perl

use strict;
use warnings;
use HTML::TokeParser::Simple;

my $url = 'http://www.rcpworksmarter.com/rcp/products/detail.jsp?rcpNum=1013';

# A recent HTML::TokeParser::Simple can fetch the page itself via url => ...
my $parser = HTML::TokeParser::Simple->new(url => $url)
    or die "Could not parse page";

while ( my $token = $parser->get_token ) {

    if ( $token->is_start_tag( 'center' ) ) {
        # The <img> is expected to be the very next token after <center>
        my $TAG = $parser->get_token();
        print $TAG->get_attr('src'), "\n"
            if $TAG && $TAG->is_start_tag('img');
    }

}

HTH,
Mark G.

P.S. How about a free garbage bin for getting you on the way :O)
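The loop above reads only the single token after <center>, so it relies on the <img> following immediately with no whitespace in between. A hedged variant, assuming other pages may put a newline or indentation between the two tags: it skips whitespace-only text tokens, checks that the token really is an <img>, and parses from a string so it can be tried without fetching anything. The helper name centered_img_srcs and the sample HTML are hypothetical; the token methods used (get_token, is_start_tag, is_text, as_is, get_attr) are from HTML::TokeParser::Simple.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: collect the src of each <img> that is the first
# non-whitespace token after a <center> start tag.
sub centered_img_srcs {
    my ($html) = @_;
    # A scalar reference makes the parser read from a string instead of a URL or file.
    my $parser = HTML::TokeParser::Simple->new(\$html)
        or die "Could not parse HTML";

    my @srcs;
    while ( my $token = $parser->get_token ) {
        next unless $token->is_start_tag('center');

        # Skip whitespace-only text (e.g. a newline) between <center> and the next tag.
        my $next = $parser->get_token;
        $next = $parser->get_token
            while $next && $next->is_text && $next->as_is =~ /^\s*$/;

        # Only record it if that token really is an <img> with a src attribute.
        if ( $next && $next->is_start_tag('img') ) {
            my $src = $next->get_attr('src');
            push @srcs, $src if defined $src;
        }
    }
    return @srcs;
}

# Hypothetical sample: the centered image is on its own indented line,
# and the second <center> holds only text.
my $html = <<'HTML';
<img src="/logo.gif">
<center>
  <img src="/rcp/ObjectServer?table=Images&id=381">
</center>
<center>Some centered text</center>
HTML

print "$_\n" for centered_img_srcs($html);
```

For the real pages, fetch the content first (for example with LWP::Simple's get, as in the original question) and pass it to the helper.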
Mark,

Thank you so much for your help, that worked great! It turns out that I already had the latest version of HTML::TokeParser::Simple installed.

Thanks again!

Brian

> -----Original Message-----
> From: mgoland@optonline.net
> Sent: Thursday, December 23, 2004 1:22 AM
> To: Brian Volk
> Cc: Beginners (E-mail)
> Subject: Re: LWP get only <center> img

-- 
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
<http://learn.perl.org/> <http://learn.perl.org/first-response>
----- Original Message -----
From: Brian Volk <BVolk@HPProducts.com>
Date: Thursday, December 23, 2004 8:55 am
Subject: RE: LWP get only <center> img

NP Brian, that is what this list is all about. I am sure next time you'll try to give me a hand.

Happy Holidays All... Cheers,
Mark G.