











 

Extracting the link text
Fritz Bayer

2005-04-28, 8:57 am

Hi,

I would like to extract all the links from an HTML page, which I store
in a string variable.

For each link, I would also like to print out the link text, however,
omitting ALL possible tags in which the text could be embedded.

I'm looking for a regular expression that does just that. Can
somebody help me out?

Fritz
Chris Mattern

2005-04-28, 8:57 am

Fritz Bayer wrote:

> Hi,
>
> I would like to extract all the links from an HTML page, which I store
> in a string variable.
>
> For each link, I would also like to print out the link text, however,
> omitting ALL possible tags in which the text could be embedded.


Use one of the modules for parsing HTML.
>
> I'm looking for a regular expression, which does just that.


No, you aren't, because there ain't no such thing.

> Can
> somebody help me out?
>
> Fritz


--
Christopher Mattern

"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"
chris-usenet@roaima.co.uk

2005-04-28, 8:57 am

Fritz Bayer <fritz-bayer@web.de> wrote:
> I would like to extract all the links from an HTML page [...]
> I'm looking for a regular expression, which does just that. Can
> somebody help me out?


perldoc -q "remove html"

Chris
Gunnar Hjalmarsson

2005-04-28, 8:57 am

Chris Mattern wrote:
> Fritz Bayer wrote:
>
> Use one of the modules for parsing HTML.


HTML::LinkExtor sounds promising. :)

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
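
For readers who want to see what the HTML::LinkExtor suggestion looks like in practice, here is a minimal sketch. It assumes the HTML-Parser distribution is installed; the sub name extract_urls and the sample markup are made up for illustration and do not come from the thread. Note that HTML::LinkExtor reports link attributes (href, src and so on) only, so by itself it covers the "extract all the links" half of the question, not the link text.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Return the href of every <a> element found in an HTML string.
# (extract_urls is a hypothetical helper name, not part of the module.)
sub extract_urls {
    my ($html) = @_;
    my @urls;

    # The callback is invoked once per link-carrying tag with the tag
    # name and its link attributes as name/value pairs.
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @urls, $attr{href} if $tag eq 'a' && defined $attr{href};
    });

    $parser->parse($html);
    $parser->eof;
    return @urls;
}

# Made-up sample input: one anchor and one image.
my $sample = '<p><a href="http://example.com/"><b>Example</b></a> '
           . 'and <img src="logo.png"></p>';
print "$_\n" for extract_urls($sample);
```

The tag filter in the callback is a design choice: without it, the img src above would be collected as well, since the module treats it as a link too.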
Gunnar Hjalmarsson

2005-04-28, 8:57 am

chris-usenet@roaima.co.uk wrote:
> Fritz Bayer <fritz-bayer@web.de> wrote:
>
> perldoc -q "remove html"


Better yet:

perldoc -q "extract URLs"

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
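
To tie the replies back to the original question (each link plus its text, with any embedded tags dropped), the following is one possible sketch using HTML::TokeParser, which ships in the same HTML-Parser distribution as HTML::LinkExtor. Nobody in the thread posted this code; the sub name extract_links and the sample markup are assumptions for illustration only.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return a list of [href, text] pairs, one per <a href="..."> element.
# (extract_links is a hypothetical helper name.)
sub extract_links {
    my ($html) = @_;

    # Passing a reference makes the parser read from the string
    # instead of treating the argument as a file name.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Could not create parser for HTML string";

    my @links;
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};
        next unless defined $href;    # skip anchors such as <a name="top">

        # Collect the text up to the closing </a>; markup of nested tags
        # is dropped, entities are decoded, whitespace is collapsed.
        my $text = $parser->get_trimmed_text('/a');
        push @links, [ $href, $text ];
    }
    return @links;
}

# Made-up sample input with tags nested inside the link text.
my $sample = '<p><a href="http://example.com/"><b>Example</b> '
           . '<i>site</i></a> and <a href="/faq">the &amp; FAQ</a></p>';

for my $link (extract_links($sample)) {
    my ($href, $text) = @$link;
    print "$href\t$text\n";
}
```

Because the text is collected from parser tokens rather than matched with a pattern, nested markup such as b or i inside the anchor does not need any special handling.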