PERL Miscellaneous archive, April 2005
Extracting the link text
| Fritz Bayer 2005-04-28, 8:57 am |
| Hi,
I would like to extract all the links from an HTML page, which I store
in a string variable.
For each link, I would also like to print out the link text, however,
omitting ALL possible tags in which the text could be embedded.
I'm looking for a regular expression, which does just that. Can
somebody help me out?
Fritz
| |
| Chris Mattern 2005-04-28, 8:57 am |
| Fritz Bayer wrote:
> Hi,
>
> I would like to extract all the links from an HTML page, which I store
> in a string variable.
>
> For each link, I would also like to print out the link text, however,
> omitting ALL possible tags in which the text could be embedded.
Use one of the modules for parsing HTML.
>
> I'm looking for a regular expression, which does just that.
No, you aren't, because there ain't no such thing.
> Can
> somebody help me out?
>
> Fritz
--
Christopher Mattern
"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"
| |
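Editor's sketch of the "use a parsing module" advice above: a single regular expression cannot reliably cope with nested tags, attributes containing ">" or comments, whereas a tokenizing parser handles them for you. The example below assumes the CPAN module HTML::TokeParser (part of the HTML-Parser distribution) is installed. The sub name extract_links and the sample HTML string are made up for illustration and are not from the thread.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return a list of [href, text] pairs, one for each <a href="..."> in $html.
# (extract_links is a hypothetical helper name, not part of any module.)
sub extract_links {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "Cannot create parser for HTML string";
    my @links;
    while (my $tag = $p->get_tag('a')) {
        my $href = $tag->[1]{href};
        next unless defined $href;    # skip anchors without an href
        # Collect the text up to the closing </a>; tags nested inside
        # the link are dropped and whitespace is collapsed.
        my $text = $p->get_trimmed_text('/a');
        push @links, [$href, $text];
    }
    return @links;
}

# Sample input, invented for this example.
my $html = '<p><a href="http://example.com/"><b>Example</b> <i>site</i></a></p>';
printf "%s\t%s\n", @$_ for extract_links($html);
```

Note that by default HTML::TokeParser substitutes the alt text of an <img> inside a link (or "[IMG]" if there is none), so links that wrap images may need extra handling.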
| chris-usenet@roaima.co.uk 2005-04-28, 8:57 am |
| Fritz Bayer <fritz-bayer@web.de> wrote:
> I would like to extract all the links from an HTML page [...]
> I'm looking for a regular expression, which does just that. Can
> somebody help me out?
perldoc -q "remove html"
Chris
| |
| Gunnar Hjalmarsson 2005-04-28, 8:57 am |
| Chris Mattern wrote:
> Fritz Bayer wrote:
>
> Use one of the modules for parsing HTML.
HTML::LinkExtor sounds promising. :)
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
| |
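Editor's sketch of the HTML::LinkExtor suggestion above, assuming the module is installed from CPAN. HTML::LinkExtor reports link URLs only, not the anchor text, so on its own it covers the first half of the original question; the text would still have to come from a parser that exposes text content. The sub name extract_urls and the sample HTML are hypothetical.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Return the href values of all <a> elements in $html.
# (extract_urls is a hypothetical helper name, not part of any module.)
sub extract_urls {
    my ($html) = @_;
    my @urls;
    # The callback is invoked once per link-bearing tag with the tag
    # name and its link attributes (href, src, ...).
    my $p = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @urls, $attr{href} if $tag eq 'a' && defined $attr{href};
    });
    $p->parse($html);
    $p->eof;
    return @urls;
}

# Sample input, invented for this example.
my $html = '<a href="/a">one</a> <img src="/i.png"> <a href="/b">two</a>';
print "$_\n" for extract_urls($html);
```

No base URL is passed to the constructor here, so relative links are returned exactly as they appear in the page.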
| Gunnar Hjalmarsson 2005-04-28, 8:57 am |
| chris-usenet@roaima.co.uk wrote:
> Fritz Bayer <fritz-bayer@web.de> wrote:
>
> perldoc -q "remove html"
Better yet:
perldoc -q "extract URLs"
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl