regular expression for parsing html using preg_match_all
crescent_au@yahoo.com

2006-07-06, 6:56 pm

Hi all,

I've been trying unsuccessfully to extract the text from an HTML page. The
HTML that I'm interested in looks like this:

<a class=link
href="http://www.something.com/_something.php?type=cart">Shopping
Cart</a>
<div><em class=newentry><a href=http://nothing.com>New
Age</a></em></div>

From the above tag, I want to extract "Shopping Cart". I'm not very
good with RE. I tried this:
$lines = file_get_contents("http://theabovetag.com/page.html");
preg_match_all("/(<a\ class\=link\ href\=(.*)> )(<\/a> )/", $lines,
$matches1);

The above RE gives me "Shopping Cart" plus "New Age" as well. I just
want "Shopping Cart". What am I doing wrong? My RE is somehow ignoring
</a> tag right after Shopping Cart and instead accepting </a> after New
Age. Please help!

Jos van Uden

2006-07-08, 7:56 am

crescent_au@yahoo.com wrote:

> preg_match_all("/(<a\ class\=link\ href\=(.*)> )(<\/a> )/", $lines,
> $matches1);
>
> The above RE gives me "Shopping Cart" plus "New Age" as well. I just
> want "Shopping Cart". What am I doing wrong? My RE is somehow ignoring
> </a> tag right after Shopping Cart and instead accepting </a> after New
> Age. Please help!


By default the multipliers (quantifiers) are "greedy" and match as
much as possible. You can stop this by placing a question
mark behind the multiplier, like (.*?)

Then it will match as little as possible.

Jos

PS. This little prog may be useful if you have trouble
with Regexes: http://www.regexbuddy.com/

(not mine)
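
To illustrate the greedy versus non-greedy behaviour described in the reply above, here is a minimal sketch run against the markup from the original post. The pattern is not the poster's exact one: the `\s+` between attributes, the `[^>]*` for the href value, and the `s` modifier (so that `.` also matches the line break inside "Shopping Cart") are assumptions added so the example runs on the sample as posted.

```php
<?php
// Sample markup copied from the original post, line breaks included.
$html = <<<'HTML'
<a class=link
href="http://www.something.com/_something.php?type=cart">Shopping
Cart</a>
<div><em class=newentry><a href=http://nothing.com>New
Age</a></em></div>
HTML;

// Greedy: (.*) runs on to the LAST </a>, so "New Age" is captured as well.
preg_match_all('/<a\s+class=link\s+href=[^>]*>(.*)<\/a>/s', $html, $greedy);

// Non-greedy: (.*?) stops at the FIRST </a> after the opening tag.
preg_match_all('/<a\s+class=link\s+href=[^>]*>(.*?)<\/a>/s', $html, $lazy);

// The captured text still contains the source line break; collapse it.
$text = preg_replace('/\s+/', ' ', trim($lazy[1][0]));

echo $text, "\n";
```

With the greedy form, the single capture spans both links; with the question mark added, the capture ends at the first closing tag, which is the link text the original poster was after.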


Copyright 2008 codecomments.com