Code Comments

matching multiple occurrences in the same line
Hi,
I have a problem with pattern matching:
I have one very long line, and I'm looking for all occurrences of this
string: <td class="year" rowspan="2">
in the line. After each occurrence of this string there is a
number which I need to parse and print. For example, I need to extract
345 from this:
<td class="year" rowspan="2">345

I wrote the following:

while(<FILE> ){
chomp($_);
if (~ m/<td class="year" rowspan="2">(\d+).+</) {print OUT "\t$1";}
}

but it just gives me the first occurrence of the pattern.
What's wrong with this?

Thanks a lot for your help

Michal


michal.shmueli@gmail.com
04-27-05 08:58 PM


Re: matching multiple occurrences in the same line
> string : <td class="year" rowspan="2">

Since that looks a lot like HTML, why not use HTML::TokeParser and save
yourself from the regex hassles?

JS



JayEs
04-27-05 08:58 PM


Re: matching multiple occurrences in the same line
michal.shmueli@gmail.com wrote:
> I have one very long line, and I'm looking for all occurrences of this
> string: <td class="year" rowspan="2">
> in the line. After each occurrence of this string there is a
> number which I need to parse and print. For example, I need to extract
> 345 from this:
> <td class="year" rowspan="2">345
>
> I wrote the following:
>
> while(<FILE> ){
>      chomp($_);

Why do you chomp()?

> 	if (~ m/<td class="year" rowspan="2">(\d+).+</) {print OUT "\t$1";}
------------^
What's that?

Use while instead of if, and add the /g modifier. Furthermore, the

.+<

part is not only redundant: since quantifiers are greedy by default,
it also swallows the rest of the line up to the last '<', which prevents
you from finding more than one occurrence.
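To make the greediness point concrete, here is a minimal sketch that is not taken from the thread. The sub names extract_greedy and extract_all and the second table cell in the sample line are made up for illustration; only the pattern itself comes from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample line; only the shape of the first cell comes from the thread.
my $line = '<td class="year" rowspan="2">2004</td><td class="veh">x</td>'
         . '<td class="year" rowspan="2">2005</td><td class="veh">y</td>';

# Original pattern: the greedy ".+<" runs on to the last "<" of the line,
# so nothing is left for a second match, even with /g.
sub extract_greedy {
    my ($text) = @_;
    my @found;
    while ($text =~ m/<td class="year" rowspan="2">(\d+).+</g) {
        push @found, $1;
    }
    return @found;
}

# Suggested fix: drop ".+<" so that /g resumes right after each number.
sub extract_all {
    my ($text) = @_;
    my @found;
    while ($text =~ m/<td class="year" rowspan="2">(\d+)/g) {
        push @found, $1;
    }
    return @found;
}

print join(",", extract_greedy($line)), "\n";   # only the first number
print join(",", extract_all($line)), "\n";      # both numbers
```

With the sample line above, the greedy variant stops after the first number because its single match already consumed everything up to the final '<'.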

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Gunnar Hjalmarsson
04-28-05 01:57 AM


Re: matching multiple occurrences in the same line
JayEs wrote: 
>
> Since that looks a lot like HTML, why not use HTML::TokeParser and save
> yourself from the regex hassles?

The OP is looking for *all* occurrences of that fixed string. The fact
that it's HTML does not make the OP's problem an HTML parsing problem
that 'requires' a parsing module. It can easily be handled using a
regex, even if the string in question starts with '<' and ends with '>'. ;-)

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Gunnar Hjalmarsson
04-28-05 01:57 AM


Re: matching multiple occurrences in the same line
JayEs wrote:
>
> Since that looks a lot like HTML, why not use HTML::TokeParser and save
> yourself from the regex hassles?
>
> JS

I've tried the following code but it's not working...

use HTML::TokeParser;

$file="res.html"
$p = HTML::TokeParser->new($file);
if ($p->get_tag("td")) {
my $td = $p->get_trimmed_text;
print "Td: $td\n";
}

Am I missing something?

Thanks again


michal.shmueli@gmail.com
04-28-05 01:57 AM


Re: matching multiple occurrences in the same line
Yep, sorry. I've changed it a bit and it's working properly...

Thanks


michal.shmueli@gmail.com
04-28-05 01:57 AM


Re: matching multiple occurrences in the same line
Actually, I don't want to use the HTML parser. It's OK, but I need to
parse more patterns which are not part of the table. So anyway, I tried
the following as you suggested:
while(<FILE> ){
while(~ gm/<td class="year" rowspan="2">(\d+)./) {print OUT "\t$1";}

Now I get some compilation errors.
The original line (part) is: <td class="year" rowspan="2">2004</td><td
class="veh" rowspan="2"><a

Many thanks


michal.shmueli@gmail.com
04-28-05 01:57 AM


Re: matching multiple occurrences in the same line
 
>
> The OP is looking for *all* occurrences of that fixed string. The fact
> that it's HTML does not make the OP's problem a HTML parsing problem

<SNIP>

Entirely correct! I simply offered another solution for the same problem.
Tim Toady? ;-)
The fact that the OP is looking for a value (ALL of them) that is prefixed
with the same HTML tag, makes TokeParser a good alternative IMHO. Later the
OP states that he can't use TokeParser because he needs to do more string
matching on non-HTML, but I didn't have that info at the time...

Anyway, both suggestions work on the original problem :-)

JS



JayEs
04-28-05 01:57 AM


Re: matching multiple occurrences in the same line
michal.shmueli@gmail.com wrote:
> Actually, I don't want to use the HTML parser. It's OK, but I need to
> parse more patterns which are not part of the table.

Not sure I follow you. The more complex the task is, the more likely a
parsing module is suitable.

> So anyway, I tried the following as you suggested:
> while(<FILE> ){
>   while(~ gm/<td class="year" rowspan="2">(\d+)./) {print OUT "\t$1";}
----------^-^------------------------------------^

That's not what I suggested.
- The '~' character is still there. (I suppose you don't know what
it's supposed to do.)
- Modifiers shall be appended, not prepended, to the regex.
- The dot is still redundant.
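As a hedged illustration of the first two points (not code from the thread): unary ~ is Perl's bitwise complement, so "~ m/.../" complements the result of matching against $_ instead of binding the pattern to a variable. As far as perlop's description goes, the complement of either match result is a nonzero number, so such a condition holds whether or not the pattern matched. The sub names tilde_condition and years below are hypothetical.

```perl
use strict;
use warnings;

# Returns 1 when the stray-"~" condition from the thread is true for the text.
sub tilde_condition {
    local $_ = shift;
    # "~" is bitwise complement applied to the match result, not a binding.
    return (~ m/<td class="year" rowspan="2">(\d+)/) ? 1 : 0;
}

# Intended form: bind explicitly with =~ and append the /g modifier.
sub years {
    my ($text) = @_;
    my @years;
    while ($text =~ m/<td class="year" rowspan="2">(\d+)/g) {
        push @years, $1;
    }
    return @years;
}

# The condition is true even though this text contains no matching cell.
print tilde_condition("no table cell here"), "\n";
print join(" ", years('<td class="year" rowspan="2">2004</td>')), "\n";
```

This would also explain why the original "if (~ m/.../)" never reported a failure, and why keeping the ~ in a while loop is risky: the loop condition would not become false when the matches run out.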

For a regex to be a suitable alternative to a module (in certain cases),
you need to know how regexes work. It's obvious that you need to read up
on it:

perldoc perlrequick
perldoc perlretut
perldoc perlre

Good luck!

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Gunnar Hjalmarsson
04-28-05 01:57 AM


Re: matching multiple occurrences in the same line
michal.shmueli@gmail.com wrote the following on 27/04/2005 21:17:
> Actually, I don't want to use the HTML parser. It's OK, but I need to
> parse more patterns which are not part of the table. So anyway, I tried
> the following as you suggested:
> while(<FILE> ){
>   while(~ gm/<td class="year" rowspan="2">(\d+)./) {print OUT "\t$1";}
>
> Now I get some compilation errors.
> The original line (part) is: <td class="year" rowspan="2">2004</td><td
> class="veh" rowspan="2"><a

The g should come at the end:

while(<FILE> ){
while(~ m/<td class="year" rowspan="2">(\d+)./g) {print OUT "\t$1";}

Furthermore, I don't see what this ~ is doing there, and you don't need
the final dot:

while(m/<td class="year" rowspan="2">(\d+)/g) {print OUT "\t$1"}

or, more Perlish:

print OUT "\t$1" while(m/<td class="year" rowspan="2">(\d+)/g);
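A related variant, offered only as a sketch and not something posted in the thread: in list context, a /g match with one capture group returns all captured values at once, so the loop can be skipped when the numbers are wanted as a list. The sub name year_cells and the link inside the sample line are assumptions for illustration.

```perl
use strict;
use warnings;

# Return every number that directly follows the opening "year" cell tag.
# Must be called in list context; in scalar context it returns only true/false.
sub year_cells {
    my ($line) = @_;
    # List-context /g with a single capture group yields all captures.
    return $line =~ m/<td class="year" rowspan="2">(\d+)/g;
}

# Hypothetical sample line, loosely modelled on the fragment quoted above.
my $line = '<td class="year" rowspan="2">2004</td><td class="veh" rowspan="2"><a'
         . ' href="#">x</a></td><td class="year" rowspan="2">2005</td>';

# Same tab-separated output style as the one-liner above, sent to STDOUT here.
print "\t$_" for year_cells($line);
print "\n";
```

The while form remains the better fit when each match needs further processing as it is found; the list form is convenient when the values are simply collected.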

HTH, H.

--
Hendrik Maryns

Interesting websites:
www.lieverleven.be	(I cooperate)
www.eu04.com		European Referendum Campaign
aouw.org		The Art Of Urban Warfare

Hendrik Maryns
04-28-05 01:57 AM


PERL Miscellaneous archive
