Author Multi Line Pattern Matching
Jose Malacara

2005-08-25, 9:55 pm

Can someone please provide some assistance with a multi-line matching
problem? I have a data file that looks like this:

#### Input file ########
START
foo1
bar1
END
START
foo2
bar2
END
START
foo3
bar3
END

I am trying to capture the contents between the START and END
delimiters. Here is what I have so far:

#### Script ########
#!/usr/bin/perl
$input_file = $ARGV[0];
open (INPUT, "<$input_file") or die "Cannot open $input_file file: $!";
local $/;
while (<INPUT> ){
if ( $_ =~ m/^\s*START\s+(.+?)\s+END$/smi ) {
print "$1\n";
}
}
close (INPUT);

However, when I run the script it only appears to match the START/END
sequence the first time. This is what I get:

#### Result ########
# ./test.pl file
foo1
bar1

This is what I'm trying to achieve:
# ./test.pl file
foo1
bar1
foo2
bar2
foo3
bar3

Is the while loop required, since I am actually slurping the file
contents into a single variable? Is there a (better) mechanism to
continue matching against the string once it has matched? Any help
would be greatly appreciated!

Thank you,
Jose
John W. Krahn

2005-08-25, 9:55 pm

Jose Malacara wrote:
> Is the while loop required, since I am actually slurping the file
> contents into a single variable? Is there a (better) mechanism to
> continue matching against the string once it has matched? Any help
> would be greatly appreciated!


Here is one way to do it:

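#!/usr/bin/perl
use warnings;
use strict;

my $input_file = shift;    # declared with "my" so it compiles under strict
open INPUT, '<', $input_file or die "Cannot open $input_file file: $!";

# Read one record at a time; each record ends with "END\n".
local $/ = "END\n";
while ( <INPUT> ) {
    chomp;             # remove the trailing "END\n" record separator
    s/.*START\n//;     # remove the START line that opens the record
    print;
}

close INPUT;

__END__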



John
--
use Perl;
program
fulfillment
Xavier Noria

2005-08-25, 9:55 pm

On Aug 26, 2005, at 0:43, Jose Malacara wrote:

> However, when I run the script it only appears to match the START/END
> sequence the first time. This is what I get:


Yes. Since $/ is undef, the first call to <INPUT> reads the entire
file into $_, so the body of the while loop runs only ONCE, and inside
it you do a single m//. That's why you get only one match: nothing is
looping over the matches.

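A possible fix is to slurp the file into a scalar and loop on the match
itself with the /g modifier, so that each iteration resumes where the
previous match ended: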

local $/;
my $contents = <INPUT>;
while ($contents =~ m/^\s*START\s+(.+?)\s+END$/smig) {
print "$1\n";
}
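
As a variation on the loop above, m//g in list context returns the
captures of every match at once. This is only a minimal sketch: the
sub name extract_blocks and the $sample string are made up for the
illustration, and the data is kept in memory rather than read from a
file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text between each START/END pair.
sub extract_blocks {
    my ($contents) = @_;
    # In list context, m//g returns the capture from every match.
    return $contents =~ m/^\s*START\s+(.+?)\s+END$/smig;
}

# Made-up sample data standing in for the slurped file.
my $sample = "START\nfoo1\nbar1\nEND\nSTART\nfoo2\nbar2\nEND\n";
print "$_\n" for extract_blocks($sample);
```

Each returned element is one block, which may be handier than printing
inside the loop if the blocks need further processing.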

Just for the record, the so-called flip-flop operator is really handy
for this kind of file. The flip-flop operator is ".." in scalar
context, and it is documented in perlop under "Range Operators".
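
A rough sketch of how the flip-flop could be applied to this data
follows. The sub name lines_between_markers and the sample list are
invented for the example, and it assumes the markers sit alone on
their lines and that the lines have already been chomped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines between START and END.
sub lines_between_markers {
    my @lines = @_;
    my @kept;
    for my $line (@lines) {
        # In scalar context ".." turns true on the START line and stays
        # true up to and including the next END line.
        if ( $line =~ /^START$/ .. $line =~ /^END$/ ) {
            # Skip the marker lines themselves.
            next if $line =~ /^(?:START|END)$/;
            push @kept, $line;
        }
    }
    return @kept;
}

# Made-up sample, already split into chomped lines.
my @sample = qw(START foo1 bar1 END noise START foo2 bar2 END);
print "$_\n" for lines_between_markers(@sample);
```

This reads the input line by line, so it does not need to change $/
or hold the whole file in memory.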

-- fxn