Multi Line Pattern Matching
| Jose Malacara 2005-08-25, 9:55 pm |
| Can someone please provide some assistance with a multi-line matching
problem? I have a datafile that looks like this:
#### Input file ########
START
foo1
bar1
END
START
foo2
bar2
END
START
foo3
bar3
END
I am trying to capture the contents between the START and END
delimiters. Here is what I have so far:
#### Script ########
#!/usr/bin/perl
$input_file = $ARGV[0];
open (INPUT, "<$input_file") or die "Cannot open $input_file file: $!";
local $/;
while (<INPUT> ){
if ( $_ =~ m/^\s*START\s+(.+?)\s+END$/smi ) {
print "$1\n";
}
}
close (INPUT);
However, when I run the script it only appears to match the START/END
sequence the first time. This is what I get:
#### Result ########
# ./test.pl file
foo1
bar1
This is what I'm trying to achieve:
# ./test.pl file
foo1
bar1
foo2
bar2
foo3
bar3
Is the while loop required, since I am actually slurping the file
contents into a single variable? Is there a (better) mechanism to
continue matching against the string once it has matched? Any help
would be greatly appreciated!
Thank you,
Jose
| |
| John W. Krahn 2005-08-25, 9:55 pm |
| Jose Malacara wrote:
> Is the while loop required, since I am actually slurping the file
> contents into a single variable? Is there a (better) mechanism to
> continue matching against the string once it has matched? Any help
> would be greatly appreciated!
Here is one way to do it:
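#!/usr/bin/perl
use warnings;
use strict;
# "use strict" requires the variable to be declared with "my".
my $input_file = shift;
open INPUT, '<', $input_file or die "Cannot open $input_file file: $!";
# Read one record at a time; each record ends with "END\n".
local $/ = "END\n";
while ( <INPUT> ) {
    chomp;            # strip the trailing "END\n" record separator
    s/.*START\n//;    # strip the START line
    print;
}
close INPUT;
__END__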
John
--
use Perl;
program
fulfillment
| |
| Xavier Noria 2005-08-25, 9:55 pm |
| On Aug 26, 2005, at 0:43, Jose Malacara wrote:
> However, when I run the script it only appears to match the START/END
> sequence the first time. This is what I get:
Yes. Since $/ is undef, the first call to <INPUT> reads the entire
file into $_, so the while loop runs only ONCE. Inside the block there
is a single m// without /g, which stops at the first match. That's why
you get only one match: nothing loops over the remaining blocks.
A possible fix:
local $/;
my $contents = <INPUT>;
while ($contents =~ m/^\s*START\s+(.+?)\s+END$/smig) {
print "$1\n";
}
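As a side note on whether the while loop is needed at all: in list context, a match with /g returns every capture at once, so the loop can be replaced by a single assignment. The following is only a minimal sketch; the inline $contents string is made-up sample data standing in for the slurped file, and @blocks is a name chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the slurped file contents.
my $contents = "START\nfoo1\nbar1\nEND\nSTART\nfoo2\nbar2\nEND\n";

# In list context, m//g returns the captured group of every match,
# so no explicit loop over the matches is required.
my @blocks = $contents =~ m/^\s*START\s+(.+?)\s+END$/smig;

print "$_\n" for @blocks;
```

The while form is probably still the better choice when each block needs processing as it is found, or when the file is large enough that holding every capture in an array is wasteful.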
Just for the record, the so-called flip-flop operator is really handy
for this kind of file. The flip-flop operator is ".." in scalar
context, and it is documented in perlop.
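A rough sketch of the flip-flop approach, reading line by line instead of slurping. The sub name lines_between_markers and the sample list are made up for illustration; with a real file you would loop over <INPUT> instead. It relies on the sequence number that ".." returns, as described in perlop: 1 on the first line of a range, and a value ending in "E0" on the last.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines strictly between START and END.
sub lines_between_markers {
    my @lines = @_;
    my @kept;
    for my $line (@lines) {
        # Scalar-context ".." is the flip-flop: false until the left
        # pattern matches, then true through the line where the right
        # pattern matches.
        if ( my $state = ( $line =~ /^\s*START\s*$/i .. $line =~ /^\s*END\s*$/i ) ) {
            next if $state == 1;        # the START line itself
            next if $state =~ /E0$/;    # the END line itself
            push @kept, $line;
        }
    }
    return @kept;
}

# Made-up sample input; real code would read lines from the file.
my @sample = map { "$_\n" } qw(START foo1 bar1 END START foo2 bar2 END);
print lines_between_markers(@sample);
```

Note that the flip-flop keeps its state between calls, so this sketch assumes every START is eventually followed by an END.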
-- fxn