Regular Expressions - multiplelines
| Roman Hanousek 2004-07-28, 8:56 pm |
Hi
I have a txt file that contains the following info:
Text start here --------------------------------

$/Dev/something/something.com/blah1:
default.asp user Exc 26/07/04 1:42p
[TEST-DEV]F:\content\blah\wwwroot\blah1

$/Dev/something/something.com/something:
test.asp user Exc 26/07/04 3:09p
[TEST-DEV] F:\content\blah\wwwroot\something

$/Dev/something/something.com/blah:
Blah.inc user Exc 23/07/04 11:13a
[BAGEL-DEV] F:\content\blah\wwwroot\blah

$/Dev/something/something.com/blah/blah/20GB:
Something-were.htm user Exc 23/07/04 11:24a
[TEST- DEV]F:\content\blah\wwwroot\blah\blah\20GB

Text end here --------------------------------

The result I am trying to get is this:
$/Dev/something/something.com/blah1/default.asp
$/Dev/something/something.com/something/test.asp
$/Dev/something/something.com/blah/Blah.inc
$/Dev/something/something.com/blah/blah/20GB/Something-were.htm
I have been trying various combinations of the code below, but I can't work
out how to match the file name on the line below the path.
Perl code start --
use strict;

my $file = shift;
my $line;

open(IN, "<$file") || die "$!";
while ($line = <IN>)
{
    $line =~ /(\$\S+):.(\S+)/is;

    print "match1: $1; match2: $2 \n";
}
Perl code end --
| Damon Allen Davison 2004-07-28, 8:56 pm |
Hi Roman,
Roman wrote:
> use strict;
>
> my $file = shift;
> my $line;
>
>
> open(IN, "<$file") || die "$!";
> while($line =<IN> )
> {
> $line =~ /(\$\S+):.(\S+)/is;
>
> print "match1: $1; match2: $2 \n";
> }
I would redefine the record separator for your while loop to process
each of the four records in your example as a whole unit, instead of
going line-by-line. You can then write a regular expression to capture
the path and the file name.
#!/usr/bin/perl
use strict;
use warnings;

local $/ = "\n\n"; # record separator = two consecutive line breaks

while (<>) {
    m/ ^          # beginning of line
       (          # start capture
         [^:]+    # capture all non-colon chars
       )
       :          # colon
       \s+        # space chars, including line break
       (          # second capture
         [^.]+    # one or more non-period chars
         \.       # a period
         \S+      # one or more non-space chars
       )
     /x;          # m/^([^:]+):\s+([^.]+\.\S+)/;
    print $1,'/',$2,"\n";
}
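You don't need to use shift to put the file name in $file, or to declare
$line explicitly. The empty <> operator reads from the files named on the
command line (or from standard input if there are none), and the while loop
puts each record in $_. This is Perl's way of making things easier for you. ;)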
I would also suggest you 'use warnings' when writing code. It'll catch
a lot of errors for you, such as a 'Use of uninitialized value in
concatenation (.) or string' in your code. You can get a (partial)
explanation by adding a 'use diagnostics' line to the beginning of your
program.
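
To illustrate where that warning comes from: when a match fails, $1 and $2 are not set by it, so printing them unconditionally either warns or reuses captures from an earlier record. Below is a minimal sketch that tests the match before using the captures. The helper name join_record and the two sample records are made up for this illustration and are not part of the scripts in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns "path/file" for one
# record, or undef when the record does not match.
sub join_record {
    my ($record) = @_;

    # Same pattern as the commented one-liner in the script above.
    if ( $record =~ m/^([^:]+):\s+([^.]+\.\S+)/ ) {
        return "$1/$2";
    }
    return;    # no match: let the caller decide what to do
}

# Sample records for illustration; \$ keeps Perl from interpolating $/.
my @records = (
    "\$/Dev/something/something.com/blah1:\ndefault.asp user Exc 26/07/04 1:42p\n",
    "Text start here --------------------------------\n",
);

for my $record (@records) {
    my $joined = join_record($record);
    print defined $joined ? "$joined\n" : "no match, skipped\n";
}
```

The second sample record has no colon, so the helper returns undef and the loop skips it instead of printing uninitialized captures.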
Best,
Damon
--
Damon Allen DAVISON
http://www.allolex.net
| John W. Krahn 2004-07-28, 8:56 pm |
Roman Hanousek wrote:
> Hi
Hello,
> I have a txt file that contains the following info:
>
> Text start here --------------------------------
>
> $/Dev/something/something.com/blah1:
> default.asp user Exc 26/07/04 1:42p
> [TEST-DEV]F:\content\blah\wwwroot\blah1
>
> $/Dev/something/something.com/something:
> test.asp user Exc 26/07/04 3:09p
> [TEST-DEV] F:\content\blah\wwwroot\something
>
> $/Dev/something/something.com/blah:
> Blah.inc user Exc 23/07/04 11:13a
> [BAGEL-DEV] F:\content\blah\wwwroot\blah
>
> $/Dev/something/something.com/blah/blah/20GB:
> Something-were.htm user Exc 23/07/04 11:24a
> [TEST- DEV]F:\content\blah\wwwroot\blah\blah\20GB
>
> Text end here --------------------------------
>
>
> The result I am trying to get is the this:
>
> $/Dev/something/something.com/blah1/default.asp
> $/Dev/something/something.com/something/test.asp
> $/Dev/something/something.com/blah/Blah.inc
> $/Dev/something/something.com/blah/blah/20GB/Something-were.htm
>
>
>
> I been trying to various combinations of the below. If i could work out how
> to match the file name on the line below.
>
> Perl code start --
> use strict;
>
> my $file = shift;
> my $line;
>
>
> open(IN, "<$file") || die "$!";
> while($line =<IN> )
> {
> $line =~ /(\$\S+):.(\S+)/is;
>
> print "match1: $1; match2: $2 \n";
> }
> Perl code end --
This works for me with the data provided:
#!/usr/bin/perl
use warnings;
use strict;

my $file = shift or die "usage: $0 filename\n";

open IN, '<', $file or die "Cannot open $file: $!";

$/ = ''; # set $/ to paragraph mode

while ( <IN> ) {
    s!:\n!/!;
    print /^(\S+)/, "\n";
}

__END__
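
The two replies set $/ differently, and the difference shows up when records are separated by more than one blank line. With $/ = '' (paragraph mode) a run of blank lines counts as a single separator, while $/ = "\n\n" treats every pair of newlines literally and can return empty records. Here is a small sketch of that behaviour, reading from an in-memory string. The helper name read_records and the sample text are assumptions made for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read $text as a file, using $sep as the input
# record separator, and return the list of records.
sub read_records {
    my ( $text, $sep ) = @_;
    local $/ = $sep;    # restored automatically when the sub returns
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Two records separated by three blank lines (four newlines in a row).
my $text = "a:\nfile1\n\n\n\nb:\nfile2\n";

my @paragraph = read_records( $text, '' );
my @literal   = read_records( $text, "\n\n" );

print "paragraph mode: ", scalar(@paragraph), " records\n";
print "literal \\n\\n:  ", scalar(@literal),   " records\n";
```

For the sample data in this thread, which has exactly one blank line between records, both settings split the input the same way.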
John
--
use Perl;
program
fulfillment
| Zeus Odin 2004-08-01, 8:55 am |
Another alternative:

#!/usr/bin/perl
use warnings;
use strict;

my $eol = '[\n\r\x0A\x0D]';
$/ = '';
while (<DATA>) {
    print /^(.*):$eol([^ ]*)/ ? "$1/$2\n" : $_;
}

__DATA__
$/Dev/something/something.com/blah1:
default.asp user Exc 26/07/04 1:42p
[TEST-DEV]F:\content\blah\wwwroot\blah1

$/Dev/something/something.com/something:
test.asp user Exc 26/07/04 3:09p
[TEST-DEV] F:\content\blah\wwwroot\something

$/Dev/something/something.com/blah:
Blah.inc user Exc 23/07/04 11:13a
[BAGEL-DEV] F:\content\blah\wwwroot\blah

$/Dev/something/something.com/blah/blah/20GB:
Something-were.htm user Exc 23/07/04 11:24a
[TEST- DEV]F:\content\blah\wwwroot\blah\blah\20GB