

PERL Beginners archive, August 2004










Regular Expressions - multiple lines
Roman Hanousek

2004-07-28, 8:56 pm


Hi
I have a txt file that contains the following info:


Text start here --------------------------------

$/Dev/something/something.com/blah1:
default.asp user Exc 26/07/04 1:42p
[TEST-DEV]F:\content\blah\wwwroot\blah1

$/Dev/something/something.com/something:
test.asp user Exc 26/07/04 3:09p
[TEST-DEV] F:\content\blah\wwwroot\something

$/Dev/something/something.com/blah:
Blah.inc user Exc 23/07/04 11:13a
[BAGEL-DEV] F:\content\blah\wwwroot\blah

$/Dev/something/something.com/blah/blah/20GB:
Something-were.htm user Exc 23/07/04 11:24a
[TEST- DEV]F:\content\blah\wwwroot\blah\blah\20
GB

Text end here --------------------------------


The result I am trying to get is this:

$/Dev/something/something.com/blah1/default.asp
$/Dev/something/something.com/something/test.asp
$/Dev/something/something.com/blah/Blah.inc
$/Dev/something/something.com/blah/blah/20GB/Something-were.htm



I have been trying various combinations of the code below. If only I could
work out how to match the file name on the line below the path.

Perl code start --
use strict;

my $file = shift;
my $line;


open(IN, "<$file") || die "$!";
while($line =<IN> )
{
$line =~ /(\$\S+):.(\S+)/is;

print "match1: $1; match2: $2 \n";
}
Perl code end --
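
The line-by-line loop above can also be made to work if it remembers the most recent path line and joins it to the next line it reads. This is a minimal sketch, not code from the thread: the sub name `pair_paths` and the in-memory `$sample` are hypothetical, and it assumes every path line starts with `$` and ends with a colon, and that the file name is the first whitespace-delimited token on the following line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read line by line, remember
# the last "$/...:" path line, and join it to the first token of the next
# line that begins with a non-space character. Assumes the question's layout.
sub pair_paths {
    my ($fh) = @_;
    my @results;
    my $path;    # path still waiting for its file name, if any
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;    # strip an LF or CRLF line ending
        if ( $line =~ /^(\$\S+):$/ ) {
            $path = $1;          # the file name is on the next line
        }
        elsif ( defined $path && $line =~ /^(\S+)/ ) {
            push @results, "$path/$1";
            undef $path;         # each path is used only once
        }
    }
    return @results;
}

# Sample data copied from the question (two of the four records).
my $sample = <<'END';
$/Dev/something/something.com/blah1:
default.asp user Exc 26/07/04 1:42p
[TEST-DEV]F:\content\blah\wwwroot\blah1

$/Dev/something/something.com/blah/blah/20GB:
Something-were.htm user Exc 23/07/04 11:24a
[TEST- DEV]F:\content\blah\wwwroot\blah\blah\20
GB
END

# Opening a reference to a string gives an in-memory filehandle.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n" for pair_paths($fh);
close $fh;
```

To read the real file, open it by name instead of the in-memory string; the loop itself stays the same.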

Damon Allen Davison

2004-07-28, 8:56 pm

Hi Roman,

Roman wrote:

> use strict;
>
> my $file = shift;
> my $line;
>
>
> open(IN, "<$file") || die "$!";
> while($line =<IN> )
> {
> $line =~ /(\$\S+):.(\S+)/is;
>
> print "match1: $1; match2: $2 \n";
> }


I would redefine the record separator for your while loop to process
each of the four records in your example as a whole unit, instead of
going line-by-line. You can then write a regular expression to capture
the path and the file name.

#!/usr/bin/perl

use strict;
use warnings;

local $/ = "\n\n"; # record separator = two consecutive line breaks

while (<>) {
    # skip any record the pattern does not match, so $1 and $2 are never stale or undefined
    next unless m/ ^         # beginning of line
                   (         # start capture
                     [^:]+   # capture all non-colon chars
                   )
                   :         # colon
                   \s+       # space chars, including line break
                   (         # second capture
                     [^.]+   # one or more non-period chars
                     \.      # a period
                     \S+     # one or more non-space chars
                   )
                 /x;         # m/^([^:]+):\s+([^.]+\.\S+)/;
    print $1, '/', $2, "\n";
}

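Using shift to set $file to the file name and declaring $line explicitly
are not necessary: the empty <> operator reads from the files named on
the command line (or from STDIN if there are none), and while (<>) puts
each record in $_. This script does the same thing as yours. This is
Perl's way of making things easier for you. ;)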

I would also suggest you 'use warnings' when writing code. It'll catch
a lot of errors for you, such as the 'Use of uninitialized value in
concatenation (.) or string' warning your code would produce whenever
the match fails and leaves $1 and $2 undefined. You can get a (partial)
explanation by adding a 'use diagnostics' line to the beginning of your
program.
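
One common way to avoid printing undefined or stale captures, shown here as a sketch rather than anything posted in the thread, is to assign the captures to lexical variables in list context and test the result. The sub name `path_and_name` is hypothetical; the pattern is the condensed form of Damon's regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns "path/file" for a matching record, or undef.
# In list context a successful match returns its captures; a failed match
# returns an empty list, so the "if" is false and no stale $1/$2 is used.
sub path_and_name {
    my ($record) = @_;
    if ( my ( $path, $name ) = $record =~ m/^([^:]+):\s+([^.]+\.\S+)/ ) {
        return "$path/$name";
    }
    return;    # no match: undef in scalar context
}

for my $record (
    "\$/Dev/something/something.com/blah:\nBlah.inc user Exc 23/07/04 11:13a\n",
    "Text end here --------------------------------\n",
) {
    my $joined = path_and_name($record);
    print defined $joined ? "$joined\n" : "no match, record skipped\n";
}
```

The second record has no colon, so it is reported as skipped instead of triggering an uninitialized-value warning.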

Best,

Damon


--

Damon Allen DAVISON
http://www.allolex.net

John W. Krahn

2004-07-28, 8:56 pm

Roman Hanousek wrote:
> Hi


Hello,



This works for me with the data provided:

#!/usr/bin/perl
use warnings;
use strict;

my $file = shift or die "usage: $0 filename\n";
open IN, '<', $file or die "Cannot open $file: $!";
$/ = '';                   # set $/ to paragraph mode
while ( <IN> ) {
    s!:\n!/!;              # turn "path:" + newline into "path/"
    print /^(\S+)/, "\n";  # print everything up to the first whitespace
}

__END__
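
Setting $/ to the empty string (paragraph mode) is not quite the same as the literal "\n\n" separator: in paragraph mode a run of several blank lines counts as one separator, while the literal separator turns the extra blank lines into a record of their own. The sketch below uses a hypothetical `count_records` helper and made-up input with three blank lines between two records to count what each setting produces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the records read from $text with $/ set to
# $separator. Opening a reference to a string gives an in-memory filehandle.
sub count_records {
    my ( $text, $separator ) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my @records = <$fh>;      # list context: read every record
    close $fh;
    return scalar @records;
}

# Made-up input: three blank lines between the two records.
my $text = "first:\nfile.asp\n\n\n\nsecond:\nother.inc\n";

print 'literal "\n\n": ', count_records( $text, "\n\n" ), " records\n";
print 'paragraph mode: ', count_records( $text, '' ), " records\n";
```

On this input the first call should report 3 records, because the leftover pair of newlines comes back as a record of its own, and the second should report 2.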



John
--
use Perl;
program
fulfillment

Zeus Odin

2004-08-01, 8:55 am

Another alternative:

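#!/usr/bin/perl
use warnings;
use strict;

my $eol = qr/\r?\n/;    # LF or CRLF line ending after the colon
$/ = '';                # paragraph mode

while (<DATA>) {
    # join "path:" and the file name on the next line; print the record unchanged if no match
    print /^(.*):$eol(\S+)/ ? "$1/$2\n" : $_;
}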

__DATA__
$/Dev/something/something.com/blah1:
default.asp user Exc 26/07/04 1:42p
[TEST-DEV]F:\content\blah\wwwroot\blah1

$/Dev/something/something.com/something:
test.asp user Exc 26/07/04 3:09p
[TEST-DEV] F:\content\blah\wwwroot\something

$/Dev/something/something.com/blah:
Blah.inc user Exc 23/07/04 11:13a
[BAGEL-DEV] F:\content\blah\wwwroot\blah

$/Dev/something/something.com/blah/blah/20GB:
Something-were.htm user Exc 23/07/04 11:24a
[TEST- DEV]F:\content\blah\wwwroot\blah\blah\20
GB
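
All three replies change $/ and then match one record at a time. Another option, sketched here and not taken from the thread, is to run a single pattern over the whole text: /m lets ^ match at the start of every line, and /g in a while condition resumes each search where the previous match ended. The `extract_pairs` helper and the sample text are hypothetical, and the pattern assumes each path line ends in a colon with the file name as the first token on the next line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: find every "path:" line followed by a file name line.
# For a real file the whole text could be slurped with
#   my $text = do { local $/; <> };
sub extract_pairs {
    my ($text) = @_;
    my @pairs;
    # path line ending in ":", optional CR, newline, then the first
    # whitespace-delimited token on the next line (the file name)
    while ( $text =~ /^(\S+):\r?\n(\S+)/mg ) {
        push @pairs, "$1/$2";
    }
    return @pairs;
}

# Sample data copied from the question (two of the four records).
my $text = <<'END';
$/Dev/something/something.com/something:
test.asp user Exc 26/07/04 3:09p
[TEST-DEV] F:\content\blah\wwwroot\something

$/Dev/something/something.com/blah:
Blah.inc user Exc 23/07/04 11:13a
[BAGEL-DEV] F:\content\blah\wwwroot\blah
END

print "$_\n" for extract_pairs($text);
```

Lines such as "[TEST-DEV]F:\content..." do not match, because the only colon in them is not followed by a line break.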



Copyright 2008 codecomments.com