Matching a sub pattern and processing results

David Gilden

2006-09-25, 7:58 am

Greetings,

I am having a little trouble understanding matching and getting the sub-pattern
saved to a variable so that I can munge it. I want to take the line returns and
change them into pipe characters '|'.
All data records start with a date, e.g. 01/01/2006,
but there are fields in between that are on multiple lines. Sample data is
below the script.

Here's what I have so far....

#!/usr/bin/perl -w


open(FI,"test.txt") || die "Read in, Could not find File, $!";
my @files = <FI>;
close(FI);

for (@files){

chomp($_);

$_ =~m|(\d+/\d+/\d+)(?s)(.+?)\d+/\d+/\d+|;

$2 =~tr/\n/|/;

print "$1$2\n";

# save out modified result...
}

Given this data:

09/01/2006|03:29AM Password for qsocordovam reset by Self Reset utility
Username: qsocordovam
ClientNumber: 77927
IP address: 24.248.1.241
09/01/2006|07:53AM Failed reset attempt
Username: tmpcollic03
ClientNumber: 110330
IP address: 152.121.16.7
Failed challenge question(s)
09/01/2006|07:55AM Failed reset attempt
Username: tmpcollic03
ClientNumber: 110330
IP address: 152.121.16.7
Failed challenge question(s)
09/01/2006|08:03AM Failed reset attempt

Desired result:
09/01/2006|03:29AM Password for qsocordovam reset by Self Reset utility|Username: qsocordovam|ClientNumber: 77927|IP address: 24.248.1.241
#Next record.... starts with a date of format : 09/01/2006


Thanks for any input on my output :)

Dave Gilden
(kora musician / audiophile / webmaster @ www.coraconnection.com / Ft. Worth, TX, USA)



Mumia W.

2006-09-25, 7:58 am

If the file is small enough to slurp into memory, you could do something
like this:

#!/usr/bin/perl

use strict;
use warnings;

my $data = join '', <DATA>;
my @blocks = $data =~ m/^([0-9\/]+.*?(?=^[0-9\/]+))/gsm;
print "\n";

s/\n/\|/g for @blocks;
print "$_\n\n" for @blocks;


__DATA__
09/01/2006|03:29AM Password for qsocordovam reset by Self Reset utility
Username: qsocordovam
ClientNumber: 77927
IP address: 24.248.1.241
09/01/2006|07:53AM Failed reset attempt
Username: tmpcollic03
ClientNumber: 110330
IP address: 152.121.16.7
Failed challenge question(s)
09/01/2006|07:55AM Failed reset attempt
Username: tmpcollic03
ClientNumber: 110330
IP address: 152.121.16.7
Failed challenge question(s)
09/01/2006|08:03AM Failed reset attempt


John W. Krahn

2006-09-25, 7:58 am

David Gilden wrote:
> Greetings,


Hello,

This should do what you want:


open FI, 'test.txt' or die "Could not open 'test.txt' $!";

my @lines;
while ( <FI> ) {
    s/\s+\z//;
    if ( m!^\d\d/\d\d/\d{4}\|! ) {
        print join( '|', splice @lines ), "\n";
    }
    push @lines, $_;
}
print join( '|', @lines ), "\n";

close FI;




John
--
use Perl;
program
fulfillment


John W. Krahn

2006-09-25, 7:58 am

Mumia W. wrote:
> If the file is small enough to slurp into memory, you could do something
> like this:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $data = join '', <DATA>;
> my @blocks = $data =~ m/^([0-9\/]+.*?(?=^[0-9\/]+))/gsm;
> print "\n";
>
> s/\n/\|/g for @blocks;
> print "$_\n\n" for @blocks;


Your code puts a | at the end of each record as well as between the fields, it
doesn't remove trailing whitespace from the fields, and it skips the last
record. To fix:

my $data = do { local $/; <DATA> };
chomp( my @blocks = $data =~ m!^([\d/]+.*?(?=^[\d/]+|\z))!gsm );
print "\n";

s/\s*\n/|/g, s/\s*\z// for @blocks;
print "$_\n\n" for @blocks;




John
--
use Perl;
program
fulfillment
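
A possible alternative to matching whole blocks, sketched here as an illustration rather than taken from the thread: split the slurped text just in front of every line that starts with a date, then trim and join each chunk. The helper name records_from_text and the abbreviated sample data are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one pipe-joined
# string per record found in the slurped text.
sub records_from_text {
    my ($text) = @_;

    # Zero-width split: break the text just before each line that
    # starts with a date followed by a pipe, e.g. "09/01/2006|".
    my @chunks = split m!^(?=\d\d/\d\d/\d{4}\|)!m, $text;

    my @records;
    for my $chunk (@chunks) {
        my @fields;
        for my $line ( split /\n/, $chunk ) {
            $line =~ s/\s+\z//;                   # drop trailing whitespace
            push @fields, $line if length $line;  # skip empty lines
        }
        push @records, join( '|', @fields ) if @fields;
    }
    return @records;
}

# Abbreviated sample data, assumed to follow the layout shown in the thread.
my $sample = <<'END_SAMPLE';
09/01/2006|03:29AM Password for qsocordovam reset by Self Reset utility
Username: qsocordovam
ClientNumber: 77927
IP address: 24.248.1.241
09/01/2006|08:03AM Failed reset attempt
END_SAMPLE

print "$_\n" for records_from_text($sample);
```

Because the split pattern is a lookahead, nothing is consumed, so the last record needs no special case. Like the other slurping approaches, this only suits files small enough to hold in memory.
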
John W. Krahn

2006-09-25, 7:58 am

John W. Krahn wrote:
>
> This should do what you want:
>
>
> open FI, 'test.txt' or die "Could not open 'test.txt' $!";
>
> my @lines;
> while ( <FI> ) {
> s/\s+\z//;
> if ( m!^\d\d/\d\d/\d{4}\|! ) {


Correction:

if ( @lines && m!^\d\d/\d\d/\d{4}\|! ) {


> print join( '|', splice @lines ), "\n";
> }
> push @lines, $_;
> }
> print join( '|', @lines ), "\n";
>
> close FI;



John
--
use Perl;
program
fulfillment
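
For reference, the line-by-line approach with the correction folded in might look like the sketch below. It is wrapped in a hypothetical helper, join_records, and reads from an in-memory filehandle so that it runs on its own; the lexical filehandle, the skipping of blank lines, and the shortened sample data are assumptions of this example, not part of the posted code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads lines from an open filehandle and returns
# one pipe-joined string per record.
sub join_records {
    my ($fh) = @_;
    my ( @records, @lines );

    while ( my $line = <$fh> ) {
        $line =~ s/\s+\z//;          # strip newline and trailing whitespace
        next unless length $line;    # assumption: blank lines carry no data

        # A date line closes the record collected so far. The @lines check
        # avoids emitting an empty record before the very first date line.
        if ( @lines && $line =~ m!^\d\d/\d\d/\d{4}\|! ) {
            push @records, join( '|', splice @lines );
        }
        push @lines, $line;
    }

    # Flush the final record, which has no following date line.
    push @records, join( '|', @lines ) if @lines;
    return @records;
}

# In-memory file standing in for test.txt (shortened sample data).
my $sample = "09/01/2006|03:29AM Password reset\n"
           . "Username: qsocordovam\n"
           . "09/01/2006|07:53AM Failed reset attempt\n";

open my $fh, '<', \$sample or die "Could not open in-memory file: $!";
print "$_\n" for join_records($fh);
close $fh;
```

To use it on the real file, open test.txt for reading and pass that handle instead. Since only one record is held at a time before being flushed, this variant does not depend on the file fitting in memory the way the slurping versions do.
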
Mumia W.

2006-09-25, 6:57 pm

On 09/25/2006 06:47 AM, John W. Krahn wrote:
> Mumia W. wrote:
>
> Your code puts a | at the end of the record as well as between the fields and
> it doesn't remove trailing whitespace from the fields and it skips the last
> record. To fix:
>
> my $data = do { local $/; <DATA> };
> chomp( my @blocks = $data =~ m!^([\d/]+.*?(?=^[\d/]+|\z))!gsm );
> print "\n";
>
> s/\s*\n/|/g, s/\s*\z// for @blocks;
> print "$_\n\n" for @blocks;
>
>
>
>
> John


Uhhh. I should've looked more closely at my output. Thanks.


Rob Dixon

2006-09-25, 6:57 pm

The code below will do the trick.

HTH,

Rob


use strict;
use warnings;

my @lines;

while (<DATA>) {
    s/\s*$//;
    if (m#^\d\d/\d\d/\d{4}#) {
        push @lines, $_;
    }
    else {
        $lines[-1] .= "|$_";
    }
}

print "$_\n" foreach @lines;


*OUTPUT*

09/01/2006|03:29AM Password for qsocordovam reset by Self Reset utility|Username: qsocordovam|ClientNumber: 77927|IP address: 24.248.1.241
09/01/2006|07:53AM Failed reset attempt|Username: tmpcollic03|ClientNumber: 110330|IP address: 152.121.16.7|Failed challenge question(s)
09/01/2006|07:55AM Failed reset attempt|Username: tmpcollic03|ClientNumber: 110330|IP address: 152.121.16.7|Failed challenge question(s)
09/01/2006|08:03AM Failed reset attempt


David Gilden

2006-09-25, 6:57 pm

Dear Perl Gurus,

Still struggling here...
The problem is the data in the middle of the match is on multiple lines.
Please reply directly and CC the list.
Thanks,
D.G.
(kora musician / audiophile / webmaster @ www.coraconnection.com / Ft. Worth, TX, USA)
Here's what I have so far.... Revised, still not working!

#!/usr/bin/perl -w

open(FI,"test.txt") || die "Read in, Could not find File, $!";
my @files = <FI>;
close(FI);

for (@files){

    chomp($_);

    @match = ($_ =~ m!(\d+\/\d+\/\d+)(.+?)(?=\d+/\d+/\d+)!);

    $match[1]=~tr/\n/|/;

    print "$match[0]$match[1]\n";
    print "#"x10, "#";
    #rename($old,$_);

    # save out modified result...
}



Rob Dixon

2006-09-26, 6:57 pm

David Gilden wrote:
>
> Dear Perl Gurus,
>
> Still struggling here...
> The problem is the data in the middle of the match is on multiple lines.
> Please reply directly and CC the list.


Not sure where you're headed, David. There were three posts, all of which solved
your problem as I understand it. What problem are you having now, and how have
we misunderstood your data?

By the way, you need to be registered to post to this list, so you should also
be getting all of the submissions sent to you. So why do you need a reply CC'd to
you as well as to the list?

Rob
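
One likely reason the revised attempt still fails, offered as a sketch and not as a reading confirmed by anyone in the thread: reading a filehandle in list context yields one element per line, so no single element ever contains two dates, and a pattern that must span from one date to the next cannot match. The two hypothetical helpers below (their names and the shortened sample are assumptions) show the difference between matching per line and matching against the whole text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DATE = qr!\d+/\d+/\d+!;

# Hypothetical helper: counts the individual lines in which the
# date-to-date pattern matches.
sub lines_spanning_two_dates {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Could not open in-memory file: $!";
    my @lines = <$fh>;    # list context: one element per line
    close $fh;
    return scalar grep { /$DATE.+?$DATE/s } @lines;
}

# Hypothetical helper: matches against the whole text and returns the
# first record with its line breaks turned into pipes.
sub first_record {
    my ($text) = @_;
    return unless $text =~ /($DATE)(.+?)(?=$DATE)/s;
    my ( $date, $rest ) = ( $1, $2 );   # copy first: $1 and $2 are read-only
    $rest =~ s/\s+\z//;                 # drop the trailing newline
    $rest =~ s/\s*\n/|/g;               # remaining line breaks become pipes
    return "$date$rest";
}

my $sample = "09/01/2006|03:29AM Password reset\n"
           . "Username: qsocordovam\n"
           . "09/01/2006|07:53AM Failed reset attempt\n";

print lines_spanning_two_dates($sample), "\n";    # prints 0
print first_record($sample), "\n";
```

The pattern in first_record only finds a record that is followed by another date, so on its own it would miss the last record; the `|\z` alternative in John's earlier fix covers that case.
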
Copyright 2008 codecomments.com