

Home > Archive > PERL Beginners > October 2006 > regexp inside hashes










 

Author regexp inside hashes
Michael Alipio

2006-10-03, 9:58 pm

Hi,

There was this example given to me:

while ( <LOGFILE> ) {
    my %extr = (
        Start => '',
        IP    => '',
        User  => '',
        End   => '',
        /^(Start|IP|User|End)=(.+)/mg
    );
    print "Start:$extr{Start} IP:$extr{IP} User:$extr{User} End:$extr{End}\n\n";
}

Reading my logfile in paragraph mode, it has these
lines:

Start: blablablah
IP: blah blah blah
User: blah blah blah
End: blah blah blah

Start: blablablah2
IP: blah blah blah2
User: blah blah blah2
End: blah blah blah2

What the above code does (specifically inside the hash)
is to assign the found pattern into the hash keys
using this I guess...
....
/^(Start|IP|User|End)=(.+)/mg
);
....

I just wanted to know how fast it happened..
Any idea?

My new logfile contains these lines (each is one
continuous line):

Jul 1 01:06:33 my.hostname.com date=2006-07-01,time=01:06:46,device_id=FG200A2105403175,log_id=0101023002,type=event,subtype=ipsec,pri=notice,vd=root,loc_ip=192.168.0.4,loc_port=4500,rem_ip=192.168.1.14,rem_port=33552,out_if=wan1,vpn_tunnel=AxisGlobal,action=negotiate,status=success,msg="XAUTH user: ricky authentication successful"

Jul 1 04:45:58 ppp130.dyn242.pacific.net.ph date=2006-07-01,time=04:46:09,device_id=FG200A2105403175,log_id=0101023002,type=event,subtype=ipsec,pri=notice,vd=root,loc_ip=192.168.0.5,loc_port=4500,rem_ip=192.16.3.97,rem_port=36036,out_if=wan1,vpn_tunnel=AxisGlobal,action=negotiate,status=success,msg="XAUTH user: susan authentication successful"


Now, my goal is to adapt that code, particularly
obtaining only Start, IP, User. However, those three
targets are not anymore located at the beginning of a
line.

"Start" is the date=.time= combination,
"IP" is found after rem_ip=
"User" is found after "user: "

I'm not really sure how to put my regexp inside my
hash..

while ( <LOGFILE> ) {
    my %extr = (
        Start => '',
        IP    => '',
        User  => '',
        /what should i put here??/mg
    );
    print "Start:$extr{Start} IP:$extr{IP} User:$extr{User}\n\n";
}


Hope you can help me..
thanks!
-jay

John W. Krahn

2006-10-04, 3:57 am

Michael Alipio wrote:
> Hi,


Hello,

[ snip ]

> Now, my goal is to adapt that code, particularly
> obtaining only Start, IP, User. However, those three
> targets are not anymore located at the beginning of a
> line.
>
> "Start" is the date=.time= combination,
> "IP" is found after rem_ip=
> "User" is found after "user: "
>
> I'm not really sure how to put my regexp inside my
> hash..
>
> while ( <LOGFILE> ) {
> my %extr = (
> Start => '',
> IP => '',
> User => '',
> /what should i put here??/mg
> );
> print "Start:$extr{Start} IP:$extr{IP}
> User:$extr{User}\n\n";
> }


You don't really need a hash, you could probably do something like this:

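$/ = '';    # paragraph mode: records are separated by blank lines
while ( <LOGFILE> ) {
    # Each match is evaluated in list context, so it returns its capture
    # (or an empty list if it does not match).
    print
        'Start:', /date=([^,\s]+)/,
        ' ',      /time=([^,\s]+)/,
        ' IP:',   /rem_ip=([^,\s]+)/,
        ' User:', /user:\s*(\S+)/,
        "\n\n";
}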
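
If the results should still end up in a hash, one possible sketch is below. It assumes every field on the line has the form key=value separated by commas, and it uses a shortened, made-up version of the log line from the original post. The helper name parse_record and the %field hash are illustrative only, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: pulls Start, IP and User out of one log record.
sub parse_record {
    my ($record) = @_;

    # Every key=value field, captured as (key, value) pairs in list
    # context with /g. A value starting with a double quote (msg="...")
    # does not match and is skipped, so the list stays evenly paired.
    my %field = $record =~ /(\w+)=([^,\s"]+)/g;

    # The user name lives inside the quoted msg text.
    my ($user) = $record =~ /user:\s*(\S+)/;

    return (
        Start => join(' ', grep { defined } @field{qw(date time)}),
        IP    => defined $field{rem_ip} ? $field{rem_ip} : '',
        User  => defined $user ? $user : '',
    );
}

# Shortened, made-up sample line (assumption: one record per line).
my $line = 'Jul 1 01:06:33 my.hostname.com date=2006-07-01,time=01:06:46,'
         . 'rem_ip=192.168.1.14,rem_port=33552,status=success,'
         . 'msg="XAUTH user: ricky authentication successful"';

my %extr = parse_record($line);
print "Start:$extr{Start} IP:$extr{IP} User:$extr{User}\n";
```

Missing fields fall back to an empty string here, which mirrors the defaults in the original hash.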



John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall
Klaus

2006-10-04, 3:57 am

Michael Alipio wrote:
> There was this example given to me:
>
> while ( <LOGFILE> ) {
> my %extr = (
> Start => '',
> IP => '',
> User => '',
> End => '',
> /^(Start|IP|User|End)=(.+)/mg
> );
> print "Start:$extr{Start} IP:$extr{IP}
> User:$extr{User} End:$extr{End}\n\n";
> }
>
> Reading my logfile in paragraph mode,


Don't speak English, speak Perl.
I assume you mean: local $/ = "";

> it has these lines:
>
> Start: blablablah
> IP: blah blah blah
> User: blah blah blah
> End: blah blah blah
>
> Start: blablablah2
> IP: blah blah blah2
> User: blah blah blah2
> End: blah blah blah2
>
> What the above code does (specifically inside the hash
> is to assign the found pattern into the hash keys
> using this I guess...
> ...
> /^(Start|IP|User|End)=(.+)/mg
> );
> ...


No, it does not.

It reads in paragraphs, one by one, ignores its content and prints out
a fixed string "Start: IP: User: End:\n\n" as many times as there are
paragraphs in the file.

> I just wanted to know how fast did it happened..
> Any idea?


Reading in paragraphs, ignoring its content and printing out a fixed
string would happen quite fast, but I guess this can even be improved
by removing the hash altogether, like this:

local $/ = "";
while (<LOGFILE> ) {
print "Start: IP: User: End:\n\n";
}

> My new logfile contains these lines (each is one
> continuous line):
>
> Jul 1 01:06:33 my.hostname.com date=2006-07-01,time=01:06:46,device_id=FG200A2105403175,log_id=0101023002,type=event,subtype=ipsec,pri=notice,vd=root,loc_ip=192.168.0.4,loc_port=4500,rem_ip=192.168.1.14,rem_port=33552,out_if=wan1,vpn_tunnel=AxisGlobal,action=negotiate,status=success,msg="XAUTH user: ricky authentication successful"
>
> Jul 1 04:45:58 ppp130.dyn242.pacific.net.ph date=2006-07-01,time=04:46:09,device_id=FG200A2105403175,log_id=0101023002,type=event,subtype=ipsec,pri=notice,vd=root,loc_ip=192.168.0.5,loc_port=4500,rem_ip=192.16.3.97,rem_port=36036,out_if=wan1,vpn_tunnel=AxisGlobal,action=negotiate,status=success,msg="XAUTH user: susan authentication successful"
>
>
> Now, my goal is to adapt that code, particularly
> obtaining only Start, IP, User. However, those three
> targets are not anymore located at the beginning of a
> line.
>
> "Start" is the date=.time= combination,
> "IP" is found after rem_ip=
> "User" is found after "user: "


Ok, now we are talking! Here we go:

============================
use strict;
use warnings;

while (<LOGFILE>) {
    my ($date, $time) = /date=([^,]+),time=([^,]+)/
        or die "Can't find date/time in '$_'";
    my $Start = $date.' '.$time;
    my ($IP) = /rem_ip=([^,]+)/
        or die "Can't find rem_ip in '$_'";
    my ($User) = /user: (\w+)/
        or die "Can't find user in '$_'";
    print "Start: $Start, IP: $IP, User: $User\n";
}
============================

> I'm not really sure how to put my regexp inside my
> hash..


As your program grows more complex, I imagine you might want to put
the result of your regexp inside a hash. But please tell me, why on
earth do you want to put the regexp itself inside a hash?

Well, I suppose, technically, you could put a regexp inside a hash.

If you really, really must, here is an example:

============================
use strict;
use warnings;

my %extr = (
    datetime => qr/date=([^,]+),time=([^,]+)/,
    IP       => qr/rem_ip=([^,]+)/,
    User     => qr/user: (\w+)/,
);

while (<LOGFILE>) {
    my ($date, $time) = $_ =~ $extr{datetime}
        or die "Can't find date/time in '$_'";
    my $Start = $date.' '.$time;
    my ($IP) = $_ =~ $extr{IP}
        or die "Can't find rem_ip in '$_'";
    my ($User) = $_ =~ $extr{User}
        or die "Can't find user in '$_'";
    print "Start: $Start, IP: $IP, User: $User\n";
}
============================

Paul Lalli

2006-10-04, 7:58 am

Klaus wrote:
> Michael Alipio wrote:
>
> Don't speak English, speak Perl.
> I assume you mean: local $/ = "";


Which, by the way, the official Perl documentation refers to as
"paragraph mode" multiple times. The OP *was* speaking Perl.
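
For reference, a minimal sketch of paragraph mode. It reads made-up data through an in-memory filehandle purely for illustration; the variable names are assumptions and not taken from the OP's script.

```perl
use strict;
use warnings;

# Made-up input: two records separated by blank lines.
my $data = "Start=a\nIP=b\n\n\nStart=c\nIP=d\n";

# Read from a string via an in-memory filehandle (illustration only).
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my @paragraphs;
{
    # Paragraph mode: a record ends at one or more blank lines.
    local $/ = "";
    @paragraphs = <$fh>;
}
close $fh;

print scalar(@paragraphs), " paragraphs\n";
```

Using local inside a block restores the previous value of $/ afterwards, so other reads in the program are not affected.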

>
> No, it does not.
>
> It reads in paragraphs, one by one, ignores its content and prints out
> a fixed string "Start: IP: User: End:\n\n" as many times as there are
> paragraphs in the file.


I don't know what code you're reading. The OP's code does not in any
way "ignore" the content. The pattern match is being evaluated in
list context, which combined with the /g modifier means that it returns
all of its parenthesized subcaptures. So the hash contains four
key/value pairs. Any of the four keys which were not found in the
paragraph have values of the empty string, any that were found have
values of the second sub-capture in the regexp.

The string that's then printed out correctly interpolates the hash
values that were just created.
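
A minimal sketch of that behaviour, assuming input that uses "=" as the separator, as the pattern expects. The paragraph text and the helper name extract are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the hash the same way the OP's code does.
sub extract {
    my ($paragraph) = @_;
    my %extr = (
        Start => '',
        IP    => '',
        User  => '',
        End   => '',
        # In list context with /g, the match returns every captured
        # (key, value) pair; these later pairs override the defaults above.
        $paragraph =~ /^(Start|IP|User|End)=(.+)/mg,
    );
    return %extr;
}

# Made-up paragraph with no "End" line.
my %extr = extract("Start=10:00\nIP=192.168.1.14\nUser=ricky\n");
print "Start:$extr{Start} IP:$extr{IP} User:$extr{User} End:$extr{End}\n";
```

Here End keeps its default empty string because the match returned no pair for it.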

>
>
> Reading in paragraphs, ignoring its content and printing out a fixed
> string would happen quite fast, but I guess this can even be improved
> by removing the hash altogether, like this:
>
> local $/ = "";
> while (<LOGFILE> ) {
> print "Start: IP: User: End:\n\n";
> }


Except that it bears no resemblance of any kind to what the original
code does, that's great.

>
> As your program grows more complex, I imagine you might want to put
> the result of your regexp inside a hash. But please tell me, why on
> earth do you want to put the regexp itself inside a hash?


Because it worked exactly as the OP claimed it did? Do you understand
what a pattern match does in list context?

Paul Lalli

Klaus

2006-10-04, 7:58 am

Paul Lalli wrote:
> Klaus wrote:
>
> Which, by the way, the official Perl documentation refers to as
> "paragraph mode" multiple times. The OP *was* speaking Perl.


I stand corrected.

>
> I don't know what code you're reading. The OP's code does not in any
> way "ignore" the content. The pattern match is being evaluated in
> list context, which combined with the /g modifier means that it returns
> all of its parenthesized subcaptures. So the hash contains four
> key/value pairs. Any of the four keys which were not found in the
> paragraph have values of the empty string, any that were found have
> values of the second sub-capture in the regexp.
>
> The string that's then printed out correctly interpolates the hash
> values that were just created.


I stand corrected again.

>
> Except that it bears no resemblance of any kind to what the original
> code does, that's great.
>
>
> Because it worked exactly as the OP claimed it did? Do you understand
> what a pattern match does in list context?


My previous mail was a complete blunder and I apologize to the OP.

I am afraid I will have to do some homework now:

1. read the posting guidelines, in particular the netiquette

2. improve on my perl skills by reading the documentation

--
Klaus

John W. Krahn

2006-10-05, 6:58 pm

Jay Savage wrote:
> On 10/4/06, John W. Krahn <krahnj@telus.net> wrote:
>
> Probably not, but can someone explain what's going on here? It looks
> to me like the code creates a list of k/v pairs to initialize some
> hash keys with empty values, and then continues the list with a series
> of k/v pairs, returned from the match captures, which immediately
> overrides the values for the keys that were just declared. In other
> words, there are two assignments for the same keys in the same list?
> Or is there some magic that happens when a hash is passed a regex in
> list context so that the assignments really only happen once?
> Wouldn't
>
> my %extr = (/^(Start|IP|User|End)=(.+)/mg);
>
> on its own achieve the same result as
>
> my %extr = (
> Start => '',
> IP => '',
> User => '',
> End => '',
> /^(Start|IP|User|End)=(.+)/mg
> );
>
> Or am I missing something?


If one (or more) of the keys is missing, say 'User', and you print
"User:$extr{User}" you will get a warning. This way all the values will have
a string value and there will be no warning.
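
A small sketch of the difference, using made-up data with no User line. The warning handler is only there to count warnings for the demonstration and is not part of the original code.

```perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Made-up paragraph: there is no "User=" line.
my $paragraph = "Start=10:00\nIP=192.168.1.14\n";

# Without defaults: $bare{User} does not exist, so interpolating it
# triggers an "uninitialized value" warning.
my %bare = $paragraph =~ /^(Start|IP|User|End)=(.+)/mg;
my $line_bare = "User:$bare{User}";

# With a default: User is at least an empty string, so no warning.
my %safe = ( User => '', $paragraph =~ /^(Start|IP|User|End)=(.+)/mg );
my $line_safe = "User:$safe{User}";

print scalar(@warnings), " warning(s)\n";
```

Only the first interpolation warns, which is why the empty-string defaults are worth keeping.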



John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall