











 

Author Record separator and regex switch
Beginner

2007-02-26, 7:00 pm

Hi,

I am trying to parse some dhcp-lease files to extract the ip, mac and
hostname.

I am struggling to get either the regex or the $/ correct. I am not
sure which combination of these I should use.

There is some sample data and my best effort below. Can anyone offer
any pointers?

TIA,
Dp.


======= Sample Data =========
....
lease 196.222.237.209 {
starts 5 2007/02/23 17:53:57;
ends 6 2007/02/24 17:53:57;
binding state active;
next binding state free;
hardware ethernet 00:60:04:28:28:01;
client-hostname "lab.mydomain.com";
}
lease 196.222.237.209 {
starts 5 2007/02/23 17:53:57;
ends 6 2007/02/24 17:53:57;
binding state active;
next binding state free;
hardware ethernet 00:60:04:38:38:01;
client-hostname "lab.mydomain.com";
}
lease 196.222.237.195 {
starts 5 2007/02/23 17:54:04;
ends 6 2007/02/24 17:54:04;
binding state active;
next binding state free;
hardware ethernet 00:0c:c1:33:31:0d;
uid "\001\000\014\361\3231\015";
client-hostname "puck";
}
=============================

============== My effort ===========
#!/usr/bin/perl

use strict;
use warnings;

my $file = '/var/lib/dhcp3/dhcpd.leases';
my ($ip,$mac,$host);

#$/ = "}\n";
$/ = '';

open(FH,$file) or die "Can't open $file: $!\n";

while (<FH> ) {
chomp;
($ip,$mac,$host) = ($_ =~
/lease\s+(\d{3}\.\d{3}\.\d{3}\.\d+).*thernet\s+(\d{2}:\d{2}:\d{2}:\d{2}:\d{2}:\d{2}).*ostname\s+\"(\w+\.scien.*)"/smg);

print "$ip $mac $host\n";

}
=======================
D. Bolliger

2007-02-26, 7:00 pm

Beginner wrote on Monday, 26 February 2007, 14:50:
> Hi,


Hi

> I am trying to parse some dhcp-lease files to extract the ip, mac and
> hostname.
>
> I am struggling to get either the regex or the $/ correct. I am not
> sure which combination of these I should use.
>
> There is some sample data and my best effort below. Can anyone offer
> any pointers?
>
> TIA,
> Dp.
>
>
> ======= Sample Data =========

[moved to __DATA__ section below]
> ============== My effort ===========
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $file = '/var/lib/dhcp3/dhcpd.leases';
> my ($ip,$mac,$host);
>
> #$/ = "}\n";


used below :-)

> $/ = '';
>
> open(FH,$file) or die "Can't open $file: $!\n";
>
> while (<FH> ) {
> chomp;
> ($ip,$mac,$host) = ($_ =~
> /lease\s+(\d{3}\.\d{3}\.\d{3}\.\d+).*thernet\s+(\d{2}:\d{2}:\d{2}:\d{2}:\d{2}:\d{2}).*ostname\s+\"(\w+\.scien.*)"/smg);
>
> print "$ip $mac $host\n";
>
> }


To keep the demonstration script short, I use a short regex; a real script
should use a more specific one.

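#!/usr/bin/perl

use strict;
use warnings;

{
    # Treat each lease block, terminated by "}\n", as one record.
    local $/ = "}\n";
    for (<DATA>) {
        # /s lets . match newlines; /x lets the pattern span several lines.
        my ($ip, $mac, $host) =
            /lease\s+(\S+).*
             ethernet\s+(\S+);.*
             hostname\s+(\S+);
            /sx;
        print "IP $ip - MAC $mac - HOST $host\n";
    }
}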

__DATA__
lease 196.222.237.209 {
starts 5 2007/02/23 17:53:57;
ends 6 2007/02/24 17:53:57;
binding state active;
next binding state free;
hardware ethernet 00:60:04:28:28:01;
client-hostname "lab.mydomain.com";
}
lease 196.222.237.209 {
starts 5 2007/02/23 17:53:57;
ends 6 2007/02/24 17:53:57;
binding state active;
next binding state free;
hardware ethernet 00:60:04:38:38:01;
client-hostname "lab.mydomain.com";
}
lease 196.222.237.195 {
starts 5 2007/02/23 17:54:04;
ends 6 2007/02/24 17:54:04;
binding state active;
next binding state free;
hardware ethernet 00:0c:c1:33:31:0d;
uid "\001\000\014\361\3231\015";
client-hostname "puck";
}
----8<----

IP 196.222.237.209 - MAC 00:60:04:28:28:01 - HOST "lab.mydomain.com"
IP 196.222.237.209 - MAC 00:60:04:38:38:01 - HOST "lab.mydomain.com"
IP 196.222.237.195 - MAC 00:0c:c1:33:31:0d - HOST "puck"
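
A variant of the same idea, as a minimal sketch rather than a finished parser: it reads one "}\n"-terminated record at a time with a while loop and skips records that do not match. The helper name parse_leases, the in-memory sample, and the assumption that a lease may lack a client-hostname line are all made up for illustration and are not taken from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample; a real script would open the leases file instead.
my $sample = <<'END';
lease 196.222.237.209 {
hardware ethernet 00:60:04:28:28:01;
client-hostname "lab.mydomain.com";
}
lease 196.222.237.195 {
hardware ethernet 00:0c:c1:33:31:0d;
}
END

# parse_leases is a helper name made up for this sketch.
sub parse_leases {
    my ($fh) = @_;
    local $/ = "}\n";    # one lease block per read
    my @leases;
    while ( my $record = <$fh> ) {
        # The hostname part is optional (an assumption of this sketch).
        my ( $ip, $mac, $host ) = $record =~ /
            lease \s+ (\S+) .*?
            ethernet \s+ (\S+) ;
            (?: .*? hostname \s+ "([^"]*)" ; )?
        /sx or next;     # skip records without a lease and a MAC
        push @leases, [ $ip, $mac, defined $host ? $host : '(none)' ];
    }
    return @leases;
}

open my $fh, '<', \$sample or die "Can't open in-memory sample: $!\n";
printf "IP %s - MAC %s - HOST %s\n", @$_ for parse_leases($fh);
close $fh;
```

Because each record holds exactly one lease, a failed or partial match cannot spill over into the next lease.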

Dani
Beginner

2007-02-26, 7:00 pm

On 26 Feb 2007 at 15:58, D. Bolliger wrote:

> Beginner wrote on Monday, 26 February 2007, 14:50:
>
> Hi
>
> [moved to __DATA__ section below]
>
> used below :-)
>
>
> To keep the demonstration script short, I use a short regex that should be
> more specific
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> {
> local $/="}\n";
> for (<DATA> ) {
> my ($ip,$mac,$host)=
> /lease\s+(\S+).*
> ethernet\s+(\S+);.*
> hostname\s+(\S+);
> /sx;
> print "IP $ip - MAC $mac - HOST $host\n";
> }
> }
>
> __DATA__
> lease 196.222.237.209 {
> starts 5 2007/02/23 17:53:57;
> ends 6 2007/02/24 17:53:57;
> binding state active;
> next binding state free;
> hardware ethernet 00:60:04:28:28:01;
> client-hostname "lab.mydomain.com";

.....
....

> Dani
>


Thanx Dani,

That's worked a treat. Just to complete the learning curve, where was
I going wrong?

Thanx,
Dp.



John W. Krahn

2007-02-26, 7:00 pm

Beginner wrote:
> On 26 Feb 2007 at 15:58, D. Bolliger wrote:
>
> That's worked a treat. Just to complete the learning curve, where was
> I going wrong?



> while (<FH> ) {
> chomp;
> ($ip,$mac,$host) = ($_ =~
> /lease\s+(\d{3}\.\d{3}\.\d{3}\.\d+)


IP addresses can contain one, two or three digits for each octet so \d{3} will
not match all addresses.


> .*thernet\s+(\d{2}:\d{2}:\d{2}:\d{2}:\d{2}:\d{2})


MAC addresses consist of hexadecimal digits but \d only matches decimal digits.


> .*ostname\s+\"(\w+\.scien.*)"/smg);


None of the host names in your example contained the string 'scien'.
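
The three points above can be folded into one pattern. This is only a sketch under stated assumptions: the variable names and the helper extract_lease are made up here, the record is one lease from the sample data, and \d{1,3} checks the digit count only, not that each octet is in the range 0-255.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Building blocks; all names are made up for this sketch.
my $octet  = qr/\d{1,3}/;                  # one to three decimal digits
my $ip_re  = qr/$octet(?:\.$octet){3}/;    # four dot-separated octets
my $hex2   = qr/[0-9a-fA-F]{2}/;           # two hexadecimal digits
my $mac_re = qr/$hex2(?::$hex2){5}/;       # six colon-separated pairs

# Returns (ip, mac, host) for one lease record, or an empty list.
sub extract_lease {
    my ($record) = @_;
    return $record =~
        /lease\s+($ip_re).*?ethernet\s+($mac_re);.*?hostname\s+"([^"]+)"/s;
}

my $record = <<'END';
lease 196.222.237.195 {
hardware ethernet 00:0c:c1:33:31:0d;
client-hostname "puck";
}
END

if ( my @fields = extract_lease($record) ) {
    print "@fields\n";
}
```

The hostname part accepts anything between the quotes instead of requiring a particular domain string.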


> print "$ip $mac $host\n";
>
> }




John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall
D. Bolliger

2007-02-26, 7:00 pm

Beginner wrote on Monday, 26 February 2007, 17:02:
> On 26 Feb 2007 at 15:58, D. Bolliger wrote:

Hi
[snipped]
> Thanx Dani,
>
> That's worked a treat. Just to complete the learning curve, where was
> I going wrong?


That should have been part of my first answer, sorry. I'll try, and hope I
don't mix up the test versions or mangle the English language :-)

Looking at the regex:

It only matches MACs without a-f hex digits (and there's no 'scien' string in
the data, but that's probably due to the altered hostnames, and I assume that
the \d{3} in the IP matching part is intended).

In the script version with $/ = '':

Setting $/ to '' selects paragraph mode, which splits records at blank lines.
The sample data contains no blank lines, so this reads all records at once,
and this seems to be the reason why you used the /g modifier. But then, after
matching the IP of the first lease, the first .* skips all leases except the
last one for the mac/host matching. So you (or I, respectively) get a single
wrong result with the IP of the first lease and the mac/host of the last
lease (after correcting the mac regex part).
So you would have to change the .* into the non-greedy .*? (to avoid
skipping leases) and also append a .*? at the end of the regex to match what
is "between the leases" [argh!] (to avoid only one match).

But even if the regex is correct, and the ip/mac/host of every lease is
matched, only the first 3 captured matches get stored into ($ip,$mac,$host),
and the others are discarded (which would not be the case with
@data=/....../g).
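
A sketch of the @data=/....../g form, again with two shortened leases from the sample data. The helper name lease_rows is made up here. In this sketch no trailing .*? turned out to be needed, since /g resumes the search after the previous match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two leases taken from the sample data, shortened to the relevant lines.
my $data = <<'END';
lease 196.222.237.209 {
hardware ethernet 00:60:04:28:28:01;
client-hostname "lab.mydomain.com";
}
lease 196.222.237.195 {
hardware ethernet 00:0c:c1:33:31:0d;
client-hostname "puck";
}
END

# lease_rows is a helper name made up for this sketch.
sub lease_rows {
    my ($text) = @_;
    # List context plus /g: every capture of every match, flattened.
    my @flat = $text =~
        /lease\s+(\S+).*?ethernet\s+(\S+);.*?hostname\s+"([^"]+)"/sg;
    my @rows;
    # Regroup the flat list into (ip, mac, host) triples.
    while ( my @triple = splice @flat, 0, 3 ) {
        push @rows, [@triple];
    }
    return @rows;
}

print "IP $_->[0] - MAC $_->[1] - HOST $_->[2]\n" for lease_rows($data);
```

Assigning the same match to ($ip,$mac,$host) instead of an array would keep only the first triple.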

When you tried with $/ = "}\n":

I know neither what the regex looked like in this case nor what went
wrong, so I can't say much.

Other notes:

The $_=~ is not necessary, because if the left side of =~ is missing, $_ is
used by default.

To read all leases, you could also do it like this, avoiding the somewhat
misleading while loop:
{
local $/; # sets $/ to undef
my $data=<FH>; # slurp entire file
# ...match with /g modifier...
}
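
Filled in, the slurp approach might look like the following sketch. The helper name slurp_leases and the in-memory filehandle are assumptions made so the example is self-contained; a real script would pass a handle opened on the leases file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the leases file: two shortened leases from the sample data.
my $sample = <<'END';
lease 196.222.237.209 {
hardware ethernet 00:60:04:28:28:01;
client-hostname "lab.mydomain.com";
}
lease 196.222.237.195 {
hardware ethernet 00:0c:c1:33:31:0d;
client-hostname "puck";
}
END

# slurp_leases is a helper name made up for this sketch.
sub slurp_leases {
    my ($fh) = @_;
    my $data = do { local $/; <$fh> };    # $/ undef: read everything
    return unless defined $data;          # empty input, nothing to match
    my @found;
    # Scalar context plus /g: one match per iteration, resuming after
    # the previous match each time.
    while ( $data =~ /lease\s+(\S+).*?ethernet\s+(\S+);.*?hostname\s+"([^"]+)"/sg ) {
        push @found, "IP $1 - MAC $2 - HOST $3";
    }
    return @found;
}

open my $fh, '<', \$sample or die "Can't open in-memory sample: $!\n";
print "$_\n" for slurp_leases($fh);
close $fh;
```

One caveat of matching across the whole file: if a lease had no hostname line, the non-greedy .*? would run on into the next lease, which the record-by-record approach with $/ = "}\n" avoids.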

chomp is not necessary.

Hope that covers the most important issues :-)

Dani
Beginner

2007-02-26, 7:00 pm

On 26 Feb 2007 at 18:13, D. Bolliger wrote:

> Beginner wrote on Monday, 26 February 2007, 17:02:
>
> Hi
>
...snip


Thanx Dani (and John) that's a very complete reply with a lot of
useful stuff to bear in mind.

There were a couple of errors that slipped in after I tried to
obfuscate my domain and ip range but I was missing a great deal.

Thanx again.
Dp.
