Record separator and regex switch
| Beginner 2007-02-26, 7:00 pm |
| Hi,
I am trying to parse some dhcp-lease files to extract the ip, mac and
hostname.
I am struggling to get either the regex or the $/ correct. I am not
sure which combination of these I should use.
There is some sample data and my best effort below. Can anyone offer
any pointers?
TIA,
Dp.
======= Sample Data =========
....
lease 196.222.237.209 {
starts 5 2007/02/23 17:53:57;
ends 6 2007/02/24 17:53:57;
binding state active;
next binding state free;
hardware ethernet 00:60:04:28:28:01;
client-hostname "lab.mydomain.com";
}
lease 196.222.237.209 {
starts 5 2007/02/23 17:53:57;
ends 6 2007/02/24 17:53:57;
binding state active;
next binding state free;
hardware ethernet 00:60:04:38:38:01;
client-hostname "lab.mydomain.com";
}
lease 196.222.237.195 {
starts 5 2007/02/23 17:54:04;
ends 6 2007/02/24 17:54:04;
binding state active;
next binding state free;
hardware ethernet 00:0c:c1:33:31:0d;
uid "\001\000\014\361\3231\015";
client-hostname "puck";
}
=============================
============== My effort ===========
#!/usr/bin/perl
use strict;
use warnings;
my $file = '/var/lib/dhcp3/dhcpd.leases';
my ($ip,$mac,$host);
#$/ = "}\n";
$/ = '';
open(FH,$file) or die "Can't open $file: $!\n";
while (<FH> ) {
chomp;
($ip,$mac,$host) = ($_ =~
/lease\s+(\d{3}\.\d{3}\.\d{3}\.\d+).*thernet\s+(\d{2}:\d{2}:\d{2}:\d{2}:\d{2}:\d{2}).*ostname\s+\"(\w+\.scien.*)"/smg);
print "$ip $mac $host\n";
}
=======================
| |
| D. Bolliger 2007-02-26, 7:00 pm |
| Beginner am Montag, 26. Februar 2007 14:50:
> Hi,
Hi
> I am trying to parse some dhcp-lease files to extract the ip, mac and
> hostname.
>
> I am struggling to get either the regex or the $/ correct. I am not
> sure which combination of these I should use.
>
> There is some sample data and my best effort below. Can anyone offer
> any pointers?
>
> TIA,
> Dp.
>
>
> ======= Sample Data =========
[moved to __DATA__ section below]
> ============== My effort ===========
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $file = '/var/lib/dhcp3/dhcpd.leases';
> my ($ip,$mac,$host);
>
> #$/ = "}\n";
used below :-)
> $/ = '';
>
> open(FH,$file) or die "Can't open $file: $!\n";
>
> while (<FH> ) {
> chomp;
> ($ip,$mac,$host) = ($_ =~
> /lease\s+(\d{3}\.\d{3}\.\d{3}\.\d+).*thernet\s+(\d{2}:\d{2}:\d{2}:\d{2}:\d{2}:\d{2}).*ostname\s+\"(\w+\.scien.*)"/smg);
>
> print "$ip $mac $host\n";
>
> }
To keep the demonstration script short, I use a short regex; for real use it
should be more specific.
#!/usr/bin/perl
use strict;
use warnings;
{
local $/="}\n";
for (<DATA> ) {
my ($ip,$mac,$host)=
/lease\s+(\S+).*
ethernet\s+(\S+);.*
hostname\s+(\S+);
/sx;
print "IP $ip - MAC $mac - HOST $host\n";
}
}
__DATA__
lease 196.222.237.209 {
starts 5 2007/02/23 17:53:57;
ends 6 2007/02/24 17:53:57;
binding state active;
next binding state free;
hardware ethernet 00:60:04:28:28:01;
client-hostname "lab.mydomain.com";
}
lease 196.222.237.209 {
starts 5 2007/02/23 17:53:57;
ends 6 2007/02/24 17:53:57;
binding state active;
next binding state free;
hardware ethernet 00:60:04:38:38:01;
client-hostname "lab.mydomain.com";
}
lease 196.222.237.195 {
starts 5 2007/02/23 17:54:04;
ends 6 2007/02/24 17:54:04;
binding state active;
next binding state free;
hardware ethernet 00:0c:c1:33:31:0d;
uid "\001\000\014\361\3231\015";
client-hostname "puck";
}
----8<----
IP 196.222.237.209 - MAC 00:60:04:28:28:01 - HOST "lab.mydomain.com"
IP 196.222.237.209 - MAC 00:60:04:38:38:01 - HOST "lab.mydomain.com"
IP 196.222.237.195 - MAC 00:0c:c1:33:31:0d - HOST "puck"
Dani
| |
| Beginner 2007-02-26, 7:00 pm |
| On 26 Feb 2007 at 15:58, D. Bolliger wrote:
> Beginner am Montag, 26. Februar 2007 14:50:
>
> Hi
>
> [moved to __DATA__ section below]
>
> used below :-)
>
>
> To keep the demonstration script short, I use a short regex that should be
> more specific
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> {
> local $/="}\n";
> for (<DATA> ) {
> my ($ip,$mac,$host)=
> /lease\s+(\S+).*
> ethernet\s+(\S+);.*
> hostname\s+(\S+);
> /sx;
> print "IP $ip - MAC $mac - HOST $host\n";
> }
> }
>
> __DATA__
> lease 196.222.237.209 {
> starts 5 2007/02/23 17:53:57;
> ends 6 2007/02/24 17:53:57;
> binding state active;
> next binding state free;
> hardware ethernet 00:60:04:28:28:01;
> client-hostname "lab.mydomain.com";
.....
....
> Dani
>
Thanx Dani,
That's worked a treat. Just to complete the learning curve, where was
I going wrong?
Thanx,
Dp.
| |
| John W. Krahn 2007-02-26, 7:00 pm |
| Beginner wrote:
> On 26 Feb 2007 at 15:58, D. Bolliger wrote:
>
> That's worked a treat. Just to complete the learning curve, where was
> I going wrong?
> while (<FH> ) {
> chomp;
> ($ip,$mac,$host) = ($_ =~
> /lease\s+(\d{3}\.\d{3}\.\d{3}\.\d+)
IP addresses can contain one, two or three digits for each octet so \d{3} will
not match all addresses.
> .*thernet\s+(\d{2}:\d{2}:\d{2}:\d{2}:\d{2}:\d{2})
MAC addresses consist of hexadecimal digits but \d only matches decimal digits.
> .*ostname\s+\"(\w+\.scien.*)"/smg);
None of the host names in your example contained the string 'scien'.
> print "$ip $mac $host\n";
>
> }
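A minimal sketch of how the points above could be folded into one pattern, assuming each record holds a single lease block (for example, read with $/ = "}\n"). The helper name parse_lease and the exact sub-patterns are illustrative assumptions, not code from the original script; the MAC from the third sample lease is used because it contains hex letters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script):
# returns (ip, mac, host) for one lease record, or an empty list on failure.
sub parse_lease {
    my ($record) = @_;
    my @fields = $record =~ /
        lease \s+ ( \d{1,3} (?: \. \d{1,3} ){3} ) \s* \{     # 1 to 3 digits per octet
        .*?
        hardware \s+ ethernet \s+
        ( [0-9a-fA-F]{2} (?: : [0-9a-fA-F]{2} ){5} ) ;       # hex digits, both cases
        .*?
        client-hostname \s+ " ( [^"]+ ) " ;                  # any quoted host name
    /sx;
    return @fields;
}

my $record = <<'END';
lease 196.222.237.195 {
  starts 5 2007/02/23 17:54:04;
  hardware ethernet 00:0c:c1:33:31:0d;
  uid "\001\000\014\361\3231\015";
  client-hostname "puck";
}
END

my ($ip, $mac, $host) = parse_lease($record);
print "IP $ip - MAC $mac - HOST $host\n";
```

Capturing the host name inside the quotes is a design choice of this sketch; a lease without a client-hostname line would not match at all and would need separate handling.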
John
--
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order. -- Larry Wall
| |
| D. Bolliger 2007-02-26, 7:00 pm |
| Beginner am Montag, 26. Februar 2007 17:02:
> On 26 Feb 2007 at 15:58, D. Bolliger wrote:
Hi
[snipped]
> Thanx Dani,
>
> That's worked a treat. Just to complete the learning curve, where was
> I going wrong?
That should have been part of my first answer, sorry. I'll try and hope I
don't mess with the test versions and the English language :-)
Looking at the regex:
It only matches MACs without a-f hex digits (and there's no 'scien' string in
the data, but that's probably due to the altered hostnames, and I assume that
the \d{3} in the IP matching part is intended.).
In the script version with $/ = '':
$/ = '' is paragraph mode, which splits records on blank lines. The sample
data has none, so this reads all records at once, and this seems to be the
reason why you used the /g modifier. But then, after matching the IP of the
first lease, the first .* skips ahead to the last lease for the mac/host
matching. So you (or rather I) get a single wrong result with the IP of the
first lease and the mac/host of the last lease (after correcting the mac
regex part).
So you would have to alter the .* into the non-greedy .*? (to avoid
skipping leases) and also append a .*? at the end of the regex to match what
is "between the leases" [argh!] (to avoid only one match).
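A small sketch of the greedy versus non-greedy difference, assuming the whole file is held in one string. The data is shortened from the sample leases, and the variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two leases in one string, as if the whole file had been read at once.
my $data = <<'END';
lease 196.222.237.209 {
  hardware ethernet 00:60:04:28:28:01;
  client-hostname "lab.mydomain.com";
}
lease 196.222.237.195 {
  hardware ethernet 00:0c:c1:33:31:0d;
  client-hostname "puck";
}
END

# Greedy .* runs to the end of the string and backtracks, so the
# MAC and host name come from the LAST lease.
my @greedy = $data =~ /lease\s+(\S+).*ethernet\s+(\S+);.*hostname\s+(\S+);/s;

# Non-greedy .*? stops at the first possible place, so all three
# captures come from the FIRST lease.
my @lazy = $data =~ /lease\s+(\S+).*?ethernet\s+(\S+);.*?hostname\s+(\S+);/s;

print "greedy: @greedy\n";
print "lazy:   @lazy\n";
```

The greedy match pairs the first IP (196.222.237.209) with the MAC and host name of the last lease, while the non-greedy match keeps all three values from the first lease.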
But even if the regex is correct, and the ip/mac/host of every lease is
matched, only the first 3 captured matches get stored into ($ip,$mac,$host),
and the others are discarded (which would not be the case with
@data=/....../g).
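A minimal sketch of the @data=/.../g idea, assuming the file content is already in a string. In list context, a /g match returns the captures of every match as one flat list, which can then be taken three at a time. The names @first_only and @leases are illustrative assumptions. No trailing .*? is used here, because with /g each search resumes where the previous match ended and the unanchored pattern finds the next "lease" on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = <<'END';
lease 196.222.237.209 {
  hardware ethernet 00:60:04:28:28:01;
  client-hostname "lab.mydomain.com";
}
lease 196.222.237.209 {
  hardware ethernet 00:60:04:38:38:01;
  client-hostname "lab.mydomain.com";
}
lease 196.222.237.195 {
  hardware ethernet 00:0c:c1:33:31:0d;
  client-hostname "puck";
}
END

my $re = qr/lease\s+(\S+).*?ethernet\s+(\S+);.*?hostname\s+(\S+);/s;

# Assigning to three scalars keeps only the first lease; the rest is discarded.
my @first_only = ($data =~ /$re/g)[0 .. 2];

# Assigning to an array keeps every capture: ip, mac, host, ip, mac, host, ...
my @data = $data =~ /$re/g;
my $captures = @data;

# Regroup the flat list into one [ip, mac, host] triple per lease.
my @leases;
while (my @triple = splice @data, 0, 3) {
    push @leases, [@triple];
}

print "IP $_->[0] - MAC $_->[1] - HOST $_->[2]\n" for @leases;
```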
When you tried with $/ = "}\n":
I know neither what the regex looked like in this case nor what went wrong
then, so I can't say much.
Other notes:
The $_=~ is not necessary, because if the left side of =~ is missing, $_ is
used by default.
To read all leases, you could also do it like this, avoiding the somewhat
misleading while loop:
{
local $/; # sets $/ to undef
my $data=<FH>; # slurp entire file
# ...match with /g modifier...
}
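To make the effect of the different $/ settings concrete, here is a small sketch that counts how many records each setting yields. It reads from an in-memory filehandle instead of the real lease file, and the helper count_records is a hypothetical name used only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened sample data; note that it contains no blank lines.
my $text = <<'END';
lease 196.222.237.209 {
  hardware ethernet 00:60:04:28:28:01;
}
lease 196.222.237.195 {
  hardware ethernet 00:0c:c1:33:31:0d;
}
END

# Hypothetical helper: read $text with the given record separator
# and return the number of records obtained.
sub count_records {
    my ($separator) = @_;
    local $/ = $separator;                 # restored when the sub returns
    open my $fh, '<', \$text or die "Can't open in-memory file: $!\n";
    my @records = <$fh>;
    close $fh;
    return scalar @records;
}

printf "%-8s %d record(s)\n", q(''),     count_records('');      # paragraph mode
printf "%-8s %d record(s)\n", q("}\n"),  count_records("}\n");   # one lease per record
printf "%-8s %d record(s)\n", 'undef',   count_records(undef);   # slurp
```

With this data, paragraph mode and undef each give a single record holding everything, while "}\n" gives one record per lease.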
chomp is not necessary.
Hope that covers the most important issues :-)
Dani
| |
| Beginner 2007-02-26, 7:00 pm |
| On 26 Feb 2007 at 18:13, D. Bolliger wrote:
> [snipped]
Thanx Dani (and John) that's a very complete reply with a lot of
useful stuff to bear in mind.
There were a couple of errors that slipped in after I tried to
obfuscate my domain and ip range but I was missing a great deal.
Thanx again.
Dp.