PERL Beginners archive, June 2007
grep from one file and write to another
| Vahid Moghaddasi 2007-06-23, 10:02 pm |
| Hi all,
I am trying to read a colon delimited text file (filter.in) then
search for each field in another file (/etc/passwd) and if it is found
then write that line in the third file (passwd.out). Here is what I
have written so far but it is not giving me the correct result. Thanks
for any help.
#!/bin/perl
#
# the format of filter.in is user1:user2:user3:user4:
#
use File::Copy;
use strict;
use warnings;
$|=1; # flush output buffer
open (FILTERfh, "< filter.in") || die "Can not open filter.in: $!\n";
open PASSWDfh, '</etc/passwd' or die "Can not open the file: $!\n";
open PASSWDFILfh, ">passwd.out";
while (<FILTERfh> ) {
    chomp;
    my @input = split /:/, $_;
    for (my $user = 1; $user <= $#input ; $user++) {
        print "$input[$user] is being added.\n";
        while (<PASSWDfh> ) {
            my %seen;
            next if (m/^#/); # Skip comments
            next if (m/^\s*$/); # Skip blank lines
            my ($field1) = /([^:]+)/;
            # print PASSWDFILfh $_ unless $seen{$field1} or warn
            #     "WARNING: User $input[$user] does not exist!\n";
            print PASSWDFILfh $_ unless $input[$user] or warn
                "WARNING: User $input[$user] does not exist!\n";
            # print PASSWDFILfh $_ if("$field1" eq "$input[$user]");
            # print PASSWDFILfh $_ if( grep(/$field1:/, $_ )) or warn
            #     "WARNING: User $input[$user] does not exist!\n";
        } # while
    } # for
} # while
close FILTERfh;
close PASSWDFILfh;
close PASSWDfh;
| |
| Tom Phoenix 2007-06-23, 10:02 pm |
| On 6/23/07, Vahid Moghaddasi <vahid.moghaddasi@gmail.com> wrote:
> I am trying to read a colon delimited text file (filter.in) then
> search for each field in another file (/etc/passwd) and if it is found
> then write that line in the third file (passwd.out).
> use File::Copy;
Are you actually using File::Copy? I didn't find any call to it in
your posted code.
> use strict;
> use warnings;
That's good....
> $|=1; # flush output buffer
> open (FILTERfh, "< filter.in") || die "Can not open filter.in: $!\n";
> open PASSWDfh, '</etc/passwd' or die "Can not open the file: $!\n";
> open PASSWDFILfh, ">passwd.out";
I can't say that I like the style of having filehandle names ending in
"fh", but it's one way to do it. But please standardize the way you
open files; I'd adopt a style most like the second one. By the way,
the "output buffer" that your comment refers to is the buffer for
STDOUT. Is somebody waiting for the output on STDOUT? You didn't
mention that in the task description.
> while (<FILTERfh> ) {
> chomp;
> my @input = split /:/, $_;
> for (my $user = 1; $user <= $#input ; $user++) {
Although you may use the three-part for loop to do this, you'll be
more likely to get it correct if you use a range instead:
for my $user (0..$#input) { # Not 1..$#input, is it?
And, unless you needed an index, you'll be even more likely to get it
correct if you use a foreach directly on the array:
for my $user (@input) { # Now $user is the user, not the index
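A minimal sketch of the three loop styles side by side, run on a made-up line in the filter.in format. The sample names and the array names @c_style, @by_index and @by_value are illustrative only. Whether skipping index 0 is intended is something only the poster can say; if it is, 1..$#input would be the matching range.

```perl
use strict;
use warnings;

# Hypothetical sample line in the filter.in format from the original post.
my $line  = 'user1:user2:user3:user4:';
my @input = split /:/, $line;    # split drops the trailing empty field

# C-style loop starting at 1: $input[0] is never visited.
my @c_style;
for (my $i = 1; $i <= $#input; $i++) {
    push @c_style, $input[$i];
}

# Range loop over every index.
my @by_index;
for my $i (0 .. $#input) {
    push @by_index, $input[$i];
}

# foreach directly over the array: there is no index to get wrong.
my @by_value;
for my $user (@input) {
    push @by_value, $user;
}

print "c-style:  @c_style\n";
print "by index: @by_index\n";
print "by value: @by_value\n";
```

The first list comes out one name shorter than the other two, which is the off-by-one that the range and foreach forms make easier to spot.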
> print "$input[$user] is being added.\n";
> while (<PASSWDfh> ) {
Now you're reading one file in a loop, inside the loop on FILTERfh. Do
you mean to re-read the password file for every line in the outer
loop's file? That sounds slow, but you could do it. (You'll either
need to reopen the file, or use seek() to get back to the start.)
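As a rough illustration of the rewinding option, the sketch below reads the same handle twice and uses Perl's built-in seek to return to the start between passes. The in-memory data stands in for the password file and its entries are made up.

```perl
use strict;
use warnings;

# In-memory stand-in for the password file (made-up entries).
my $data = "root:x:0:0:root:/root:/bin/sh\n"
         . "alice:x:1000:1000::/home/alice:/bin/sh\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!\n";

my @passes;
for my $pass (1 .. 2) {
    my $count = 0;
    $count++ while <$fh>;    # read to end of file, counting lines
    push @passes, $count;
    # Rewind to byte 0 so the next pass starts at the first line again.
    seek $fh, 0, 0 or die "Cannot seek: $!\n";
}
close $fh;

print "lines per pass: @passes\n";
```

Without the seek call the second pass would find the handle already at end of file and read nothing, which is what happens to the inner loop in the posted program.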
A better algorithm would read through the entire FILTERfh datastream,
storing away what you'll need. Later, when you read the password file
in a single pass, you can compare the data in memory to the data read
from the file.
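One way the single-pass approach could look, as a sketch only. It assumes every field in filter.in is a user name and keeps the names in a hash; the helper names read_wanted and copy_matches are hypothetical, and the demonstration runs on in-memory data rather than the real files.

```perl
use strict;
use warnings;

# Collect every user name from a filter.in-style handle into a hash.
sub read_wanted {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        chomp $line;
        $wanted{$_} = 1 for grep { $_ ne '' } split /:/, $line;
    }
    return \%wanted;
}

# Copy each passwd-style line whose first field is a wanted name.
sub copy_matches {
    my ($in, $out, $wanted) = @_;
    while (my $line = <$in>) {
        next if $line =~ /^#/ or $line =~ /^\s*$/;    # comments, blank lines
        my ($name) = split /:/, $line;
        print {$out} $line if $wanted->{$name};
    }
}

# Demonstration on in-memory data (made-up names and entries).
my $filter = "alice:carol:\n";
my $passwd = "root:x:0:0::/root:/bin/sh\n"
           . "alice:x:1000:1000::/home/alice:/bin/sh\n"
           . "bob:x:1001:1001::/home/bob:/bin/sh\n";
my $result = '';
open my $f, '<', \$filter or die "Cannot open filter data: $!\n";
open my $p, '<', \$passwd or die "Cannot open passwd data: $!\n";
open my $o, '>', \$result or die "Cannot open output buffer: $!\n";
copy_matches($p, $o, read_wanted($f));
close $o;

print $result;
```

Each file is read exactly once, so the cost grows with the size of the two files added together rather than multiplied.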
Does that get you closer to a solution? Good luck with it!
--Tom Phoenix
Stonehenge Perl Training
| |
| Vahid Moghaddasi 2007-06-24, 7:59 am |
| On 6/23/07, Tom Phoenix <tom@stonehenge.com> wrote:
>
> Are you actually using File::Copy? I didn't find any call to it in
> your posted code.
Sorry, I left it in by mistake. This code is a small part of a very
large program.
>
>
> That's good....
>
>
> I can't say that I like the style of having filehandle names ending in
> "fh", but it's one way to do it.
I didn't want to make mistakes; maybe I will change it later.
The actual program writes to STDOUT as well.
>
> for my $user (0..$#input) { # Not 1..$#input, is it?
If that gives me the same result, then it is less typing. I use field
0 for something else.
>
> And, unless you needed an index, you'll be even more likely to get it
> correct if you use a foreach directly on the array:
>
> for my $user (@input) { # Now $user is the user, not the index
>
>
> Now you're reading one file in a loop, inside the loop on FILTERfh. Do
> you mean to re-read the password file for every line in the outer
> loop's file? That sounds slow, but you could do it.
For each field (user) in the filter.in file, I will have to find the
user in the passwd file. Wouldn't I need to re-read the passwd file as
many times as there are fields in the filter.in file?
> (You'll either need to reopen the file, or use seek() to get back to the start.)
>
> A better algorithm would read through the entire FILTERfh datastream,
> storing away what you'll need. Later, when you read the password file
> in a single pass, you can compare the data in memory to the data read
> from the file.
I am not sure how much I can read into memory space without affecting
other programs but the entire FILTERfh could be pretty large. Each
line could have up to 100 fields (users) and there could be 3 or 5
lines. How would I read them into memory? In an array?
>
> Does that get you closer to a solution? Good luck with it!
>
Hope so. Thanks.
| |
| Tom Phoenix 2007-06-24, 7:59 am |
| On 6/23/07, Vahid Moghaddasi <vahid.moghaddasi@gmail.com> wrote:
> For each field (user) in the filter.in file, I will have to find the
> user in the passwd file. Wouldn't I need to re-read the passwd file as
> many times as there are fields in the filter.in file?
Probably not. For one solution, you might be able to use getpwnam to
get the information for each username individually, so you never need
to read from the password file at all.
my($name, $passwd, $uid, $gid,
$quota, $comment, $gcos, $dir, $shell, $expire)
= getpwnam($username);
But maybe you need the actual password file.
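A hedged sketch of the getpwnam route. The helper name passwd_line and the sample user names are hypothetical. The line it builds comes from whatever user database the system consults, so it may not match the text of /etc/passwd exactly (the password field is often just a placeholder).

```perl
use strict;
use warnings;

# Return a passwd-style line for $username, or undef if the system does
# not know the name. List context avoids a scalar-context trap: there
# getpwnam returns the uid, which is 0 (false) for root.
sub passwd_line {
    my ($username) = @_;
    my @pw = getpwnam $username;
    return unless @pw;
    # Fields 0..3 are name, passwd, uid, gid; 6..8 are gcos, dir, shell.
    return join(':', map { defined $_ ? $_ : '' } @pw[0 .. 3, 6 .. 8]) . "\n";
}

for my $user (qw(root no_such_user_hopefully)) {
    my $line = passwd_line($user);
    print defined($line) ? $line : "WARNING: User $user does not exist!\n";
}
```

This suits a handful of lookups; when the exact lines of the password file are required, reading the file itself is the safer choice.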
> I am not sure how much I can read into memory space without affecting
> other programs but the entire FILTERfh could be pretty large. Each
> line could have up to 100 fields (users) and there could be 3 or 5
> lines. How would I read them into memory? In an array?
300 to 500 usernames? If every username is meant to be unique, this
sounds like it's asking to be a hash. A hash with hundreds of
key-value pairs is easy for Perl to handle, so unless each value is
very large you shouldn't have memory issues.
Once you've built the hash, you can traverse the password file (or use
getpwent) and quickly identify the matching usernames from the hash.
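The getpwent variant might look roughly like this. The contents of %wanted are made up here; in the thread's setting they would come from filter.in.

```perl
use strict;
use warnings;

# Hypothetical set of wanted names; one of them should not exist.
my %wanted = map { $_ => 1 } qw(root no_such_user_hopefully);

my @found;
setpwent();                          # start from the first entry
while (my @pw = getpwent()) {
    # $pw[0] is the login name; a hash lookup is a constant-time test.
    push @found, $pw[0] if $wanted{ $pw[0] };
}
endpwent();

print "found: @found\n";
```

Because each entry costs one hash lookup, the size of the wanted list has little effect on the running time of the traversal.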
Cheers!
--Tom Phoenix
Stonehenge Perl Training
| |
| John W. Krahn 2007-06-24, 7:59 am |
| Vahid Moghaddasi wrote:
> Hi all,
Hello,
> I am trying to read a colon delimited text file (filter.in) then
> search for each field in another file (/etc/passwd) and if it is found
> then write that line in the third file (passwd.out). Here is what I
> have written so far but it is not giving me the correct result. Thanks
> for any help.
Something like this should work:
#!/bin/perl
#
# the format of filter.in is user1:user2:user3:user4:
#
use strict;
use warnings;
open FILTERfh, '<', 'filter.in' or die "Can not open filter.in: $!\n";
open PASSWDFILfh, '>', 'passwd.out' or die "Can not open passwd.out: $!\n";
while ( <FILTERfh> ) {
    chomp;
    my @input = split /:/;
    for ( @input ) {
        # List context: in scalar context getpwnam returns the uid,
        # which is 0 (false) for root.
        if ( my @pw = getpwnam $_ ) {
            # Write a passwd-style line, not just the user name.
            print PASSWDFILfh join( ':', @pw[ 0 .. 3, 6 .. 8 ] ), "\n";
        }
        else {
            warn "WARNING: User $_ does not exist!\n";
        }
    }
}
close FILTERfh;
close PASSWDFILfh;
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
| |
| Vahid Moghaddasi 2007-06-24, 6:59 pm |
| On 6/24/07, Tom Phoenix <tom@stonehenge.com> wrote:
> On 6/23/07, Vahid Moghaddasi <vahid.moghaddasi@gmail.com> wrote:
>
>
> But maybe you need the actual password file.
>
You got it, I have to read /etc/passwd file only.
>
> 300 to 500 usernames? If every username is meant to be unique, this
> sounds like it's asking to be a hash. A hash with hundreds of
> key-value pairs is easy for Perl to handle, so unless each value is
> very large you shouldn't have memory issues.
The actual password file is over 10000 lines but I need to find at
most maybe 200 users in it and dump the line in another file.
Thanks.
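With roughly 200 names wanted out of more than 10000 lines, a sketch along the following lines could both stop reading once every name has been found and report the names that never turned up. The helper name extract_users and the sample data are hypothetical. When a name appears more than once in the file, only its first line is copied.

```perl
use strict;
use warnings;

# Copy wanted entries from $in to $out; return the names never seen.
sub extract_users {
    my ($in, $out, @names) = @_;
    my %pending = map { $_ => 1 } @names;
    while (my $line = <$in>) {
        last unless %pending;                 # everything found: stop early
        my ($name) = split /:/, $line;
        # delete returns a true value only if the name was still pending.
        next unless defined $name && delete $pending{$name};
        print {$out} $line;
    }
    return sort keys %pending;
}

# Demonstration on in-memory data (made-up entries).
my $passwd = "root:x:0:0::/root:/bin/sh\n"
           . "alice:x:1000:1000::/home/alice:/bin/sh\n";
my $copied = '';
open my $in,  '<', \$passwd or die "Cannot open passwd data: $!\n";
open my $out, '>', \$copied or die "Cannot open output buffer: $!\n";
my @missing = extract_users($in, $out, qw(alice carol));
close $out;

warn "WARNING: User $_ does not exist!\n" for @missing;
print $copied;
```

Deleting each name as it is matched means that whatever remains in the hash at the end is the list of users to warn about.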
| |