Extract lines by name
| Fred Hare 2005-07-29, 5:03 pm |
| I have textfiles with 6000 lines (about 674k) and I want to print lines
in groups containing the same name. The following code (with 3 names)
works OK. But I actually use a script with 24 names and I would like to
find a way where I have to open and close the infile only once.
#!/usr/bin/perl
use strict;
use warnings;
my $infile = 'D:\aa\headers.txt';
my $outfile = 'D:\aa\outfile.txt';
open(OUT, "> $outfile")
    or die "Could not open $outfile for writing: $!\n";
open(IN, "< $infile")
    or die "Could not open $infile for reading: $!\n";
while (<IN>) {
    if (m/adam/ig) {
        print OUT $_;
    }
}
print OUT "\n";
close(IN) or die "could not close IN: $!";
open(IN, "< $infile")
    or die "Could not open $infile for reading: $!\n";
while (<IN>) {
    if (m/baker/ig) {
        print OUT $_;
    }
}
print OUT "\n";
close(IN) or die "could not close IN: $!";
open(IN, "< $infile")
    or die "Could not open $infile for reading: $!\n";
while (<IN>) {
    if (m/Charly/ig) {
        print OUT $_;
    }
}
print OUT "\n";
close(IN) or die "could not close IN: $!";
close(OUT) or die "could not close OUT: $!";
| |
| Paul Lalli 2005-07-29, 5:03 pm |
| Fred Hare wrote:
> I have textfiles with 6000 lines (about 674k) and I want to print lines
> in groups containing the same name. The following code (with 3 names)
> works OK. But I actually use a script with 24 names and I would like to
> find a way where I have to open and close the infile only once.
Is the specification really that the line "contains" the same name? As
in, anywhere in the line and you don't necessarily know where?
If that's true, I'd probably resort to something like this (using the
__DATA__ marker and STDOUT rather than the two filehandles - converting
it is an exercise for the reader):
#!/usr/bin/perl
use strict;
use warnings;
my %lines;
my @names = qw/adam baker charlie/;
for my $name (@names) {
    $lines{$name} = [];
}
while (<DATA>) {
    for my $name (@names) {
        push @{$lines{$name}}, $_ if /$name/;
    }
}
for my $name (@names) {
    print "===Lines with $name:===\n";
    print @{$lines{$name}};
}
__DATA__
foo bar adam
charlie is good
how is baker?
adam and baker did well.
Hey there charlie? How's adam?
OUTPUT:
===Lines with adam:===
foo bar adam
adam and baker did well.
Hey there charlie? How's adam?
===Lines with baker:===
how is baker?
adam and baker did well.
===Lines with charlie:===
charlie is good
Hey there charlie? How's adam?
We create a hash with one entry per name, initializing each entry to an
empty array. (This initialization is necessary only if one or more names
might not be contained in any lines. If that's not the case, it could
safely be omitted and autovivification would take care of it for us.)
We then loop through each line of the file and test whether it belongs
in each group. If it does, we add that line to the group (that is, the
array referenced by that name's entry in the hash). Once we've read all
lines, we simply print out each group.
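As a rough sketch of the conversion left to the reader, the same grouping can be wrapped in a sub that takes an input and an output filehandle. The name group_lines is hypothetical, the /i flag follows the matching in the original question, and \Q...\E (treating each name as literal text) is an added assumption. The demo reads from an in-memory string so that it runs as-is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# group_lines is a hypothetical helper, not part of the original post.
# It reads every line from $in once, collects the lines per name,
# and prints each group to $out followed by a blank line.
sub group_lines {
    my ($in, $out, @names) = @_;
    my %lines = map { $_ => [] } @names;    # one empty group per name
    while (my $line = <$in>) {
        for my $name (@names) {
            # \Q...\E matches the name literally; /i ignores case
            push @{ $lines{$name} }, $line if $line =~ /\Q$name\E/i;
        }
    }
    for my $name (@names) {
        print {$out} @{ $lines{$name} }, "\n";
    }
    return;
}

# Demo on an in-memory filehandle, writing to STDOUT.
my $data = "foo bar adam\ncharlie is good\nhow is baker?\n";
open my $in, '<', \$data
    or die "Could not open in-memory data: $!\n";
group_lines($in, \*STDOUT, qw/adam baker charlie/);
close $in or die "could not close in-memory handle: $!";
```

For real files, $in and $out would instead come from three-argument open calls on the input and output paths, so the input file is opened and closed only once.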
Note that this code assumes that a line may be in more than one group.
If this is not the case, you may wish to add a 'last;' statement in one
of the for loops. Consider that another exercise for the reader.
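One possible reading of that exercise, as a sketch only: put the 'last;' in the inner loop of the reading phase, so each line lands in the group of the first name that matches it. The statement modifier becomes a full if block so that 'last' runs only after a match. The @input list is made-up sample data.

```perl
use strict;
use warnings;

my @names = qw/adam baker charlie/;
my %lines = map { $_ => [] } @names;

# Made-up sample lines standing in for the file contents.
my @input = (
    "foo bar adam\n",
    "adam and baker did well.\n",
    "how is baker?\n",
);

for my $line (@input) {
    for my $name (@names) {
        if ($line =~ /$name/) {
            push @{ $lines{$name} }, $line;
            last;    # leave the inner loop: first matching name wins
        }
    }
}

for my $name (@names) {
    print "===Lines with $name:===\n";
    print @{ $lines{$name} };
}
```

With this variant the order of @names decides which group a line with several names ends up in.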
Paul Lalli
| |
| Jim Gibson 2005-07-29, 5:03 pm |
| In article <t_udnfoxIc47-3ffRVnyjw@giganews.com>, Fred Hare
<dreamer@cox.net> wrote:
> I have textfiles with 6000 lines (about 674k) and I want to print lines
> in groups containing the same name. The following code (with 3 names)
> works OK. But I actually use a script with 24 names and I would like to
> find a way where I have to open and close the infile only once.
>
>
[program snipped]
Store your lines in arrays that are referenced by the elements of a hash.
This will work as long as your text lines fit into memory:
#!/usr/local/bin/perl
#
use warnings;
use strict;
my %text;
while (<DATA>) {
    if( /(adam|baker|Charly)/ ) {
        push( @{$text{$1}}, $_ );
    }
}
for my $name ( keys %text ) {
    print "\n$name:\n\n";
    print @{$text{$name}};
}
__DATA__
line 1
line 2 adam
line 3
line 4 baker
line 5
line 6 Charly
line 7
line 8 baker
line 9
__OUTPUT__
adam:
line 2 adam
Charly:
line 6 Charly
baker:
line 4 baker
line 8 baker
Note: there is a way to generate the regex (adam|baker|Charly|...) from
a list of names, but I have forgotten what it is and cannot reconstruct
it.
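One common way to build such a regex from a list, offered as a sketch rather than as the method Jim had in mind, is to join the names with '|' after passing each through quotemeta, then compile the result with qr//. The variable names and the sample lines are illustrative.

```perl
use strict;
use warnings;

my @names = qw/adam baker Charly/;

# quotemeta escapes any regex metacharacters inside a name;
# join then builds the alternation adam|baker|Charly.
my $alternation = join '|', map { quotemeta($_) } @names;
my $regex = qr/($alternation)/;

my %text;
for my $line ("line 2 adam\n", "line 3\n", "line 4 baker\n", "line 8 baker\n") {
    # $1 holds whichever name matched first in this line
    push @{ $text{$1} }, $line if $line =~ $regex;
}

for my $name (sort keys %text) {
    print "\n$name:\n\n";
    print @{ $text{$name} };
}
```

Alternatives are tried left to right, so if one name is a prefix of another, listing the longer name first avoids the shorter one always winning.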
| |
| Fred Hare 2005-07-29, 5:04 pm |
| Paul Lalli wrote:
> [quoted text snipped]
Paul: Thank you very much for the suggested code and for the
explanation. I changed my script, it works OK and much faster than before.
-- Fred Hare
| |
| Fred Hare 2005-07-29, 5:04 pm |
| Jim Gibson wrote:
> [quoted text snipped]
Jim, I tried your suggested code in my script and I got a Perl error
message:
Scalar value @{text{$1} better written as ${text{$1} at sort.pl line 18.
Type of arg 1 to push must be array (not hash slice) at sort.pl line 18,
near "$_) "
Execution of sort.pl aborted due to compilation errors.
I tried a few changes but could not get rid of the error messages. Since
the code proposed by Paul Lalli works fine for me, I will stay with
that code...
-- Fred Hare
| |
| Tad McClellan 2005-07-30, 9:01 am |
| Fred Hare <dreamer@cox.net> wrote:
> Jim Gibson wrote:
>> push( @{$text{$1}}, $_ );
         ^^^^^^^^^^^^
         ^^^^^^^^^^^^ 2 open curlies, 2 close curlies
> Jim, I tried your suggested code in my script
That may have been your intention, but I think you were not using
that exact code.
When you put it into your Perl program, did you copy/paste it,
or did you (attempt to) retype it?
> and I got a Perl error
> message:
>
> Scalar value @{text{$1} better written as ${text{$1} at sort.pl line 18.
               ^^^^^^^^^^
               ^^^^^^^^^^ 2 open curlies, *1* close curly!
(don't retype messages either, use copy/paste to avoid introducing typos.)
That one is not an error message. It is a warning message.
You can look up both error and warning messages in:
perldoc perldiag
> Type of arg 1 to push must be array (not hash slice) at sort.pl line 18,
> near "$_) "
To help fix a syntax error, we will need to see the *exact* code that
produced the error.
If you post a short and complete program that we can run that demonstrates
your problem, someone here can surely help you fix the problem.
> I tried a few changes
Try it _without_ changes first.
Jim's code works fine for me.
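For comparison with the retyped message, here is a small sketch of the balanced expression with extra whitespace to make the two pairs of curlies easier to see. The sample lines are made up, and the regex is the one from Jim's post.

```perl
use strict;
use warnings;

my %text;
for my $line ("line 4 baker\n", "line 8 baker\n") {
    if ($line =~ /(adam|baker|Charly)/) {
        # Inner pair of curlies: $text{$1} is a single hash element.
        # Outer @{ ... }: treat that element as an array reference.
        # The array is created automatically on the first push.
        push @{ $text{$1} }, $line;
    }
}
print scalar @{ $text{baker} }, " lines for baker\n";
```

Dropping either closing curly leaves the expression unbalanced, which is the kind of slip that retyping instead of pasting tends to introduce.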
Have you seen the Posting Guidelines that are posted here frequently?
--
Tad McClellan SGML consulting
tadmc@augustmail.com Perl programming
Fort Worth, Texas
| |
| Fred Hare 2005-07-30, 5:00 pm |
| Tad McClellan wrote:
> [quoted text snipped]
Very embarrassing - I typed where I should have pasted ...
Jim's code works fine as posted and also when used in my script
-- Fred