Home > Archive > PERL Beginners > July 2006 > RE: reading input file, sorting then writing output file
| Author |
RE: reading input file, sorting then writing output file
|
|
| Macromedia 2006-07-25, 9:57 pm |
|
Hi,
I can't seem to get my script to sort properly. Below is my code along with a sample input.txt file. I also have what the output.txt file should look like. Also note that any duplicates should be stripped out,
which seems to work OK.
Something is getting messed up when I have the numerals along with alpha characters. I can't seem to get the results of the OUTPUT.TXT file below.
<code>
#!/usr/bin/perl -w
require 5.000;
use warnings;
use strict;
use POSIX;
my %tags = ();
my $input = $ARGV[0];
my $output = $ARGV[1];
open (FILE, "< $input") or die "cannot open $input: $!\n";
while (my $tag = <FILE> ) {
$tag =~ m/<tag id=(\w+)>/;
$tags{$1} = $tag;
}
open (NEWFILE, "> $output");
foreach my $id ( map { $_->[0] }
sort { $a->[0] cmp $b->[0] || $a->[7] <=> $b->[7] }
map { [ $_, ( isdigit( $_ ) ? $_ : 0 ) ] }
keys %tags )
{
print NEWFILE $tags{$id};
close NEWFILE;
close FILE;
</code>
INPUT.TXT file
------------------------
<tag id=1>Test.</tag>
<tag id=16ab>Test.</tag>
<tag id=aa>Test.</tag>
<tag id=16zz>Test.</tag>
<tag id=39a>Test.</tag>
<tag id=cc>Test.</tag>
<tag id=de>Test.</tag>
<tag id=16bc>Test.</tag>
<tag id=zz>Test..</tag>
<tag id=2>Test.</tag>
<tag id=3>Test.</tag>
<tag id=4>Test.</tag>
<tag id=5>Test.</tag>
<tag id=5a>Test.</tag>
<tag id=5za>Test.</tag>
<tag id=6>Test.</tag>
<tag id=40>Test.</tag>
<tag id=41>Test.</tag>
<tag id=40>Test.</tag>
<tag id=45>Test.</tag>
<tag id=10ba>Test.</tag>
<tag id=15xx>Test.</tag>
<tag id=cc>Test..</tag>
<tag id=ff>Test..</tag>
<tag id=50>Test.</tag>
<tag id=54>Test.</tag>
<tag id=7>Test.</tag>
<tag id=8>Test.</tag>
<tag id=16yy>Test.</tag>
<tag id=16ya>Test.</tag>
OUTPUT.TXT file
-----------------------------
<tag id=1>Test.</tag>
<tag id=2>Test.</tag>
<tag id=3>Test.</tag>
<tag id=4>Test.</tag>
<tag id=5>Test.</tag>
<tag id=5a>Test.</tag>
<tag id=5za>Test.</tag>
<tag id=6>Test.</tag>
<tag id=7>Test.</tag>
<tag id=8>Test.</tag>
<tag id=10ba>Test.</tag>
<tag id=15xx>Test.</tag>
<tag id=16ab>Test.</tag>
<tag id=16bc>Test.</tag>
<tag id=16ya>Test.</tag>
<tag id=16yy>Test.</tag>
<tag id=16zz>Test.</tag>
<tag id=39a>Test.</tag>
<tag id=40>Test.</tag>
<tag id=41>Test.</tag>
<tag id=45>Test.</tag>
<tag id=50>Test.</tag>
<tag id=54>Test.</tag>
<tag id=aa>Test.</tag>
<tag id=cc>Test..</tag>
<tag id=de>Test.</tag>
<tag id=ff>Test..</tag>
<tag id=zz>Test..</tag>
| |
| Macromedia 2006-07-25, 9:57 pm |
|
Oops.
Once you save my code to a file, the syntax would be:
perl mycode.pl input.txt output.txt
Below is what I get now when I run my code on the input.txt file. As you can see, it's not sorting the way I'd like. It should sort like the OUTPUT.TXT file in my previous email.
<tag id=1>Test.</tag>
<tag id=10ba>Test.</tag>
<tag id=15xx>Test.</tag>
<tag id=16ab>Test.</tag>
<tag id=16bc>Test.</tag>
<tag id=16ya>Test.</tag>
<tag id=16yy>Test.</tag>
<tag id=16zz>Test.</tag>
<tag id=2>Test.</tag>
<tag id=3>Test.</tag>
<tag id=39a>Test.</tag>
<tag id=4>Test.</tag>
<tag id=40>Test.</tag>
<tag id=41>Test.</tag>
<tag id=45>Test.</tag>
<tag id=5>Test.</tag>
<tag id=50>Test.</tag>
<tag id=54>Test.</tag>
<tag id=5a>Test.</tag>
<tag id=5za>Test.</tag>
<tag id=6>Test.</tag>
<tag id=7>Test.</tag>
<tag id=8>Test.</tag>
<tag id=aa>Test.</tag>
<tag id=cc>Test..</tag>
<tag id=de>Test.</tag>
<tag id=ff>Test..</tag>
<tag id=zz>Test..</tag>
| |
| Mumia W. 2006-07-26, 3:57 am |
| On 07/25/2006 08:32 PM, macromedia wrote:
> Hi,
>
> I can't seem to get my script to sort properly. Below
> is my code [...]
> sort { $a->[0] cmp $b->[0] || $a->[7] <=> $b->[7] }
> [...]
"Cmp" does string comparisons. Use "<=>" for numeric
comparisons. Read "perldoc perlop".
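A minimal sketch of the difference between the two operators. The ids here are made up and purely numeric, so that "<=>" raises no "isn't numeric" warnings:

```perl
use strict;
use warnings;

# Hypothetical sample ids, chosen only to illustrate the two operators.
my @ids = (10, 2, 1, 40, 5);

# cmp compares character by character, so "10" sorts before "2".
my @as_strings = sort { $a cmp $b } @ids;

# <=> compares the numeric values.
my @as_numbers = sort { $a <=> $b } @ids;

print "cmp: @as_strings\n";
print "<=>: @as_numbers\n";
```

The string sort puts 10 before 2, which is the same pattern as in the unwanted output posted above.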
| |
| Uri Guttman 2006-07-26, 3:57 am |
| >>>>> "DS" == DJ Stunks <DJStunks@gmail.com> writes:
DS> My First Sort::Maker, by Jake Peavy.
i feel like a proud papa!
DS> make_sorter( ST => 1,
DS> name => 'tag_sort',
DS> init_code => 'my ($num,$str);',
DS> number => 'do{
DS> ($num,$str) = m{^<tag id=(\d*)([a-z]*)>};
DS> $num eq q{}
DS> }',
DS> number => '$num',
DS> string => '$str',
DS> ) or die "unable to make_sorter: $@\n";
an excellent use of init_code. at first i didn't study the data so i
wondered why the first key was $num eq q{} but it makes sense to me now.
DS> print for tag_sort( <DATA> );
why the for? print works fine with the sorted list.
i may borrow this for the docs as it shows an interesting use of
init_code.
hopefully the OP will understand it and why it is easier to code sorts
by key extraction than by explicit comparisons.
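For readers without Sort::Maker installed, the same key-extraction idea can be sketched by hand in core Perl as a Schwartzian transform. This is not the code Sort::Maker generates, only an illustration under that assumption; tag_sort_by_hand is a hypothetical name, and the three keys mirror the ones in the quoted make_sorter call (with 0 standing in for a missing number).

```perl
use strict;
use warnings;

# Hand-written key-extraction sort. Each line is paired with its keys once,
# the pairs are sorted on the keys, and the keys are then thrown away.
sub tag_sort_by_hand {
    return
        map  { $_->[0] }                  # 3. keep only the original line
        sort {
               $a->[1] <=> $b->[1]        # ids without a number go last
            || $a->[2] <=> $b->[2]        # then order by the numeric part
            || $a->[3] cmp $b->[3]        # then by the trailing letters
        }
        map  {                            # 1. extract the keys once per line
            /^<tag id=(\d*)([a-z]*)>/
                ? [ $_, ($1 eq '' ? 1 : 0), ($1 eq '' ? 0 : $1), $2 ]
                : ()                      # skip lines that are not tags
        } @_;
}

# Made-up sample lines in the format used in this thread.
my @lines = map "<tag id=$_>Test.</tag>\n", qw(aa 16ab 2 5a 10ba 5 zz);
print tag_sort_by_hand(@lines);
```

The comparison block never parses a line; it only compares keys that were computed beforehand, which is the point being made about key extraction.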
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
| |
| John W. Krahn 2006-07-26, 7:56 am |
| macromedia wrote:
> Hi,
Hello,
> I can't seem to get my script to sort properly.
Yes, it is a bit tricky to get right.
> Below is my code along with
> a sample input.txt file. I also have what the output.txt file should look
> like. Also note any duplicate should be striped out
> which seems to work ok.
>
> Something is getting messed up when I have the numerials along with alpha
> etc. I can't seem to get the results of the OUTPUT.TXT file below.
>
> #!/usr/bin/perl -w
>
> require 5.000;
>
> use warnings;
> use strict;
> use POSIX;
>
> my %tags = ();
>
> my $input = $ARGV[0];
> my $output = $ARGV[1];
>
> open (FILE, "< $input") or die "cannot open $input: $!\n";
> while (my $tag = <FILE> ) {
> $tag =~ m/<tag id=(\w+)>/;
> $tags{$1} = $tag;
You shouldn't use $1 unless you are sure the pattern matched.
> }
> open (NEWFILE, "> $output");
You should verify that this file opened correctly too.
open (NEWFILE, "> $output") or die "cannot open $output: $!\n";
> foreach my $id ( map { $_->[0] }
> sort { $a->[0] cmp $b->[0] || $a->[7] <=> $b->[7] }
> map { [ $_, ( isdigit( $_ ) ? $_ : 0 ) ] }
> keys %tags )
> {
> print NEWFILE $tags{$id};
Missing a closing } here.
This appears to do what you want:
while ( my $tag = <FILE> ) {
    next unless $tag =~ /<tag id=(\d*)([^>]*)>/;
    $tags{ sprintf '%04d%s', $1 || 9999, $2 } = $tag;
    # if you are expecting numbers larger than 9999 then
    # use a larger constant and a larger sprintf format
}
open (NEWFILE, "> $output") or die "cannot open $output: $!\n";
foreach my $id ( sort keys %tags ) {
    print NEWFILE $tags{ $id };
}
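
To see why a plain string sort then works, the key construction can be tried in isolation. This is only a sketch: sort_key is a hypothetical helper wrapping the same pattern and sprintf call as above, and the sample ids are taken from the input file in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: builds the same key as the sprintf call above.
# It returns nothing when the line is not a tag, so $1 and $2 are
# never used after a failed match.
sub sort_key {
    my ($line) = @_;
    return unless $line =~ /<tag id=(\d*)([^>]*)>/;
    return sprintf '%04d%s', $1 || 9999, $2;
}

# Print the key generated for a few ids from the sample input.
for my $id (qw(5 5a 10ba 39a aa)) {
    printf "%-5s => %s\n", $id, sort_key("<tag id=$id>Test.</tag>");
}
```

Here 5a becomes 0005a and aa becomes 9999aa, so zero-padded numbers sort first and letter-only ids sort last. One caveat, assuming such an id can occur: id=0 would also get the 9999 prefix, because "0" is false in Perl.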
John
--
use Perl;
program
fulfillment
| |
| Mumia W. 2006-07-26, 6:57 pm |
| On 07/25/2006 10:07 PM, Mumia W. wrote:
> On 07/25/2006 08:32 PM, macromedia wrote:
>
> "Cmp" does string comparisons. Use "<=>" for numeric
> comparisons. Read "perldoc perlop".
>
No, this doesn't quite do it. I see your data is a little more
complicated than what I thought before. When you have unusual
sorting requirements, you need unusual sorting keys. I decided
to use sprintf() to give me keys that are formatted perfectly
for sorting:
require 5.000;
use warnings;
use strict;
use POSIX;
my %tags = ();
my $input = 'sort_tags.dat';
my $output = 'sort_tags.out';
open (FILE, "< $input") or die "cannot open $input: $!\n";
while (my $tag = <FILE> ) {
    # skip lines that are not tags, so $1 and $2 are always set below
    next unless $tag =~ m/<tag id=(\d*)([[:alpha:]]*)>/;
    $tags{sprintf("%04d%6s",$1 || 999,$2)} = $tag;
}
open (NEWFILE, "> $output") or die "cannot open $output: $!\n";
foreach my $id ( sort keys %tags )
{
    print NEWFILE $tags{$id};
}
close NEWFILE;
close FILE;
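
A short sketch of what the "%04d%6s" format produces, in case the padding is not obvious. padded_key is a hypothetical helper that applies the same format to a bare id; the brackets in the output are only there to make the spaces visible.

```perl
use strict;
use warnings;

# Hypothetical helper: same key format as in the program above,
# applied to a bare id instead of a whole line.
sub padded_key {
    my ($id) = @_;
    my ($num, $alpha) = $id =~ /^(\d*)([[:alpha:]]*)$/ or return;
    return sprintf '%04d%6s', $num || 999, $alpha;
}

# Show the generated key for a few ids from the sample input.
for my $id (qw(5 5a 5za 16ab aa)) {
    printf "%-4s => [%s]\n", $id, padded_key($id);
}
```

Two properties of this format are worth knowing. "%6s" pads on the left, so for the same number a shorter letter suffix always sorts before a longer one (5b would come before 5ab); "%-6s" would pad on the right instead. And because the fallback number is 999, letter-only ids sort ahead of any numeric id of 999 or more.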
| |
| Mumia W. 2006-07-26, 6:57 pm |
| On 07/25/2006 08:32 PM, macromedia wrote:
> Hi,
>
> I can't seem to get my script to sort properly. [...]
Here is a shortened version of your program:
use strict;
use warnings;
use File::Slurp;
my %tags;
foreach my $line (read_file 'sort_tags.dat') {
    if ($line =~ m/id=(\d*)([[:alpha:]]*)/) {
        $tags{sprintf("%04d%6s",$1 || 999,$2)} = $line;
    }
}
write_file('sort_tags.out',
    map $tags{$_}, sort keys %tags);
__END__
As in the other program, I used sprintf() to create
easily-sorted keys.