Author RE: reading input file, sorting then writing output file
Macromedia

2006-07-25, 9:57 pm


Hi,

I can't seem to get my script to sort properly. Below is my code along with a sample input.txt file. I also have what the output.txt file should look like. Also note that any duplicates should be stripped out,
which seems to work OK.

Something is getting messed up when I have numerals along with alpha characters. I can't seem to get the results of the OUTPUT.TXT file below.




<code>

#!/usr/bin/perl -w

require 5.000;

use warnings;
use strict;
use POSIX;

my %tags = ();

my $input = $ARGV[0];
my $output = $ARGV[1];

open (FILE, "< $input") or die "cannot open $input: $!\n";
while (my $tag = <FILE> ) {
$tag =~ m/<tag id=(\w+)>/;
$tags{$1} = $tag;
}
open (NEWFILE, "> $output");
foreach my $id ( map { $_->[0] }
sort { $a->[0] cmp $b->[0] || $a->[7] <=> $b->[7] }
map { [ $_, ( isdigit( $_ ) ? $_ : 0 ) ] }
keys %tags )
{
print NEWFILE $tags{$id};


close NEWFILE;
close FILE;

</code>


INPUT.TXT file
------------------------

<tag id=1>Test.</tag>
<tag id=16ab>Test.</tag>
<tag id=aa>Test.</tag>
<tag id=16zz>Test.</tag>
<tag id=39a>Test.</tag>
<tag id=cc>Test.</tag>
<tag id=de>Test.</tag>
<tag id=16bc>Test.</tag>
<tag id=zz>Test..</tag>
<tag id=2>Test.</tag>
<tag id=3>Test.</tag>
<tag id=4>Test.</tag>
<tag id=5>Test.</tag>
<tag id=5a>Test.</tag>
<tag id=5za>Test.</tag>
<tag id=6>Test.</tag>
<tag id=40>Test.</tag>
<tag id=41>Test.</tag>
<tag id=40>Test.</tag>
<tag id=45>Test.</tag>
<tag id=10ba>Test.</tag>
<tag id=15xx>Test.</tag>
<tag id=cc>Test..</tag>
<tag id=ff>Test..</tag>
<tag id=50>Test.</tag>
<tag id=54>Test.</tag>
<tag id=7>Test.</tag>
<tag id=8>Test.</tag>
<tag id=16yy>Test.</tag>
<tag id=16ya>Test.</tag>


OUTPUT.TXT file
-----------------------------

<tag id=1>Test.</tag>
<tag id=2>Test.</tag>
<tag id=3>Test.</tag>
<tag id=4>Test.</tag>
<tag id=5>Test.</tag>
<tag id=5a>Test.</tag>
<tag id=5za>Test.</tag>
<tag id=6>Test.</tag>
<tag id=7>Test.</tag>
<tag id=8>Test.</tag>
<tag id=10ba>Test.</tag>
<tag id=15xx>Test.</tag>
<tag id=16ab>Test.</tag>
<tag id=16bc>Test.</tag>
<tag id=16ya>Test.</tag>
<tag id=16yy>Test.</tag>
<tag id=16zz>Test.</tag>
<tag id=39a>Test.</tag>
<tag id=40>Test.</tag>
<tag id=41>Test.</tag>
<tag id=45>Test.</tag>
<tag id=50>Test.</tag>
<tag id=54>Test.</tag>
<tag id=aa>Test.</tag>
<tag id=cc>Test..</tag>
<tag id=de>Test.</tag>
<tag id=ff>Test..</tag>
<tag id=zz>Test..</tag>





Macromedia

2006-07-25, 9:57 pm


Oops.

Once you save my code to a file, the syntax would be:


perl mycode.pl input.txt output.txt


Below is what I get now when I run my code on the input.txt file. As you can see, it's not sorting the way I'd like. It should sort like the OUTPUT.TXT file in my previous email.

<tag id=1>Test.</tag>
<tag id=10ba>Test.</tag>
<tag id=15xx>Test.</tag>
<tag id=16ab>Test.</tag>
<tag id=16bc>Test.</tag>
<tag id=16ya>Test.</tag>
<tag id=16yy>Test.</tag>
<tag id=16zz>Test.</tag>
<tag id=2>Test.</tag>
<tag id=3>Test.</tag>
<tag id=39a>Test.</tag>
<tag id=4>Test.</tag>
<tag id=40>Test.</tag>
<tag id=41>Test.</tag>
<tag id=45>Test.</tag>
<tag id=5>Test.</tag>
<tag id=50>Test.</tag>
<tag id=54>Test.</tag>
<tag id=5a>Test.</tag>
<tag id=5za>Test.</tag>
<tag id=6>Test.</tag>
<tag id=7>Test.</tag>
<tag id=8>Test.</tag>
<tag id=aa>Test.</tag>
<tag id=cc>Test..</tag>
<tag id=de>Test.</tag>
<tag id=ff>Test..</tag>
<tag id=zz>Test..</tag>







Mumia W.

2006-07-26, 3:57 am

On 07/25/2006 08:32 PM, macromedia wrote:
> Hi,
>
> I can't seem to get my script to sort properly. Below
> is my code [...]
> sort { $a->[0] cmp $b->[0] || $a->[7] <=> $b->[7] }
> [...]


"Cmp" does string comparisons. Use "<=>" for numeric
comparisons. Read "perldoc perlop".
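To make the difference concrete, here is a minimal sketch with a made-up list of purely numeric ids (the list is an assumption for illustration, not the poster's data):

```perl
use strict;
use warnings;

my @ids = (1, 16, 2, 40, 5);

# cmp compares character by character, so "16" sorts before "2".
my @as_strings = sort { $a cmp $b } @ids;

# <=> compares the numeric values.
my @as_numbers = sort { $a <=> $b } @ids;

print "cmp: @as_strings\n";   # cmp: 1 16 2 40 5
print "<=>: @as_numbers\n";   # <=>: 1 2 5 16 40
```

Note that ids such as 16ab or aa are not plain numbers, so under "use warnings" a bare <=> on them would be expected to produce "isn't numeric" warnings; mixed ids need to be split into a numeric part and a string part first.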






Uri Guttman

2006-07-26, 3:57 am

>>>>> "DS" == DJ Stunks <DJStunks@gmail.com> writes:

DS> My First Sort::Maker, by Jake Peavy.

i feel like a proud papa!

DS> make_sorter( ST => 1,
DS> name => 'tag_sort',
DS> init_code => 'my ($num,$str);',
DS> number => 'do{
DS> ($num,$str) = m{^<tag id=(\d*)([a-z]*)>};
DS> $num eq q{}
DS> }',
DS> number => '$num',
DS> string => '$str',
DS> ) or die "unable to make_sorter: $@\n";

an excellent use of init_code. at first i didn't study the data so i
wondered why the first key was $num eq q{} but it makes sense to me now.

DS> print for tag_sort( <DATA> );

why the for? print works fine with the sorted list.

i may borrow this for the docs as it shows an interesting use of
init_code.


hopefully the OP will understand it and why it is easier to code sorts
by key extraction than by explicit comparisons.
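For readers without Sort::Maker installed, the same key-extraction idea can be sketched with core Perl only, using a map/sort/map chain (a Schwartzian transform) instead of make_sorter. The sub name sort_tags and the sample lines are hypothetical; the three keys mirror the ones in the quoted code: "has no number", then the number, then the letters.

```perl
use strict;
use warnings;

# Hypothetical helper: sort tag lines by (no-number flag, number, letters).
sub sort_tags {
    my @lines = @_;
    return
        map  { $_->[0] }                 # 3. keep only the original line
        sort {
               $a->[1] <=> $b->[1]       # ids with a number come first
            || $a->[2] <=> $b->[2]       # then by the number itself
            || $a->[3] cmp $b->[3]       # then by the trailing letters
        }
        map  {                           # 1. extract the keys once per line
            my ($num, $str) = /<tag id=(\d*)([a-z]*)>/;
            $num = '' unless defined $num;   # line did not match
            $str = '' unless defined $str;
            [ $_, ($num eq '' ? 1 : 0), ($num eq '' ? 0 : $num), $str ];
        } @lines;
}

my @sample = map { "<tag id=$_>Test.</tag>\n" } qw(aa 16ab 2 5a 5);
print sort_tags(@sample);
```

The comparison block never parses the line; it only compares keys that were extracted up front, which is the point being made about key extraction.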

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org




John W. Krahn

2006-07-26, 7:56 am

macromedia wrote:
> Hi,


Hello,

> I can't seem to get my script to sort properly.


Yes, it is a bit tricky to get right.

> Below is my code along with
> a sample input.txt file. I also have what the output.txt file should look
> like. Also note any duplicate should be stripped out
> which seems to work ok.
>
> Something is getting messed up when I have the numerals along with alpha
> etc. I can't seem to get the results of the OUTPUT.TXT file below.
>
> #!/usr/bin/perl -w
>
> require 5.000;
>
> use warnings;
> use strict;
> use POSIX;
>
> my %tags = ();
>
> my $input = $ARGV[0];
> my $output = $ARGV[1];
>
> open (FILE, "< $input") or die "cannot open $input: $!\n";
> while (my $tag = <FILE> ) {
> $tag =~ m/<tag id=(\w+)>/;
> $tags{$1} = $tag;


You shouldn't use $1 unless you are sure the pattern matched.
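The reason is that a failed match leaves $1 holding whatever the last successful match captured. A small sketch of the problem and one way to guard against it (extract_id is a hypothetical helper name, and the sample strings are made up):

```perl
use strict;
use warnings;

# Hypothetical helper: return the id, or undef when the line has no tag.
sub extract_id {
    my ($line) = @_;
    return $line =~ /<tag id=(\w+)>/ ? $1 : undef;
}

my $good = '<tag id=40>Test.</tag>';
my $bad  = 'a blank or malformed line';

$good =~ /<tag id=(\w+)>/;   # succeeds, $1 is now "40"
$bad  =~ /<tag id=(\w+)>/;   # fails, $1 is left untouched
my $stale = $1;

print "after the failed match \$1 is still: $stale\n";
print "extract_id on the bad line: ",
      (defined extract_id($bad) ? 'defined' : 'undef'), "\n";
```

In the original loop this would mean a malformed line silently overwrites the hash entry of the previously matched id.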


> }
> open (NEWFILE, "> $output");


You should verify that this file opened correctly too.

open (NEWFILE, "> $output") or die "cannot open $output: $!\n";


> foreach my $id ( map { $_->[0] }
> sort { $a->[0] cmp $b->[0] || $a->[7] <=> $b->[7] }
> map { [ $_, ( isdigit( $_ ) ? $_ : 0 ) ] }
> keys %tags )
> {
> print NEWFILE $tags{$id};


Missing a closing } here.


This appears to do what you want:


while ( my $tag = <FILE> ) {
    next unless $tag =~ /<tag id=(\d*)([^>]*)>/;
    $tags{ sprintf '%04d%s', $1 || 9999, $2 } = $tag;
    # if you are expecting numbers larger than 9999 then
    # use a larger constant and a larger sprintf format
}

open (NEWFILE, "> $output") or die "cannot open $output: $!\n";

foreach my $id ( sort keys %tags ) {
    print NEWFILE $tags{ $id };
}
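To see why a plain string sort works on these keys, here is a minimal sketch that builds the same kind of zero-padded key for a few ids taken from the sample input. The helper name padded_key is hypothetical; it takes a bare id rather than a whole line.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the key format used above:
# four zero-padded digits, then whatever follows the number.
sub padded_key {
    my ($id) = @_;
    my ($num, $rest) = $id =~ /^(\d*)(.*)$/;
    # Ids without a leading number get 9999 so they sort last.
    return sprintf '%04d%s', $num || 9999, $rest;
}

my @ids    = qw(16ab aa 5za 2 40 5 5a);
my @sorted = sort { padded_key($a) cmp padded_key($b) } @ids;

printf "%-5s => %s\n", $_, padded_key($_) for @sorted;
print "@sorted\n";   # 2 5 5a 5za 16ab 40 aa
```

Because every number is padded to the same width, comparing the keys character by character gives the same order as comparing the numbers. One caveat of the "|| 9999" idiom: an id of 0 is false in Perl and would also be mapped to 9999.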



John
--
use Perl;
program
fulfillment




Mumia W.

2006-07-26, 6:57 pm

On 07/25/2006 10:07 PM, Mumia W. wrote:
> On 07/25/2006 08:32 PM, macromedia wrote:
>
> "Cmp" does string comparisons. Use "<=>" for numeric
> comparisons. Read "perldoc perlop".
>


No, this doesn't quite do it. I see your data is a little more
complicated than what I thought before. When you have unusual
sorting requirements, you need unusual sorting keys. I decided
to use sprintf() to give me keys that are formatted perfectly
for sorting:

require 5.000;

use warnings;
use strict;
use POSIX;

my %tags = ();

my $input = 'sort_tags.dat';
my $output = 'sort_tags.out';

open (FILE, "< $input") or die "cannot open $input: $!\n";
while (my $tag = <FILE> ) {
    $tag =~ m/<tag id=(\d*)([[:alpha:]]*)>/;
    $tags{sprintf("%04d%6s",$1 || 999,$2)} = $tag;
}
open (NEWFILE, "> $output");
foreach my $id ( sort keys %tags )
{
    print NEWFILE $tags{$id};
}

close NEWFILE;
close FILE;
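One detail that may be worth checking on other data: "%6s" right-justifies the letters, which gives the wanted order for the sample file but can reorder suffixes of different lengths. The sketch below compares it with a left-justified "%-6s"; the helper names and the ids 5b and 5az are made up for illustration and do not appear in the original input.

```perl
use strict;
use warnings;

# Key as built in the program above (letters right-justified) ...
sub key_right { my ($n, $s) = @_; return sprintf '%04d%6s',  $n || 999, $s }
# ... and a left-justified variant, which is an assumption of this sketch.
sub key_left  { my ($n, $s) = @_; return sprintf '%04d%-6s', $n || 999, $s }

my @pairs = ([5, 'b'], [5, 'az']);

my @right = map { $_->[0] . $_->[1] }
            sort { key_right(@$a) cmp key_right(@$b) } @pairs;
my @left  = map { $_->[0] . $_->[1] }
            sort { key_left(@$a) cmp key_left(@$b) } @pairs;

print "right-justified: @right\n";   # right-justified: 5b 5az
print "left-justified:  @left\n";    # left-justified:  5az 5b
```

If alphabetical order of the suffix is what is wanted, the left-justified form (or no padding at all for the string part) looks like the safer choice.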


Mumia W.

2006-07-26, 6:57 pm

On 07/25/2006 08:32 PM, macromedia wrote:
> Hi,
>
> I can't seem to get my script to sort properly. [...]



Here is a shortened version of your program:

use strict;
use warnings;
use File::Slurp;

my %tags;
foreach my $line (read_file 'sort_tags.dat') {
    if ($line =~ m/id=(\d*)([[:alpha:]]*)/) {
        $tags{sprintf("%04d%6s",$1 || 999,$2)} = $line;
    }
}

write_file('sort_tags.out',
    map $tags{$_}, sort keys %tags);

__END__


As in the other program, I used sprintf() to create
easily-sorted keys.
