Last line issue
Andrej Kastrin

2008-01-26, 8:05 am

Dear all,

to pre-process my XML dataset I run a simple Perl script on it, which
extracts the Id identifier from each XML record and appends the whole
record to it. For example, the input data looks like:

<NoteSet>
<Note>
<Id>001</Id>
<To>Thomas</To>
<From>Joana</From>
</Note>
<Note>
<Id>002</Id>
<To>John</To>
<From>Paula</From>
</Note>
<Note>
<Id>003</Id>
<To>Andrew</To>
<From>Maria</From>
</Note>
</NoteSet>

and the desired output of the script should be:

001 <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
002 <Note><Id>002</Id><To>John</To><From>Paula</From></Note>
003 <Note><Id>003</Id><To>Andrew</To><From>Maria</From></Note>

But I can't figure out why the script below omits the last record in the
input dataset, e.g.:

001 <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
002 <Note><Id>002</Id><To>John</To><From>Paula</From></Note>

I'd appreciate any suggestions or pointers.
Best, Andrej


## test.pl ##
use strict;
my $FNI = shift;
my $FNO = "$FNI.dat";
my $started = 0;
my $chunk;
my @chunk;

open OUT, ">$FNO";
open IN, "$FNI";
while (<IN> ) {
    s/^\s+//g;
    s/\s+$//g;
    if (m/\<Note>/) {
        if ($started) {
            my $clob = join("", @chunk);
            &process_chunk($clob);
        } else {
            $started = 1;
        }
        @chunk = ();
        push (@chunk, $_);
        while (1) {
            $chunk = <IN>;
            $chunk =~ s/^\s+//g;
            $chunk =~ s/\s+$//g;
            push (@chunk, $chunk);
            last if ($chunk =~ m/\<\/Note>/);
        }
    }
}
close IN;
close OUT;

sub process_chunk {
    my $clob = shift;
    $clob =~ s/\t+/ /g;
    my $id;
    if ($clob =~ m/\<Id>(\d+)\<\/Id>/) {
        $id = $1;
    }
    print OUT "$id\t$clob\n";
}

John W. Krahn

2008-01-26, 8:05 am

Andrej Kastrin wrote:
> Dear all,


Hello,

> to pre-process my XML dataset I run a simple Perl script on it, which
> extracts the Id identifier from each XML record and appends the whole
> record to it. For example, the input data looks like:
>
> <NoteSet>
> <Note>
> <Id>001</Id>
> <To>Thomas</To>
> <From>Joana</From>
> </Note>
> <Note>
> <Id>002</Id>
> <To>John</To>
> <From>Paula</From>
> </Note>
> <Note>
> <Id>003</Id>
> <To>Andrew</To>
> <From>Maria</From>
> </Note>
> </NoteSet>
>
> and the desired output of the script should be:
>
> 001 <Note><Id>001</Id><To>Thomas</To><From>Joana</From></Note>
> 002 <Note><Id>002</Id><To>John</To><From>Paula</From></Note>
> 003 <Note><Id>003</Id><To>Andrew</To><From>Maria</From></Note>


This should do what you want:

#!/usr/bin/perl
use warnings;
use strict;

my $FNI = shift;
my $FNO = "$FNI.dat";

open my $OUT, '>', $FNO or die "Cannot open '$FNO' $!";
open my $IN, '<', $FNI or die "Cannot open '$FNI' $!";

my ( $id, $line );
while ( <$IN> ) {
    if ( m!<Note>! .. m!</Note>! ) {
        ( $id, $line ) = ( $1, '' ) if m!<Id>(\d+)</Id>!;
        s/\A\s+//;
        s/\s+\z//;
        tr/\t/ /s; # more efficient than s/\t+/ /g
        $line .= $_ if /Id|To|From/;
        print $OUT "$id\t$line\n" if m!/Note!;
    }
}

close $IN;
close $OUT;



> But I can't figure out why the script below omits the last record in the
> input dataset, e.g.:


Your second while loop is eating up the third record without outputting
anything.
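
To spell that out: the original script hands a chunk to process_chunk only when the next <Note> line turns up, so the final chunk is collected but never written. A minimal sketch of one possible fix, keeping the collect-then-process idea, is to emit each record as soon as its </Note> line arrives. The helper name notes_to_lines and the in-memory sample below are made up for illustration and are not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: takes the input lines and returns one
# "id<TAB>record" string per <Note> ... </Note> block.
sub notes_to_lines {
    my @input = @_;
    my ( @out, @chunk );
    my $in_note = 0;

    for my $line (@input) {
        ( my $text = $line ) =~ s/^\s+|\s+$//g;    # trim both ends
        $in_note = 1 if $text =~ m{<Note>};
        next unless $in_note;                      # skip lines outside a record

        push @chunk, $text;

        if ( $text =~ m{</Note>} ) {
            # Emit the record as soon as it is complete, so the last
            # record no longer depends on a following <Note> line.
            my $clob = join '', @chunk;
            my ($id) = $clob =~ m{<Id>(\d+)</Id>};
            push @out, ( defined $id ? $id : '' ) . "\t$clob";
            @chunk   = ();
            $in_note = 0;
        }
    }
    return @out;
}

# Assumed sample data, shortened from the input shown in the question.
my @sample = split /^/, <<'XML';
<NoteSet>
<Note>
<Id>001</Id>
<To>Thomas</To>
</Note>
<Note>
<Id>002</Id>
<To>John</To>
</Note>
</NoteSet>
XML

print "$_\n" for notes_to_lines(@sample);
```

Because nothing is held back until the next record starts, this version also avoids the inner while (1) loop, which would never terminate if the file ended before a closing </Note>.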



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Andrej Kastrin

2008-01-26, 10:09 pm

Dear John,

many, many thanks for your quick answer.

I modified your script a bit, to:
$line .= $_ if m!<Note>! .. m!</Note>!;
print $OUT "$id\t$line\n" if m!</Note>!;


but a problem still persists with the output:
001 <Id>001</Id><To>Thomas</To><From>Joana</From><Message>foo</Message></Note>
002 <Id>002</Id><To>John</To><From>Paula</From><Message>foo</Message></Note>
003 <Id>003</Id><To>Andrew</To><From>Maria</From><Message>foo</Message></Note>

Note that there is no opening <Note> tag at the beginning.

Best, Andrej







John W. Krahn

2008-01-26, 10:12 pm

Andrej Kastrin wrote:
>
> John W. Krahn wrote:
>
> many, many thanks for your quick answer.
>
> I modified your script a bit, to:
> $line .= $_ if m!<Note>! .. m!</Note>!;
> print $OUT "$id\t$line\n" if m!</Note>!;
>
>
> but a problem still persists with the output:
> 001 <Id>001</Id><To>Thomas</To><From>Joana</From><Message>foo</Message></Note>
> 002 <Id>002</Id><To>John</To><From>Paula</From><Message>foo</Message></Note>
> 003 <Id>003</Id><To>Andrew</To><From>Maria</From><Message>foo</Message></Note>
>
> Note that there is no opening <Note> tag at the beginning.


This should work better:

my ( $id, $line );
while ( <$IN> ) {
    if ( m!<Note>! .. m!</Note>! ) {
        s/\A\s+//;
        s/\s+\z//;
        tr/\t/ /s;
        $id = $1 if m!<Id>(\d+)</Id>!;
        if ( m!<Note>! ) {
            $line = $_;
        }
        else {
            $line .= $_;
        }
        print $OUT "$id\t$line\n" if m!/Note!;
    }
}
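
The loop above relies on ".." acting as the flip-flop (range) operator in scalar context: it is false until the left pattern matches, then stays true up to and including the line where the right pattern matches. A small sketch of just that behaviour follows; the helper lines_in_notes and the in-memory sample are hypothetical, chosen only so the example runs without a file:

```perl
use strict;
use warnings;

# Hypothetical helper: returns only the lines from a <Note> line
# through the matching </Note> line, inclusive.
sub lines_in_notes {
    my @kept;
    for (@_) {
        # Flip-flop: turns on at <Note>, turns off after </Note>.
        push @kept, $_ if m!<Note>! .. m!</Note>!;
    }
    return @kept;
}

# Assumed sample data held in memory instead of read from $IN.
my @lines = split /^/, <<'XML';
<NoteSet>
<Note>
<Id>001</Id>
</Note>
<Note>
<Id>002</Id>
</Note>
</NoteSet>
XML

# Prints every line except the <NoteSet> wrapper lines.
print lines_in_notes(@lines);
```

Since the operator is true on the </Note> line itself, the record can be printed on that same iteration, which is why the last record is not lost.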



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Dr.Ruud

2008-01-27, 8:06 am

"John W. Krahn" wrote:

> tr/\t/ /s;


To also squash adjacent space characters:

tr/\t / /s;
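
A quick way to see the difference between the two forms is to run both on the same string. The sample string below is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample: two tabs in a row, then a tab followed by two spaces.
my $sample = "a\t\tb\t  c";

# Only tabs are in the search list, so only runs of tabs are squeezed;
# the two original spaces are left alone.
( my $tabs_only = $sample ) =~ tr/\t/ /s;

# Tabs and spaces are both in the search list and both map to a space,
# so any mixed run of them is squeezed to a single space.
( my $tabs_and_spaces = $sample ) =~ tr/\t / /s;

print "[$tabs_only]\n";          # prints [a b   c]
print "[$tabs_and_spaces]\n";    # prints [a b c]
```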

--
Affijn, Ruud

"Gewoon is een tijger."

Copyright 2008 codecomments.com