get rid of non xml compliant lines from a file
| Mr_Noob 2008-03-26, 8:10 am |
| Hi all,
I am trying to write a Perl script that deletes all non-XML-compliant
lines (i.e. lines that do not begin with "<" and end with ">").
Here is what I have managed to write so far:
sub delete_non_xml_lines
{
    my $search = new File::List($xmldir);
    my @files = @{ $search->find("textfile") };
    foreach (@files)
    {
        my $file = $_;
        open(FILE, "< $file") or die "Can't open $file : $!";
        while(<FILE> )
        {
            print if $_ =~ />$/;
        }
        close FILE;
    }
}
But how can I redirect the output for each processed file into an XML
file?
Thanks in advance for your help.
Regards
| |
| RedGrittyBrick 2008-03-26, 8:10 am |
| Mr_Noob wrote:
>
> I am trying to write a Perl script that deletes all non-XML-compliant
> lines (i.e. lines that do not begin with "<" and end with ">").
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
<article>
<sect1>
<title>Observations on XML structure</title>
<para>This is a valid XML document.
Most of the lines don't start with a &lt; symbol.
Some of the lines don't end with a &gt; symbol.
Yet it is still valid XML.</para>
</sect1>
</article>
--
RGB
| |
| Ben Morrow 2008-03-26, 7:32 pm |
|
Quoth Mr_Noob <gniagnia@gmail.com>:
>
> I am trying to write a Perl script that deletes all non-XML-compliant
> lines (i.e. lines that do not begin with "<" and end with ">").
> Here is what I have managed to write so far:
>
> sub delete_non_xml_lines
> {
> my $search = new File::List($xmldir);
Indirect object syntax (new Foo) is unreliable and can parse
incorrectly, because perl has to guess whether "new" is a method or a
function name. Use
my $search = File::List->new($xmldir);
instead.
> my @files = @{ $search->find("textfile") };
>
> foreach (@files)
> {
> my $file = $_;
This is silly: there is no need to copy $_ into a variable. Use
foreach my $file (@files) {
instead.
> open(FILE, "< $file") or die "Can't open $file : $!";
It is safer to use lexical filehandles and three-arg open.
open(my $FILE, '<', $file) or die ...;
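A minimal sketch of the lexical-filehandle, three-arg form in use. The helper name read_lines and the sample file are made up for illustration and are not part of the original script. Because the mode is a separate argument, characters such as ">" or "|" in the filename are never taken as part of the mode.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the lines of $path as a list.
sub read_lines {
    my ($path) = @_;
    # Three-arg open: the mode '<' is passed separately from the name.
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my @lines = <$fh>;
    return @lines;    # $fh is closed when it goes out of scope here
}

# Build a throwaway sample file to read back.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/sample.txt";
open(my $out, '>', $path) or die "Can't create $path: $!";
print {$out} "<a>\n", "plain text\n";
close($out) or die "Can't write to $path: $!";

my @lines = read_lines($path);
print scalar(@lines), " lines read\n";
```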
[...from below the code...]
> But how can I redirect the output for each processed file into an xml
> file ?
To write the output to a new file, you need
open(my $XML, '>', "$file.xml") or die ...;
select $XML;
Note that this will leave $XML selected as your default output
filehandle. If you are expecting to write to STDOUT later, you will need
to select it again. Alternatively, you could use SelectSaver:
my $ss = SelectSaver->new($XML);
which will re-select STDOUT when $ss goes out of scope.
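As a rough illustration of the scoping behaviour described above: SelectSaver ships with Perl but has to be loaded with "use SelectSaver;". The output filename below is a placeholder chosen for the demo.

```perl
use strict;
use warnings;
use SelectSaver;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/out.xml";    # placeholder output name

my $before = select();        # remember the current default handle

open(my $xml, '>', $path) or die "Can't create $path: $!";
{
    my $saver = SelectSaver->new($xml);   # $xml is now the default handle
    print "<root/>\n";                    # written to the file
}   # $saver is destroyed here and the previous handle is re-selected
close($xml) or die "Can't write to $path: $!";

my $restored = (select() eq $before) ? 1 : 0;
print "default output handle restored\n" if $restored;
```

Keeping the SelectSaver object in a small block limits how long the redirection lasts, so later prints are not silently sent to the file.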
> while(<FILE> )
> {
> print if $_ =~ />$/;
$_ is the default target of a match, so this can be written as
print if />$/;
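Note that />$/ only tests the end of the line, while the stated goal also mentions a leading "<". One possible way to test both ends is sketched below; the predicate name is hypothetical, and allowing surrounding whitespace (including a trailing carriage return) is an assumption about the input. As pointed out earlier in the thread, this is a line filter, not an XML validity check.

```perl
use strict;
use warnings;

# Hypothetical predicate: true if the line, ignoring surrounding
# whitespace, starts with '<' and ends with '>'.
sub looks_like_tag_line {
    my ($line) = @_;
    return $line =~ /^\s*<.*>\s*$/ ? 1 : 0;
}

my @sample = (
    "<title>x</title>\n",   # kept
    "  <para>\r\n",         # kept: indentation and CRLF are tolerated
    "plain text\n",         # dropped
    "ends with >\n",        # dropped: does not start with '<'
    "<open only\n",         # dropped: does not end with '>'
);

my @kept = grep { looks_like_tag_line($_) } @sample;
print @kept;
```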
> }
> close FILE;
If you use lexical filehandles, there's no need to explicitly close
files opened for reading. Files opened for writing should be explicitly
closed, and the return value of close checked, to catch errors writing
(such as a full disk). close will return an error if any of the writes
failed, so there's no need to check each print (unless you are expecting
errors and want to abort early).
close $XML or die "can't write to $file.xml: $!";
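Putting the pieces together, the loop body might look something like the sketch below. Two things differ from the advice above and are assumptions of this example: it prints to the output handle explicitly with print {$out} rather than using select, and it uses bsd_glob from the core File::Glob module as a stand-in for File::List so that it runs without extra modules. The sub name filter_file and the sample data are made up.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Copy every line of $in_path that ends with '>' into "$in_path.xml".
# Returns the path of the file that was written.
sub filter_file {
    my ($in_path) = @_;
    my $out_path = "$in_path.xml";

    open(my $in,  '<', $in_path)  or die "Can't open $in_path: $!";
    open(my $out, '>', $out_path) or die "Can't create $out_path: $!";

    while (my $line = <$in>) {
        print {$out} $line if $line =~ />$/;
    }

    # Checking close on the write handle catches failed writes.
    close($out) or die "Can't write to $out_path: $!";
    return $out_path;
}

# Demo on a throwaway directory with one sample file.
my $xmldir = tempdir(CLEANUP => 1);
open(my $fh, '>', "$xmldir/a.txt") or die "Can't create sample: $!";
print {$fh} "<a>\n", "junk\n", "</a>\n";
close($fh) or die "Can't write sample: $!";

my @written = map { filter_file($_) } bsd_glob("$xmldir/*.txt");
print "wrote $_\n" for @written;
```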
Ben
| |
|
| Ben Morrow wrote:
> Quoth Mr_Noob <gniagnia@gmail.com>:
>
> Indirect object syntax (new Foo) is unreliable and can parse
> incorrectly. Use
I don't deny that that is good advice, though I personally have never had
any problems creating an object using "my $o = new Foo(...);" as opposed
to "my $o = Foo->new(...);". As long as you know the potential
problems, they are easy to avoid. Namely, watch those parens :-)
--
szr