Hrs of work on regex: please help
| Robert 2004-07-28, 9:01 pm |
| After this message text is a pasted xml file I've been working
(wrestling) with.
The goal is to remove text from the file that begins with:
"<ns0:ErrorDetails>" and ends with "</ns0:ErrorDetails>".
I have done several other s/// type operations to this file to remove
other text parts, and it was no problem. I've heard the 'devil is in
the details' and I believe it now, hehe.
I have copy 'n pasted the text surrounding the target before and
after, and made a string of it in a simple Perl script. I had to use
single quotes, due to the numerous double quotes in the text. I used
the same s/// operation and it printed as I want! Wonderful, I
thought, now to do it on the file contents. But, it just will not do a
replace. It is getting beyond the point where I can think on this
problem without my brain feeling a spinning motion. I humbly submit my
problem for discussion.
My code follows:
#!/usr/bin/perl
my $results_dir = $ARGV[0];
my $expected_results_dir = "$results_dir/expectedresults";
my $cleaned_results_dir = "$results_dir/cleanedresults";
my $cleaned_expected_results_dir =
"$results_dir/expectedresults/cleanedexpectedresults";
my $cleaned_xml = "";
my $clean_file = "";
my $Line = "";
opendir(BIN, $results_dir) or die "Can't open directory: $dir: $!";
FILE_CLEAN: while( defined ($file = readdir BIN) )
{
    next FILE_CLEAN if $file =~ /^\.\.?$/; # skip . and ..
    next FILE_CLEAN if (-d "$results_dir/$file"); # skip if it is directory
    open(To_Clean, "$results_dir/$file") or die "Can't open $To_Clean: $!\n";
    my @data = <To_Clean>; #read file contents
    close(To_Clean); #close file
    $clean_file = "$cleaned_results_dir/$file";
    for (my $i = 0; $i < scalar(@data); ++$i) {
        $Line = $data[$i];
        #replace whitespaces at beginning and end with nothing
        chomp $Line;
        $Line =~ tr/\t/ /;
        $Line =~ s/\t//g;
        $Line =~ s/\<ns0:ErrorDetails\>.*?\<\/ns0:ErrorDetails\>//g;
        $cleaned_xml = $cleaned_xml . $Line;
        $Line = "";
    };#END FOR
    open(CLEANFILE, ">$clean_file") or die "Can't open $clean_file: $!\n";
    print CLEANFILE $cleaned_xml;
    close(CLEANFILE);
    $cleaned_xml = "";
};#END WHILE
print "...DONE\n";
closedir(BIN);
########################################
########################################
<?xml version="1.0" encoding="UTF-8"?>
<ns0:BOBEntitlementRoot xmlns:ns0="http://www.noco.com/BOBEntitlement"
version="NA"><ns0:ApplicationArea><ns0:CreationDateTime>2004-07-26T14:07:02.248-07:00</ns0:CreationDateTime><ns0:SourceSystem>HANDSHAKE</ns0:SourceSystem><ns0:Operation><ns0:Name>UnknownOperation</ns0:Name><ns0:Version>NA</ns0:Version></ns0:Operation></ns0:ApplicationArea>
<ns0:DataArea><ns0:Status><ns0:StatusCode>Failure</ns0:StatusCode><ns0:Error><ns0:ErrorCode>2101</ns0:ErrorCode><ns0:ErrorSeverity>Error</ns0:ErrorSeverity>
<ns0:ErrorCategory>InputFormatError</ns0:ErrorCategory><ns0:ErrorDescription>Invalid
XML request. </ns0:ErrorDescription><ns0:ErrorDetails>Job-4296 Error
in [Processes/Integration_Interfaces/getEntitlement/getBHAPIJMSRequest_1.process/Group
(1)/Group/Parse XML]
Output data invalid
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:501)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
caused by: org.xml.sax.SAXException: validation error: unexpected
content "{http://www.noco.com/BOBEntitlement}Sku"; expected
"{http://www.noco.com/BOBEntitlement}Name" or
"{http://www.noco.com/BOBEntitlement}Description" or
"{http://www.noco.com/BOBEntitlement}DomainType" or
"{http://www.noco.com/BOBEntitlement}PropertyTypeStatus" or
"{http://www.noco.com/BOBEntitlement}ChangeDate" or
"{http://www.noco.com/BOBEntitlement}DefaultValue" or
"{http://www.noco.com/BOBEntitlement}UsageType"
({com.tibco.xml.validation}COMPLEX_E_UNEXPECTED_CONTENT)
at
/BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]/Sku[1]
java.lang.Exception: unexpected content
"{http://www.noco.com/BOBEntitlement}Sku"; expected
"{http://www.noco.com/BOBEntitlement}Name" or
"{http://www.noco.com/BOBEntitlement}Description" or
"{http://www.noco.com/BOBEntitlement}DomainType" or
"{http://www.noco.com/BOBEntitlement}PropertyTypeStatus" or
"{http://www.noco.com/BOBEntitlement}ChangeDate" or
"{http://www.noco.com/BOBEntitlement}DefaultValue" or
"{http://www.noco.com/BOBEntitlement}UsageType"
at com.tibco.xml.validation.helpers.d.a(XmlContentValidatorElementContext.java:348)
at com.tibco.xml.validation.helpers.h.if(XmlContentValidator.java:753)
at com.tibco.xml.validation.helpers.h.text(XmlContentValidator.java:1601)
at com.tibco.xml.datamodel.nodes.Text.content(Text.java:327)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
validation error: no declaration for element
"{http://www.noco.com/BOBEntitlement}Sku"
({com.tibco.xml.validation}COMPLEX_E_MISSING_ELEMENT_DECLARATION) at
/BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]/Sku[1]
java.lang.Exception: no declaration for element
"{http://www.noco.com/BOBEntitlement}Sku"
at com.tibco.xml.validation.helpers.d.if(XmlContentValidatorElementContext.java:615)
at com.tibco.xml.validation.helpers.d.a(XmlContentValidatorElementContext.java:180)
at com.tibco.xml.validation.helpers.h.if(XmlContentValidator.java:818)
at com.tibco.xml.validation.helpers.h.text(XmlContentValidator.java:1601)
at com.tibco.xml.datamodel.nodes.Text.content(Text.java:327)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
validation error: unexpected end of content
({com.tibco.xml.validation}COMPLEX_E_UNEXPECTED_END_OF_CONTENT) at
/BOBEntitlementRoot[1]/DataArea[1]/BOBEntitlement[1]/OfferingProperty[1]/OfferingPropertyType[1]
java.lang.Exception: unexpected end of content
at com.tibco.xml.validation.helpers.d.case(XmlContentValidatorElementContext.java:414)
at com.tibco.xml.validation.helpers.h.a(XmlContentValidator.java:1182)
at com.tibco.xml.validation.helpers.h.endElement(XmlContentValidator.java:1034)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1108)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Element.content(Element.java:1101)
at com.tibco.xml.datamodel.nodes.Document.content(Document.java:226)
at com.tibco.xml.datamodel.nodes.Document.serialize(Document.java:242)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:302)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
at com.tibco.xml.xdata.bind.BindingRemarkHandler.assertNoErrors(BindingRemarkHandler.java:43)
at com.tibco.xml.xdata.bind.BindingRunner.validate(BindingRunner.java:319)
at com.tibco.xml.xdata.bind.OutputBindingRunner.validate(OutputBindingRunner.java:47)
at com.tibco.pe.core.TaskImpl.a(TaskImpl.java:489)
at com.tibco.pe.core.TaskImpl.eval(TaskImpl.java:428)
at com.tibco.pe.core.Job.a(Job.java:591)
at com.tibco.pe.core.Job.if(Job.java:443)
at com.tibco.pe.core.JobDispatcher$a.a(JobDispatcher.java:270)
at com.tibco.pe.core.JobDispatcher$a.run(JobDispatcher.java:218)
</ns0:ErrorDetails></ns0:Error></ns0:Status></ns0:DataArea></ns0:BOBEntitlementRoot>
| |
| Gunnar Hjalmarsson 2004-07-28, 9:01 pm |
| Robert wrote:
> The goal is to remove text from the file that begins with:
> "<ns0:ErrorDetails>" and ends with "</ns0:ErrorDetails>".
Hmm.. Far too much code for my taste. ;-)
<snip>
> my @data = <To_Clean>; #read file contents
Here you slurp the file into an array, where each line is a separate
element.
<snip>
> for (my $i = 0; $i < scalar(@data); ++$i) {
Here you start various operations for each line.
<snip>
> $Line =~ s/\<ns0:ErrorDetails\>.*?\<\/ns0:ErrorDetails\>//g;
Since the start and end tags appear on different lines, that pattern
will never match.
Try slurping the file into a scalar variable instead, and add the /s
modifier to the s/// operator.
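
A minimal sketch of that suggestion, for illustration only. The sample
text and the names $xml, $per_line and $slurped are assumptions, not
part of the posted script, and an in-memory filehandle stands in for
the real file:

```perl
use strict;
use warnings;

# Hypothetical sample: the start and end tags sit on different lines,
# as in the posted XML file.
my $xml = "<a>keep</a><ns0:ErrorDetails>Job-4296 Error\n"
        . "\tat some.stack.Frame(Frame.java:1)\n"
        . "</ns0:ErrorDetails><b>keep</b>\n";

# Line by line: no single line holds both tags, so nothing is removed.
open my $in, '<', \$xml or die "Can't open in-memory file: $!";
my $per_line = '';
while (my $line = <$in>) {
    $line =~ s{<ns0:ErrorDetails>.*?</ns0:ErrorDetails>}{}g;
    $per_line .= $line;
}
close $in;

# Whole file in one scalar: /s lets "." match newlines as well.
open $in, '<', \$xml or die "Can't open in-memory file: $!";
my $slurped = do { local $/; <$in> };
close $in;
$slurped =~ s{<ns0:ErrorDetails>.*?</ns0:ErrorDetails>}{}gs;

print "per line: ", ($per_line eq $xml ? "unchanged" : "changed"), "\n";
print "slurped:  $slurped";
```

The non-greedy .*? matters if a file can contain more than one
ErrorDetails element; a greedy .* would swallow everything between the
first start tag and the last end tag.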
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
| |
| Robert 2004-07-28, 9:01 pm |
| Thanks for the reply. Just to close the loop, what I ended up doing
was using the join function on the @data variable. I then used the
tr/// function to replace tabs and newlines with a space char. Now,
everything is set for the substitution, and the resulting files can
still be viewed as XML!
The main thing I have learned is when I spend more than an hour on a
problem, look at it from a different direction.
Thanks, again.
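
For the record, a rough sketch of that join-and-tr approach. The sample
lines and the name $text are made up for illustration; this is not the
actual script:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lines read from one results file.
my @data = (
    "<r><ns0:ErrorDetails>Job-4296 Error\n",
    "\tat some.stack.Frame(Frame.java:1)\n",
    "</ns0:ErrorDetails></r>\n",
);

my $text = join '', @data;   # one string instead of one element per line
$text =~ tr/\t\n/  /;        # every tab and newline becomes a single space
$text =~ s/<ns0:ErrorDetails>.*?<\/ns0:ErrorDetails>//g;

print "$text\n";
```

Once the newlines are gone, "." can reach from the start tag to the end
tag without any extra modifier, at the cost of flattening the file onto
one line.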
Gunnar Hjalmarsson <noreply@gunnar.cc> wrote in message news:<2mlootFonsi8U1@uni-berlin.de>...
> Robert wrote:
>
> Hmm.. Far too much code for my taste. ;-)
>
> <snip>
>
>
> Here you slurp the file into an array, where each line is a separate
> element.
>
> <snip>
>
>
> Here you start various operations for each line.
>
> <snip>
>
>
> Since the start and end tags appear on different lines, that pattern
> will never match.
>
> Try slurping the file into a scalar variable instead, and add the /s
> modifier to the s/// operator.
| |
| Jim Gibson 2004-07-28, 9:01 pm |
| In article <f46a37bb.0407261651.417c7469@posting.google.com>, Robert
<robert_bondi@intuit.com> wrote:
[problem description snipped]
>
> My code follows:
> #!/usr/bin/perl
You should have 'use strict;' here. You are declaring variables with
'my', so why not ask Perl for help?
> my $results_dir = $ARGV[0];
> my $expected_results_dir = "$results_dir/expectedresults";
> my $cleaned_results_dir = "$results_dir/cleanedresults";
> my $cleaned_expected_results_dir =
> "$results_dir/expectedresults/cleanedexpectedresults";
The above three lines are irrelevant to your problem, and should be
eliminated from your posted code. Please try to always post the
shortest possible program that illustrates the problem you are having.
> my $cleaned_xml = "";
> my $clean_file = "";
> my $Line = "";
Unneeded initialization of variables.
> opendir(BIN, $results_dir) or die "Can't open directory: $dir: $!";
You probably want $results_dir instead of $dir in your error message.
> FILE_CLEAN: while( defined ($file = readdir BIN) )
You don't need to name this loop because you are not using any loop
control in any inner loop.
> {
> next FILE_CLEAN if $file =~ /^\.\.?$/; # skip . and ..
> next FILE_CLEAN if (-d "$results_dir/$file");# skip if it is
You can just use 'next if ...' here.
> directory
> open(To_Clean, "$results_dir/$file") or die "Can't open $To_Clean:
> $!\n";
The variable $To_Clean contains nothing. Had you put 'use strict;' at
the beginning of your program, perl would have shown you this error.
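
A small sketch of what strict reports in that situation. The snippet is
compiled through a string eval only so that the error can be captured
and printed; the names $snippet, $ok and $err are illustrative:

```perl
use strict;
use warnings;

# Compile a tiny snippet under "use strict" and capture the error.
# $To_Clean is never declared; only the bareword filehandle To_Clean exists.
my $snippet = q{
    use strict;
    my $msg = "Can't open $To_Clean";
    1;
};
my $ok  = eval $snippet;
my $err = $@;

print $ok ? "compiled\n" : "strict refused it: $err";
```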
> my @data = <To_Clean>; #read file contents
As Gunnar pointed out, you probably want to replace this with 'my $data
= <To_Clean>;'
> close(To_Clean); #close file
> $clean_file = "$cleaned_results_dir/$file";
> for (my $i = 0; $i < scalar(@data); ++$i) {
You can more simply say 'for my $Line ( @data ) {', although if you
follow Gunnar's advice you won't be using @data.
> $Line = $data[$i];
> #replace whitespaces at beginning and end with nothing
> chomp $Line;
> $Line =~ tr/\t/ /;
> $Line =~ s/\t//g;
You just changed all of the tab characters to spaces. Why are you now
trying to change them to nothing?
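
A quick way to see this, using a made-up input line (the names $line,
$spaced and $removed are only for this sketch):

```perl
use strict;
use warnings;

my $line = "\tat some.stack.Frame\t(Frame.java:1)";  # hypothetical input

(my $spaced = $line) =~ tr/\t/ /;     # tabs become spaces
my $removed = ($spaced =~ s/\t//g);   # no tabs are left to remove

# tr with an empty replacement list only counts the matching characters.
print "tabs left after tr: ", ($spaced =~ tr/\t//), "\n";
print "s/\\t//g replaced:   ", ($removed || 0), "\n";
```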
> $Line =~ s/\<ns0:ErrorDetails\>.*?\<\/ns0:ErrorDetails\>//g;
You don't need to escape < and > as they are not special in a
double-quotish context. If you don't use forward slashes to delimit
your substitution strings, you won't have to escape the forward slashes
in your strings, either.
If you have slurped your file into a single variable, you can use
something like (untested):
$data =~ s|<ns0:ErrorDetails>.*?</ns0:ErrorDetails>||gs;
[rest of program and way too much test data snipped]
Good luck!
| |
| Gunnar Hjalmarsson 2004-07-29, 3:57 pm |
| Jim Gibson wrote:
> Robert wrote:
>
> As Gunnar pointed out, you probably want to replace this with 'my
> $data = <To_Clean>;'
That must be combined with enabling "slurp" mode:
local $/;
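
A short sketch of the difference, assuming an in-memory filehandle in
place of To_Clean; the sample text and variable names are illustrative:

```perl
use strict;
use warnings;

my $text = "line one\nline two\nline three\n";  # hypothetical file contents

# Default: $/ is "\n", so a scalar read returns only the first line.
open my $fh, '<', \$text or die "Can't open in-memory file: $!";
my $first = <$fh>;
close $fh;

# Slurp mode: with $/ undefined, one read returns the whole file.
# The bare block keeps the change to $/ from leaking into other code.
my $all;
{
    local $/;
    open my $fh2, '<', \$text or die "Can't open in-memory file: $!";
    $all = <$fh2>;
    close $fh2;
}

printf "default read: %d chars\n", length $first;
printf "slurp read:   %d chars\n", length $all;
```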
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
| |
| Gunnar Hjalmarsson 2004-07-29, 3:57 pm |
| Robert wrote:
> Gunnar Hjalmarsson wrote:
>
> Thanks for the reply. Just to close the loop, what I ended up doing
> was using the join function on the @data variable.
You could have skipped the @data array by just doing:
my $data = do { local $/; <To_Clean> };
> I then used the tr/// function to replace tabs and newlines with a
> space char.
Why? I suspect that the reason is that you are unfamiliar with the /s
modifier. Read about it in "perldoc perlre".
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl