Repeated Regex over a line with double quotes
| Wagner, David --- Senior Programmer Analyst --- WG 2006-02-21, 6:57 pm |
Here is a small snippet of code (LABEL1) which appears to remove a comma which lies between two double quotes. I run it and display the output, and the one line which does have the comma is cleaned up.
In LABEL2 is a snippet of code which does not work, but to all appearances is the same as my small snippet of code.
The working code is AS 5.8.3 on Windows XP, while the failing code is on Sun and is also 5.8.3.
I am receiving some data and need to clean it up and also split it. I prefer not to have to load any type of csv handler, and this works for the most part.
I don't see the difference in the code other than the two different systems.
Note: Moved this same code over (it didn't occur to me to try it, but my head must be stuck). It runs and removes the , from within the double quotes.
Has to be something simple that I am missing. Though I have been doing Perl for quite a while, I have never really been good at regex processing.

Thanks.
Wags ;)
======================================================================
LABEL1:
#!perl
use strict;
use warnings;

my $MyIn = 0;
my $MyOut = 0;

my $MyHldData;
my $MyWrkFld;
my $MyWrkFldUpd;

while ( <DATA> ) {
    chomp;
    s/\r//g;
    next if ( /^\s*$/ );
    my $MyHldData = $_;

    if ( /"/ ) {
        printf "*1a* Looking at line with quotes\n";
        while ( /("[^"]+")/ ) {
            $MyWrkFld = $1;
            printf "*1* <%s>",
                $1;

            $MyWrkFldUpd = $MyWrkFld;

            if ( $MyWrkFld =~ /,/ ) {
                printf "<--Comma hit!!";
                $MyWrkFldUpd =~ s/[,"]//g;
                s/$MyWrkFld/$MyWrkFldUpd/g;
            }
            else {
                $MyWrkFldUpd =~ s/"//g;
                s/$MyWrkFld/$MyWrkFldUpd/g;
            }
            printf "\n";
        }
    }
    else {
        printf "No quotes in line %d\n",
            $.;
        next;
    }
    printf "ln:<%5d>\nor:<%s>\nmd:<%s>\n",
        $.,
        $MyHldData,
        $_
}
__DATA__
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","FHEZGG PEINT TD","7077 CBNTBLIDETGD GEY",2006-02-14 12:00EE,15,"10:05","0152785","2737526",1,1250,10,"892913494",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","EETGH EGCHENICEL","7840 BELBBE EVG",2006-02-15 12:00EE,16,"11:27","0107405","2846954",1,1167,3,"916708540",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","EETGH EGCHENICEL","7840 BELBBE EVG",2006-02-15 12:00EE,17,"13:47","0107405","2846954",1,456,1,"916708557",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","NEVEL DGPBT LGVGL HGPEIH EGGNT","N46433 FLGGT TT, BLDG 661-3",2006-02-16 12:00EE,18,"11:40","0164109","2500058",1,529,1,"1078754644",1,25
======================================================================
LABEL2:
....

INPUTTP: while (<MYFILEIN> ) {
    chomp;

    $in++;
    s/\r//g;

    next if ( /^\s*$/ ); # bypass blank lines

    if ( ! /,(\d+)$/ ) {
        printf "Expecting a csv line ending with the total number of times associated with\n";
        printf "a terminal, but did not get a hit!\n";
        printf "Data(%d):\<%-s>\n",
            $.,
            $_;
        diet(5, $MyFileIn);
    }

    $MyDtlCnt = $1;
    undef @MyWorka;
    undef @MyUnSortedData;

    if ( /"/ ) {
        printf "*1a* Looking at line with quotes\n";
        while ( /("[^"]+")/ ) {
            $MyWrkFld = $1;
            $MyWrkFldUpd = $MyWrkFld;
            if ( $MyWrkFld =~ /,/ ) {
                $MyWrkFldUpd =~ s/[,"]//g;
                s/$MyWrkFld/$MyWrkFldUpd/g;
            }
            else {
                $MyWrkFldUpd =~ s/"//g;
                s/$MyWrkFld/$MyWrkFldUpd/g;
            }
        }
    }

.....
*******************************************************
This message contains information that is confidential
and proprietary to FedEx Freight or its affiliates.
It is intended only for the recipient named and for
the express purpose(s) described therein.
Any other use is prohibited.
*******************************************************
| Timothy Johnson 2006-02-21, 6:57 pm |
I don't have time right now to run it, but I notice that you left out
the portion of the code in LABEL2: where you actually print whether or
not it worked. Is it possible that it really is working but you forgot
this line?
printf "<--Comma hit!!";
Sorry if that seems like a stupid question, but I've gotten stuck on
worse.
| Wagner, David --- Senior Programmer Analyst --- WG 2006-02-21, 6:57 pm |
Timothy Johnson wrote:
> I don't have time right now to run it, but I notice that you left out
> the portion of the code in LABEL2: where you actually print whether or
> not it worked. Is it possible that it really is working but you
> forgot this line?
>
> printf "<--Comma hit!!";
>
> Sorry if that seems like a stupid question, but I've gotten stuck on
> worse.

No. The second is an actual production script, and as it does the split and checks the data, it fails because the comma was still in the code. That is why I know it fails.
Wags ;)
| Uri Guttman 2006-02-21, 6:57 pm |
>>>>> "WD" == Wagner, David --- Senior Programmer Analyst --- WGO writes:
WD> here is a small snippet of code(LABEL1) which appears to remove
WD> a comma which lies between two double quotes. I run it and and
WD> display output and the one line of code which does have the
WD> comma is cleaned up. In LABEL2 , is a snippet of code which
WD> does not work, but in all appearances is the same as my small
WD> snippet of code. The working code is AS 5.8.3 on Windows XP
WD> while the the failing is on Sun and is also 5.8.3.
use Text::Balanced or a csv module to parse that data. you are badly
reinventing several wheels.
WD> I am receiving some data and and need to clean up and also
WD> split. I prefer to not have to load any type of csv handler and
WD> works for the most part.
why not? the modules work 100% and the xs version will likely be much
faster than anything written in perl. there is no win for you in writing
your own csv parser. there are little traps and gotchas all over csv
parsing.
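
A minimal sketch of the module route, assuming only the core Text::ParseWords module is at hand rather than a dedicated csv module. The helper name split_csv_line is hypothetical and not from the thread. parse_line also treats single quotes and backslashes specially, so this is an illustration for data shaped like the sample, not a full csv parser.

```perl
#!perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper (not from the thread): split one csv line into fields.
# parse_line(delimiter, keep, line) with keep = 0 strips the surrounding
# double quotes and leaves commas inside quoted fields alone.
sub split_csv_line {
    my ($line) = @_;
    $line =~ s/\s+\z//;    # drop trailing CR/LF and blanks first
    return parse_line( ',', 0, $line );
}

# Sample line taken from the __DATA__ section of the original post.
my $line = '2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB",'
         . '"NEVEL DGPBT LGVGL HGPEIH EGGNT","N46433 FLGGT TT, BLDG 661-3",'
         . '2006-02-16 12:00EE,18,"11:40","0164109","2500058",1,529,1,'
         . '"1078754644",1,25';

my @fields = split_csv_line($line);
printf "%d fields, address: <%s>\n", scalar @fields, $fields[5];
```

The embedded comma stays inside its field, so no cleanup pass over the line is needed before splitting.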
WD> my $MyHldData;
WD> my $MyWrkFld;
WD> my $MyWrkFldUpd;
please choose better names. you may think those are good but i surely
can't read them. and will stop calling myself shirley.
common perl style is to use _ and not StudlyCaps/CamelCase for names. _
was put on the keyboard and allowed in perl names for a reason so use
it.
the prefix My is silly when the var is also declared with my.
Wrk may be just work but Hld? Fld is probably field but is Upd updated?
i would use $wrk_field or $updated_field or variants of those.
i will leave any more code review to others here.
uri
--
Uri Guttman ------ uri@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
| Hans Meier 2006-02-21, 9:55 pm |
For fun I played around a bit with regexes, but I think using a csv module is easier :-)

while (<DATA>) {
    chomp;

    # extract fields
    #
    my @fields = $_ =~ /((?:".*?")|(?:(?<=,).*?(?=,))|(?:(?<=,).*?$)|(?:^.*?(?=,)))/g;

    # remove quotes
    #
    $_ =~ s/"(.*?)"/$1/ for @fields;

    # print the fields separated with *
    #
    print join '*', @fields; print "\n";
}

__DATA__
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","FHEZGG PEINT TD","7077 CBNTBLIDETGD GEY",2006-02-14 12:00EE,15,"10:05","0152785","2737526",1,1250,10,"892913494",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","EETGH EGCHENICEL","7840 BELBBE EVG",2006-02-15 12:00EE,16,"11:27","0107405","2846954",1,1167,3,"916708540",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","EETGH EGCHENICEL","7840 BELBBE EVG",2006-02-15 12:00EE,17,"13:47","0107405","2846954",1,456,1,"916708557",1,25
2006-02-18 12:00EE,"TBUTHGHN - GGTT","TEN DIGGB","TEN DIGGB","NEVEL DGPBT LGVGL HGPEIH EGGNT","N46433 FLGGT TT, BLDG 661-3",2006-02-16 12:00EE,18,"11:40","0164109","2500058",1,529,1,"1078754644",1,25
| Xicheng 2006-02-21, 9:55 pm |
Wagner, David --- Senior Programmer Analyst --- WGO wrote:
> if ( /"/ ) {
> printf "*1a* Looking at line with quotes\n";
> while ( /("[^"]+")/ ) {
> $MyWrkFld = $1;
> printf "*1* <%s>",
> $1;
=================
> $MyWrkFldUpd = $MyWrkFld;
>
> if ( $MyWrkFld =~ /,/ ) {
> printf "<--Comma hit!!";
> $MyWrkFldUpd =~ s/[,"]//g;
> s/$MyWrkFld/$MyWrkFldUpd/g;
Are you sure there aren't any metacharacters in your double-quoted
text? If not, you'd better add \Q...
> }
> else {
> $MyWrkFldUpd =~ s/"//g;
> s/$MyWrkFld/$MyWrkFldUpd/g;
> }
#===================
In your real code, you don't need to do it separately; the "IF
ELSE" block above can be replaced by:
( $MyWrkFldUpd = $MyWrkFld ) =~ s/[,"]//g;
s/\Q$MyWrkFld/$MyWrkFldUpd/g;
Xicheng
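
A small sketch of why the \Q matters, wrapping the loop with the suggested two-line replacement in a hypothetical helper, clean_quoted_fields. The sample line is made up to contain regex metacharacters inside the quotes. Without \Q the field would be read as a pattern, the substitution would not match, and the while loop would keep finding the same quoted field.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): strip the quotes, and any
# commas inside them, from every double-quoted field of a line.
sub clean_quoted_fields {
    my ($line) = @_;
    while ( $line =~ /("[^"]+")/ ) {
        my $field = $1;
        ( my $updated = $field ) =~ s/[,"]//g;

        # \Q...\E makes the field match literally, even if it holds
        # regex metacharacters such as ( ) + or |.
        $line =~ s/\Q$field\E/$updated/g;
    }
    return $line;
}

# Made-up sample with metacharacters inside the quotes.
print clean_quoted_fields('1,"BLDG (A+B), DOCK 3",25'), "\n";
```

The loop ends because each pass removes the quotes of the field it just matched.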
| Ryan Gies 2006-02-22, 3:55 am |
Yet another way to do it*:

while (<DATA>) {
    chomp;

    # extract fields
    my @fields = split /"?,"|"?,(?=\d)/;

    # print the fields separated with *
    print join '*', @fields;
    print "\n";
}

*Presuming that fields which are *not* enclosed in quotes will begin
with a digit.
*Note that fields can contain nested quotes as long as there isn't a
comma next to them.
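
To see how much the first presumption matters, here is a short sketch that runs the same split pattern over two made-up lines. The wrapper name split_fields and both sample lines are assumptions for illustration only.

```perl
#!perl
use strict;
use warnings;

# Hypothetical wrapper around the split pattern from the post above.
sub split_fields {
    my ($line) = @_;
    return split /"?,"|"?,(?=\d)/, $line;
}

# Made-up lines: the first fits the presumption, the second has an
# unquoted field (x) that does not start with a digit.
for my $line ( '1,"a, b",2,3', '1,"a, b",x,2' ) {
    my @fields = split_fields($line);
    printf "%d fields: %s\n", scalar @fields, join( '*', @fields );
}
```

The first line splits into four clean fields. The second comes back as three, with the quoted field and the x left joined together, so the pattern is only safe when the data really does keep to that layout.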
| Wagner, David --- Senior Programmer Analyst --- WG 2006-02-22, 6:56 pm |
Jay Savage wrote:
> One day, I'll start remembering this list doesn't set reply-to...
>
> On 2/22/06, Jay Savage <daggerquill@gmail.com> wrote:

True, but I don't have the control I would prefer over my prod and test locations. So I try to use what I have and keep the loading of modules to a minimum. What I found out was that I was missing a chunk of data processing which needed to have the same code placed there. I corrected that and all is working as it should be.
Yes, I am playing with \r, but I am ftp'ing from one Sun box to another, and I end up with a return plus the standard carriage return and linefeed. Where I am getting the data from is a third party application, but they seem to be more in the Windows world than the Sun arena. That is my take on it, but until my Sys Admin can see what, if anything, can be done, I replace the \r with nothing and all is working as it should. Working harder than I should, but at least working.
I thank the list for the help and suggestions you provided. Made me look again and see that the enemy is ME and not Perl, which is what it usually is.
Wags ;)
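
For the line-ending cleanup described above, a minimal sketch of one alternative to chomp plus s/\r//g: trim everything at the end of the line in a single step. The helper name strip_line_end and the sample strings are hypothetical. Unlike s/\r//g, it leaves any \r in the middle of a line alone, which may or may not be what the production data needs.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): remove whatever mix of
# carriage returns, linefeeds and trailing blanks ends the line.
sub strip_line_end {
    my ($line) = @_;
    $line =~ s/[\r\n \t]+\z//;
    return $line;
}

# Made-up lines with Unix, Windows and doubled-return endings.
for my $raw ( "a,b,25\n", "a,b,25\r\n", "a,b,25 \r\r\n" ) {
    my $line = strip_line_end($raw);

    # Same shape of check as in LABEL2: the line must end in a count.
    print "count is $1\n" if $line =~ /,(\d+)$/;
}
```

Trimming trailing blanks as well keeps a check such as /,(\d+)$/ from failing on a line that ends in a stray space.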