Problems matching or parsing with delimiters in text
| Kevin Zembower 2005-03-28, 3:56 pm |
| I'm trying to read in text lines from a file that look like this:
"B-B01","Eng","Binder for Complete Set of Population Reports",13,0
"C-CD01","Eng","The Condoms CD-ROM",12,1
"F-J41a","Fre",,13,1
"F-J41a","SPA",,13,1
"M-FC01","Eng","Africa Flip Charts- Planning Your Family (E,F, Swahili)(12""x9"")",7,1
"M-FC01","Fre","Africa Flip Charts- Planning Your Family (E,F, Swahili)(12""x9"")",7,1
The first two lines are typical of most of the file. The second two have a
blank third field and the last two show embedded commas and escaped double
quotes in the third field. This is an output of another program, but I can
filter it and make substitutions if that makes anything easier.
I'm trying to parse it with these statements:
while (<>) { # While there are more records in the inventory export file called on the command line
    ++$ln; #increment the line number count
    my ($partno, $language, $title, $cost, $available) = m["(.*)","(.*)","?(.*?)"?,(.*),(.*)$];
    print "PN=$partno, L=$language, T=$title, C=$cost, A=$available\n" if $debug;
    next if $debug;
    createlangversion($partno, $language, $title, $cost, $available);
} #while there are more lines in the import data file
The output looks like this:
kevinz@www:~/public_html/orderDB/obsolete$ ./loadInventory.pl ../tmp/t
PN=B-B01, L=Eng, T=Binder for Complete Set of Population Reports, C=13, A=0
PN=C-CD01, L=Eng, T=The Condoms CD-ROM, C=12, A=1
PN=F-J41a, L=Fre, T=, C=13, A=1
PN=F-J41a, L=SPA, T=, C=13, A=1
PN=M-FC01, L=Eng, T=Africa Flip Charts- Planning Your Family (E, C=F, Swahili)(12""x9"")",7, A=1
PN=M-FC01, L=Fre, T=Africa Flip Charts- Planning Your Family (E, C=F, Swahili)(12""x9"")",7, A=1
kevinz@www:~/public_html/orderDB/obsolete$
Note that the first four lines parsed correctly, but that in the last two,
$cost was incorrectly assigned part of the title.
Can anyone help me write a match which would parse all of these lines
correctly? Extra bonus points for explaining it thoroughly, so I don't have
to ask this question here again. If it's easier to just filter or
substitute in the original input file, what should I do?
Thank you all in advance for your help and suggestions.
-Kevin Zembower
| |
| Chris Devers 2005-03-28, 3:56 pm |
| On Mon, 28 Mar 2005, KEVIN ZEMBOWER wrote:
> I'm trying to read in text lines from a file that look like this:
> "B-B01","Eng","Binder for Complete Set of Population Reports",13,0
> "C-CD01","Eng","The Condoms CD-ROM",12,1
> "F-J41a","Fre",,13,1
> "F-J41a","SPA",,13,1
> "M-FC01","Eng","Africa Flip Charts- Planning Your Family (E,F, Swahili)(12""x9"")",7,1
> "M-FC01","Fre","Africa Flip Charts- Planning Your Family (E,F, Swahili)(12""x9"")",7,1
>
> The first two lines are typical of most of the file. The second two
> have a blank third field and the last two show embedded commas and
> escaped double quotes in the third field. This is an output of another
> program, but I can filter it and make substitutions if that makes
> anything easier.
>
> I'm trying to parse it with these statements:
>
> while (<> ) { # While there are more records in the inventory export file called on the command line
> ++$ln; #increment the line number count
> my ($partno, $language, $title, $cost, $available) = m["(.*)","(.*)","?(.*?)"?,(.*),(.*)$];
> print "PN=$partno, L=$language, T=$title, C=$cost, A=$available\n" if $debug;
> next if $debug;
> createlangversion($partno, $language, $title, $cost, $available);
> } #while there are more lines in the import data file
No. Use split(). This problem is what it's for.
while (<>) {
    chomp; # drop the trailing newline so it doesn't end up in $available
    $ln++; # postfix increment is more common & so readable
    my ($partno, $language, $title, $cost, $available) =
        split(',', $_); # quoted fields keep their surrounding double quotes
    if ($debug) {
        print "PN=$partno, L=$language, T=$title, C=$cost, A=$available\n";
        next;
    }
    createlangversion($partno, $language, $title, $cost, $available);
}
This should be both easier and more robust than hand-matching the line
with a regex.
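As for why the hand-written match misbehaves on the last two records: as far as I can tell, the closing quote after the title is optional ("?), so the non-greedy (.*?) is allowed to stop at the first comma it meets inside the title, and the greedy (.*) that follows swallows everything up to the last comma. Here is a minimal sketch that reproduces this; match_fields() is a name I made up for illustration, and the pattern is the one from the question.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged apart from the delimiters.
my $re = qr/"(.*)","(.*)","?(.*?)"?,(.*),(.*)$/;

# Hypothetical helper: returns the five captures, or an empty list on no match.
sub match_fields {
    my ($line) = @_;
    return $line =~ $re;
}

my $line = '"M-FC01","Eng","Africa Flip Charts- Planning Your Family (E,F, Swahili)(12""x9"")",7,1';
my ($partno, $language, $title, $cost, $available) = match_fields($line);

print "T=$title\n";      # the title stops at its first embedded comma
print "C=$cost\n";       # the rest of the title, plus the real cost
print "A=$available\n";
```

This gives the same T= and C= values as the debug output quoted in the question, which is why patching the regex further tends to be a losing game.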
Note though that the comma-separated values (CSV) format you're using is
infamous for being deceptively simple. If one of the fields in your file
itself has an embedded comma, then parsing it immediately gets much
harder to do. For example, if you had this record:
"C-CD02","Eng","The Condoms CD-ROM, Second Edition",12,1
Then everything falls apart.
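To see the failure concretely, here is a small sketch that runs the same split() call on a plain record and on the one above. naive_fields() is a hypothetical helper name, not something from your program.

```perl
use strict;
use warnings;

# Hypothetical helper: split a record on every comma, like the split() call above.
sub naive_fields {
    my ($line) = @_;
    chomp $line;
    return split(',', $line);
}

my @simple   = naive_fields('"C-CD01","Eng","The Condoms CD-ROM",12,1');
my @embedded = naive_fields('"C-CD02","Eng","The Condoms CD-ROM, Second Edition",12,1');

print scalar(@simple),   " fields\n";   # 5 fields, as expected
print scalar(@embedded), " fields\n";   # 6 fields: the title was cut in two
print "title: $embedded[2]\n";          # only the first half, opening quote still attached
```

split() knows nothing about quoting, so the comma inside the title is treated like any other separator and every later field shifts one place to the right.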
You could try to fix this by writing code to detect these situations,
but it's really annoying to get right. You're *much* better off
turning to a module to do the work for you. Two popular ones for this
are DBD::CSV, which allows you to write DBI code that treats your CSV
data file as if it were a table in a database, and Text::CSV (or, if you
can run it, the optimised Text::CSV_XS, which is written in C rather
than Perl and so is much faster). For information about these, see:
<http://search.cpan.org/dist/DBD-CSV/lib/DBD/CSV.pm>
<http://search.cpan.org/~alancitt/Text-CSV-0.01/CSV.pm>
<http://search.cpan.org/~jwied/Text-CSV_XS/CSV_XS.pm>
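A rough sketch of what the Text::CSV route might look like, assuming a reasonably recent Text::CSV from CPAN (the binary option and error_diag() are from its documentation; the very old 0.01 release may differ). parse_record() and the @records list are my own stand-ins for illustration; in your program the lines would come from the while (<>) loop.

```perl
use strict;
use warnings;
use Text::CSV;   # from CPAN, not part of core Perl

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Hypothetical helper: parse one record into its fields, or die with the parser's message.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    $csv->parse($line)
        or die "Cannot parse line: " . $csv->error_diag;
    return $csv->fields;
}

# Stand-in for the input file: three of the records from the question.
my @records = (
    '"C-CD01","Eng","The Condoms CD-ROM",12,1',
    '"F-J41a","Fre",,13,1',
    '"M-FC01","Eng","Africa Flip Charts- Planning Your Family (E,F, Swahili)(12""x9"")",7,1',
);

for my $line (@records) {
    my ($partno, $language, $title, $cost, $available) = parse_record($line);
    print "PN=$partno, L=$language, T=$title, C=$cost, A=$available\n";
}
```

The module takes care of the surrounding quotes, the embedded commas and the doubled "" escapes, so the title of the last record comes back with single double quotes around x9 and the cost stays in its own field.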
Good luck...
--
Chris Devers cdevers@pobox.com
http://devers.homeip.net:8080/blog/
np: 'Missed Me'
by The Dresden Dolls
from 'A Is For Accident'
| |
| Offer Kaye 2005-03-28, 3:56 pm |
| On Mon, 28 Mar 2005 11:13:05 -0500, KEVIN ZEMBOWER wrote:
> I'm trying to read in text lines from a file that look like this:
[...snip...]
As others have said, you really should use a module. For completeness,
here's a solution using Text::CSV::Simple ("datafile" holds the data
you gave in the question):
use Text::CSV::Simple;
my $parser = Text::CSV::Simple->new;
my @data = $parser->read_file("datafile");
for my $aref (@data) {
    my ($partno, $language, $title, $cost, $available) = @$aref;
    print "PN=$partno, L=$language, T=$title, C=$cost, A=$available\n";
}
Note that @data is an array of array references. In the "for" loop I
assign each one in turn to a scalar called "$aref", and then de-reference
it using the "@$aref" notation.
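If the array-of-array-references idea is new, here is a small core-Perl sketch of it. The @data below is built by hand as a stand-in for what read_file() returns (that part is my assumption for illustration), so it runs without the module installed.

```perl
use strict;
use warnings;

# Hand-built stand-in: each element of @data is a reference to an array of fields.
my @data = (
    [ 'B-B01',  'Eng', 'Binder for Complete Set of Population Reports', 13, 0 ],
    [ 'F-J41a', 'Fre', '',                                              13, 1 ],
);

for my $aref (@data) {
    my @fields = @$aref;       # dereference: copy the whole row into a plain array
    my $title  = $aref->[2];   # or reach one element through the reference
    print scalar(@fields), " fields, title='$title'\n";
}
```

Both forms read the same row; "@$aref" is handy when you want all the fields at once, as in the list assignment above, while "$aref->[N]" picks out a single one.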
Hope this helps,
--
Offer Kaye
| |
| Kevin Zembower 2005-03-28, 3:56 pm |
| Offer, Chris and Jose, thank you all very much for the help. I was aware
of Text::CSV and its brothers, and have even used them successfully in
other programs. I can't remember what I was thinking here. I may have even
started with Text::CSV, but then discarded it because I thought it was
malfunctioning on this input, when the problem was really due to errors in
my program. Or, I might have thought it was just too much overhead for what
I thought at first was a simple match.
In any case, I'll go back to Text::CSV. Thanks, again, for all your help
and well-written answers.
-Kevin
| |
| Graeme St. Clair 2005-03-28, 8:56 pm |
| -----Original Message-----
From: Offer Kaye [mailto:offer.kaye@gmail.com]
Sent: Monday, March 28, 2005 12:17 PM
To: Perl Beginners
Subject: Re: Problems matching or parsing with delimiters in text
On Mon, 28 Mar 2005 11:13:05 -0500, KEVIN ZEMBOWER wrote:
> I'm trying to read in text lines from a file that look like this:
[...snip...]
As others have said, you really should use a module. For completeness, here's
a solution using Text::CSV::Simple ("datafile" holds the data you gave in
the question):
use Text::CSV::Simple;
my $parser = Text::CSV::Simple->new;
my @data = $parser->read_file("datafile");
for my $aref (@data) {
    my ($partno, $language, $title, $cost, $available) = @$aref;
    print "PN=$partno, L=$language, T=$title, C=$cost, A=$available\n";
}
####
I just tried the following (Win XP, AS Perl 5.6.1):-
C:\...PERL Documents>ppm install Text::CSV::Simple
And got (slightly re-formatted):-
Installing package 'Text-CSV-Simple'...
Error installing package 'Text-CSV-Simple':
Could not locate a PPD file for package
Text-CSV-Simple
Only a short while before I had no trouble with "ppm install
HTML::CalendarMonthSimple". How do I fix the above and get the CSV module?
Rgds, GStC.