Code Comments
Programming Forum and web based access to our favorite programming groups.I have a directory which has output reports in them. The big problem is t= hat they are just a bunch of letters and numbers like 2781424330fdac47c916.= xls. So the user would like a more recognizable file name. This is to run on Solaris w/ Perl 5.8.3. I have tried the following two m= odules on my PC using ActiveState 5.8.3 under XP: Spreadsheet::ParseExcel ( 15 minutes plus to read 4 files size 4 to 12 me= g) and Spreadsheet::ParseExcel::Simple ( 4 minutes for the same files as Pa= rseExcel ) I am running my first test on Solaris using Simple and the first file took= 5 minutes to process. Still have 3 to do.=20 Any suggestions on what to do? The key for naming the file is ( believe = it or not ) the last column of the first row on worksheet 1. Any questions and/or problems, please let me know. Thanks! Wags ;) WGO: x2224 **************************************** *************** This message contains information that is confidential and proprietary to FedEx Freight or its affiliates. It is intended only for the recipient named and for the express purpose(s) described therein. Any other use is prohibited. **************************************** ***************
Post Follow-up to this messageOn Tue, 29 Mar 2005 13:30:57 -0800, Wagner, David --- Senior Programmer Analyst --- WGO <David.Wagner@freight.fedex.com> wrote: > I have a directory which has output reports in them. The big prob lem is that they are just a bunch of letters and numbers like 2781424330fdac 47c916.xls. So the user would like a more recognizable file name. > > This is to run on Solaris w/ Perl 5.8.3. I have tried the followi ng two modules on my PC using ActiveState 5.8.3 under XP: > Spreadsheet::ParseExcel ( 15 minutes plus to read 4 files size 4 to 12 meg) and Spreadsheet::ParseExcel::Simple ( 4 minutes for the same file s as ParseExcel ) > > I am running my first test on Solaris using Simple and the first f ile took 5 minutes to process. Still have 3 to do. > > Any suggestions on what to do? The key for naming the file is ( b elieve it or not ) the last column of the first row on worksheet 1. > > Any questions and/or problems, please let me know. Hey Wags, I'm kind ofon what you're asking help for. Are you asking for help on getting the last column of the first row on worksheet 1 for each file you have? If so, my suggestion is to use File::Find to parse down through your directory and use a regex to identify the files you want, then use Spreadsheet::ParseExcel to read the first worksheet and get the first row. Use the code on http://search.cpan.org/~kwitknr/Spr...3/ParseExcel.pm in the Synopsis section and play around with the for loops to have it break after the first row. Here's a great column on using File::Find. http://www.stonehenge.com/merlyn/LinuxMag/col45.html Sorry for the lack of syntax, but if you need some later, let me know! Hope this helps, Kevin -- Kevin Old kevinold@gmail.com
Post Follow-up to this messageKevin Old wrote: > On Tue, 29 Mar 2005 13:30:57 -0800, Wagner, David --- Senior > Programmer Analyst --- WGO <David.Wagner@freight.fedex.com> wrote: >=20 > Hey Wags, >=20 > I'm kind ofon what you're asking help for. Are you asking > for help on getting the last column of the first row on worksheet 1 > for each file you have? >=20 > If so, my suggestion is to use File::Find to parse down through your > directory and use a regex to identify the files you want, then use > Spreadsheet::ParseExcel to read the first worksheet and get the first > row. Sorry for not being clear. I can find the files I want, but to open them = takes way too long. On my test node, to open the four files and get the fi= rst line of worksheet 1 took almost 38 minutes. Any other file processing = would be over almost before it got started, but with the MS setup it does n= ot work that way. Like I stated, I have tried both ParseExcel and parseExc= el::Simple and each for some reason reads in the whole file at once. =20 Is there a way to tell either of these modules to just open and allow me t= o read what I need and then stop. Also it can take up to another 5 to 10 m= inutes after my program says it is done, for the prompt to return also. Thanks for the reply, but is there any way to easily read something from M= S and NOT have the overhead of the whole file being stuffed into memory. Wags ;) >=20 > Use the code on > http://search.cpan.org/~kwitknr/Spr...03/ParseExcel.= pm > in the Synopsis section and play around with the for loops to have it > break after the first row. >=20 > Here's a great column on using File::Find. > http://www.stonehenge.com/merlyn/LinuxMag/col45.html >=20 > Sorry for the lack of syntax, but if you need some later, let me know! >=20 > Hope this helps, > Kevin **************************************** *************** This message contains information that is confidential and proprietary to FedEx Freight or its affiliates. It is intended only for the recipient named and for the express purpose(s) described therein. Any other use is prohibited. **************************************** ***************
Post Follow-up to this messageOn Wed, 30 Mar 2005 09:04:23 -0800, Wagner, David --- Senior Programmer Analyst --- WGO wrote: > > Sorry for not being clear. I can find the files I want, but to op en them takes way too long. > On my test node, to open the four files and get the first line of workshee t 1 took almost 38 > minutes. Any other file processing would be over almost before it got sta rted, but with the MS > setup it does not work that way. Like I stated, I have tried both ParseEx cel and > parseExcel::Simple and each for some reason reads in the whole file at onc e. > > Is there a way to tell either of these modules to just open and al low me to read what I > need and then stop. Also it can take up to another 5 to 10 minutes after my program says it is > done, for the prompt to return also. > > Thanks for the reply, but is there any way to easily read somethin g from MS and NOT > have the overhead of the whole file being stuffed into memory. > > Wags ;) Hi Wags, The following code, when reading a 3Meg file, took 21 seconds to run on a strong Linux machine, so figure x4 the time since you're running on Solaris, and another x4 for your 12Meg files, for a total of about 5 and a half minutes worst case. Here's the code: ################## begin code use strict; use warnings; use Spreadsheet::ParseExcel::Simple; my $xls_file = "filename.xls"; my $xls = Spreadsheet::ParseExcel::Simple->read($xls_file) or die "Couldn't read $xls_file: $!\n"; my $sheet = ($xls->sheets)[0]; # Sheet 1 my $data = ($sheet->next_row)[-1]; # last col of row 1 print "The name is: $data\n"; ################## end code I'd be interested in hearing what your runtime is. -- Offer Kaye
Post Follow-up to this messageDavid Van Ginneken wrote: > This node may help you. >=20 > http://www.perlmonks.org/index.pl?node_id=3D379743 >=20 > Basically you create a custom cell handler. The one I use looks > something like this. David responded but only to me, but if you need to work with Excel and = only need to search and be selective about it, then this code is a great = time saver. I looked at your code and that was good, but when I went and took a = look at the messge at the url above, then I found about also getting out = as soon as I found what I was looking for. From about 38 minutes to less = than a minute for the same four files. =20 Thanks. Wags ;) >=20 > sub cell_handler { > my $workbook =3D $_[0]; > my $sheet_index =3D $_[1]; > my $row =3D $_[2]; > my $col =3D $_[3]; > my $cell =3D $_[4]; > return unless (($col =3D=3D 2)||($col =3D=3D 3)); # Only take the = 3rd and > 4th columns. > return if $row =3D=3D 0; # Ignore the header row > $row =3D sprintf("%05s" , $row); > $col =3D sprintf("%05s" , $col); > $tmpcallmap{$row}{$col} =3D $cell->{_Value}; > } >=20 > Which populates a hash with the columns I wanted from the report > skipping the 1st row. You could easily adapt something like this to > ignore other worksheets, rows, columns, etc.. >=20 >=20 >=20 > On Wed, 30 Mar 2005 09:04:23 -0800, Wagner, David --- Senior > Programmer Analyst --- WGO <David.Wagner@freight.fedex.com> wrote: http://search.cpan.org/~kwitknr/Spr...03/ParseExcel.= pm > play around with the for loops to have it
Post Follow-up to this messageThe Spreadsheet::ParseExcel will dump all the worksheets one after the other into one big file. Is this your observation. The first line of the file will be the first row of worksheet 1. From this the last column can be extracted which contains the file name. If the user still insists on the files retained as excel files, then Spreadsheet::WriteExcel can write the results back as excel files. In Windows, Win32::OLE is used to do this kind of work. HTH, Alfred, Kevin Old wrote: >On Tue, 29 Mar 2005 13:30:57 -0800, Wagner, David --- Senior >Programmer Analyst --- WGO <David.Wagner@freight.fedex.com> wrote: > > > >Hey Wags, > >I'm kind ofon what you're asking help for. Are you asking >for help on getting the last column of the first row on worksheet 1 >for each file you have? > >If so, my suggestion is to use File::Find to parse down through your >directory and use a regex to identify the files you want, then use >Spreadsheet::ParseExcel to read the first worksheet and get the first >row. > >Use the code on >http://search.cpan.org/~kwitknr/Spr...3/ParseExcel.pm >in the Synopsis section and play around with the for loops to have it >break after the first row. > >Here's a great column on using File::Find. >http://www.stonehenge.com/merlyn/LinuxMag/col45.html > >Sorry for the lack of syntax, but if you need some later, let me know! > >Hope this helps, >Kevin > >
Post Follow-up to this messageAlfred Vahau wrote: > The Spreadsheet::ParseExcel will dump all the worksheets one after the > other into one big file. Is this your observation. > The first line of the file will be the first row of worksheet 1. From > this the last column can be extracted which contains the > file name. >=20 > If the user still insists on the files retained as excel files, then > Spreadsheet::WriteExcel can write the results back as excel files. >=20 > In Windows, Win32::OLE is used to do this kind of work. >=20 > HTH, >=20 > Alfred, Thanks, Alfred, but the email from David Van Ginneken concerning an email = on PerlMonks got me the first line and also out of the file in quick order. I do appreciate the followup though. Wags ;) >=20 >=20 >=20 > Kevin Old wrote: >=20 .pm **************************************** *************** This message contains information that is confidential and proprietary to FedEx Freight or its affiliates. It is intended only for the recipient named and for the express purpose(s) described therein. Any other use is prohibited. **************************************** ***************
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.