PERL Beginners archive, July 2006
Script Optimization file parsing script
Ajay Nagrale, 2006-07-12, 6:57 pm
Hi,
I am working on optimizing one of the file parsing scripts. There are around 450,000 lines in the input file, and the file is bound to grow in the coming days: new entries are added every 2 minutes.
The current script takes around 60 seconds on average (150 seconds at most) to parse the input file and write the output file. Since this script has to run every two minutes (unavoidable, and it is a very important script), even a few seconds matter.
The flow in the script is something like this:
1. Open the input file handle using the open function.
2. In a while loop reading directly from the file handle, parse the input entries. Run the required sanity check (a combination of a specific line format and a few regular expression checks). If the sanity check passes, load the required entries into the hash. Close the input file handle.
3. Open the output file handle.
4. Sort the hash in the required order.
5. Print the hash to the output file.
6. Close the output file handle.
Any help/suggestion would be appreciated.
Thanks,
Ajay
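For reference, the flow Ajay describes might look roughly like the following minimal Perl sketch. The input line format (a word key, a tab, and a numeric value), the sanity-check regex, the function names, and the sort order (by key) are all assumptions, since the post does not show them. The thing to notice is that every run re-reads, re-checks, and re-sorts the whole file, so the run time grows with the file.

```perl
use strict;
use warnings;

# Steps 1-2: read the input line by line and keep entries that pass the
# sanity check. ASSUMPTION: lines look like "<key>\t<number>".
sub parse_entries {
    my ($in) = @_;
    my %entries;
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /^(\w+)\t(\d+)$/;   # hypothetical sanity check
        $entries{$1} = $2;                       # a later line for the same key wins
    }
    return \%entries;
}

# Steps 4-5: sort the hash (ASSUMPTION: by key) and print it.
sub write_sorted {
    my ($out, $entries) = @_;
    for my $key (sort keys %$entries) {
        print {$out} "$key\t$entries->{$key}\n";
    }
    return;
}

# Demo with in-memory files; handles from a normal open() behave the same.
my $input = "b\t2\na\t1\nnot a valid line\nc\t3\n";
open my $in, '<', \$input or die "open input: $!";
my $entries = parse_entries($in);
close $in;

my $output = '';
open my $out, '>', \$output or die "open output: $!";
write_sorted($out, $entries);
close $out;
print $output;
```

Reading the handle directly in the while loop (rather than slurping the file into an array) already keeps memory use low; the cost is that the whole job is repeated every two minutes.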
Mr. Shawn H. Corey, 2006-07-12, 6:57 pm
Nagrale, Ajay wrote:
> The flow in the script is something like this:
>
> 1. Open the input file handle using the open function.
> 2. In a while loop reading directly from the file handle, parse
> the input entries. Run the required sanity check (a combination
> of a specific line format and a few regular expression checks).
> If the sanity check passes, load the required entries into the
> hash. Close the input file handle.
> 3. Open the output file handle.
> 4. Sort the hash in the required order.
> 5. Print the hash to the output file.
> 6. Close the output file handle.
>
> Any help/suggestion would be appreciated.
* Keep track of the position after the last item processed via tell(), and
seek() to that location on the next run.
* Keep the previous sorted output in a file with the format: sort_key
(delimiter) output_line
* Sort the new entries and merge them with the old sorted output to produce
the output and a new sorted file.
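A minimal Perl sketch of the first and third points, under stated assumptions: each record is one newline-terminated line of the form "sort_key<TAB>output_line", keys compare as plain strings, and the previous sorted output is kept in memory here rather than in the file described above. The function names are hypothetical. An unfinished last line (one the writer has not completed yet) is left for the next run.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read only the complete lines written after byte offset $offset.
# Returns the new lines and the offset to save (via tell) for the next run.
sub read_new_lines {
    my ($in, $offset) = @_;
    seek($in, $offset, 0) or die "seek: $!";   # whence 0 = from start of file
    my @new;
    my $pos = $offset;
    while (defined(my $line = <$in>)) {
        last unless $line =~ /\n\z/;           # unfinished line: pick it up next run
        push @new, $line;
        $pos = tell($in);                      # just past the last complete line
    }
    return (\@new, $pos);
}

# Merge two lists that are each already sorted in string order.
# ASSUMPTION: records are "key\t..." and keys contain no characters below
# TAB, so comparing whole records orders them by key first.
sub merge_sorted {
    my ($old, $new) = @_;
    my ($i, $j) = (0, 0);
    my @out;
    while ($i < @$old && $j < @$new) {
        push @out, ($old->[$i] le $new->[$j]) ? $old->[$i++] : $new->[$j++];
    }
    push @out, @{$old}[$i .. $#$old], @{$new}[$j .. $#$new];
    return \@out;
}

sub append_lines {
    my ($path, @lines) = @_;
    open my $fh, '>>', $path or die "open $path: $!";
    print {$fh} @lines;
    close $fh or die "close $path: $!";
    return;
}

# Demo: two "runs" over a growing input file.
my (undef, $path) = tempfile(UNLINK => 1);
append_lines($path, "d\tline-d\n", "b\tline-b\n");

my @sorted;        # previous sorted output (empty before the first run)
my $offset = 0;    # saved tell() position (0 before the first run)
for my $run (1, 2) {
    open my $in, '<', $path or die "open $path: $!";
    my ($new, $pos) = read_new_lines($in, $offset);
    close $in;
    $offset = $pos;
    @sorted = @{ merge_sorted(\@sorted, [sort @$new]) };   # sort only the new batch
    # Simulate the writer adding entries (the last one unfinished) between runs.
    append_lines($path, "c\tline-c\n", "a\tline-a\n", "e\tpartial") if $run == 1;
}
print @sorted;
```

In a real script @sorted would be read from and written back to the sort_key/output_line file, with $offset saved next to it. Two things the sketch does not handle: the hash-style de-duplication of the original (a repeated key is merged in as a second record instead of replacing the first), and rotation or truncation of the input file, where the saved offset is larger than the current file size (compare it with -s and start again from 0).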
--
__END__
Just my 0.00000002 million dollars' worth,
--- Shawn
"For the things we have to learn before we can do them, we learn by
doing them."
Aristotle
* Perl tutorials at http://perlmonks.org/?node=Tutorials
* A searchable perldoc is at http://perldoc.perl.org/
Dr.Ruud, 2006-07-12, 6:57 pm
"Nagrale, Ajay" wrote:
> I am working on optimizing one of the file parsing scripts. There
> are around 450,000 lines in the input file, and the file is bound
> to grow in the coming days: new entries are added every 2 minutes.
Does the file contain (mostly) completely new data every 2 minutes? If so,
go to (2).
(1) You reprocess the old lines over and over again. What for?
Cache the old lines in a database, insert (or append) only the new data,
and create the output from that.
(2) I would use a yacc/lex solution, because that normally does it in a
few seconds.
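Point (1) could look something like this minimal Perl sketch. It swaps in SDBM_File, a simple key/value DBM file that ships with Perl, as a stand-in for "a database"; an SQL database accessed through DBI would follow the same pattern. The line format, the sanity check, the function name, and the cache file name are assumptions. Only the new lines are parsed on each run; identifying them (for example by saving a tell() offset, as Shawn suggests) is left out here.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

# Parse only the new lines, store them in the persistent cache, then build
# the sorted output from everything in the cache.
# ASSUMPTION: lines look like "<key>\t<number>"; a newer value for a key
# replaces the cached one, like the original in-memory hash.
sub update_cache {
    my ($db_path, $new_lines) = @_;
    tie my %cache, 'SDBM_File', $db_path, O_RDWR | O_CREAT, 0666
        or die "tie $db_path: $!";
    for my $line (@$new_lines) {
        next unless $line =~ /^(\w+)\t(\d+)$/;   # hypothetical sanity check
        $cache{$1} = $2;
    }
    my @out = map { $_ . "\t" . $cache{$_} . "\n" } sort keys %cache;
    untie %cache;
    return \@out;
}

# Demo: two runs sharing one cache (SDBM creates "<name>.pag" and "<name>.dir").
my $dir  = tempdir(CLEANUP => 1);
my $db   = "$dir/entries_cache";                 # hypothetical cache name
my $run1 = update_cache($db, ["b\t2\n", "a\t1\n"]);
my $run2 = update_cache($db, ["c\t3\n", "a\t9\n", "junk\n"]);
print @$run2;
```

SDBM_File limits each key/value pair to roughly 1 KB, so longer records would need another DBM module or a real database. This version still re-sorts every cached key on each run; only the parsing and sanity checks are limited to the new data.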
--
Anyway, Ruud
"Ordinary is a tiger."