Speeding up an application - general rules
Petyr David

2006-12-21, 10:02 pm

I have a small Perl application that searches through a series of
directories chosen by the user for files containing a pattern or group
of patterns. The file names and matching patterns are returned to the
user sorted by the file's modification time. The user also has the
choice of how far back in time to search and how many lines of output
he wants to see for each file.

With an expected and current increase of files and file sizes, the
application is bogging down a bit. I didn't design it with performance
in mind and I will be reviewing what I've done, but are there general
rules or specific suggestions you could offer to enhance performance?

Basically: the script uses Perl's system command to run a long-winded
"find" command which is piped to sed to correct patterns that match
HTML markers. The matching lines are then shoved into an array. The
elements of the array are moved into a hash for the purpose of sorting
the file names. Then file names and matching lines are printed.

Q: Can I speed things by eliminating the sed command and letting Perl
filter and modify the matching patterns? If so, how much of a
performance gain?

Is using Perl's grep to search through every file for the pattern
faster than using the find command? The find command has the advantage
that I can search for files of a certain date rather easily. Again:
could that be done more rapidly by Perl's looking at the file's mod
time?

Any thoughts or suggestions would be appreciated

TX

xhoster@gmail.com

2006-12-22, 4:03 am

"Petyr David" <phynkel@gmail.com> wrote:

> Basically: the script uses perl's system command to run a long winded
> "find" command which is piped to sed to correct patterns that match
> HTML markers. The matching lines are then shoved into an array. The
> elements of the array are moved into a hash for the purpose of sorting
> the file names. Then file names and matching lines are printed.
>
> Q: Can I speed things by eliminating the sed command and letting Perl
> filter and modify the matching patterns?


Probably not. It should be a 30 second job to take out the sed pipe.
Sure, the answers will now be wrong, but unless it gives the wrong answers
much faster than it used to, you will know there is no speed benefit to be
had by rewriting the sed into Perl.

> If so, how much of a
> performance gain?
>
> Is using Perl's grep to search through every file for the pattern
> faster than using the find command?


Probably not. Also, Perl's grep (currently) forces the list to be
evaluated to completion (in memory) before it gets started, so potentially
takes much more memory. You may want to look at Perl's File::Find,
although I see no particular reason to think it will be faster than the
system's find.
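
A minimal sketch of the memory point above, assuming the matching were done in Perl at all. The helper name matching_lines and the sample data are made up for illustration. It reads one line at a time, so only the matches are kept in memory, whereas grep { ... } <$fh> would first read the whole file into a list.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return the lines of $path that match $re,
# reading one line at a time so only the matches are held in memory.
sub matching_lines {
    my ($path, $re) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @hits;
    while (my $line = <$fh>) {
        push @hits, $line if $line =~ $re;
    }
    close $fh;
    return @hits;
}

# Small demonstration on a temporary file with made-up content.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\nbeta\nalphabet\n";
close $tmp_fh;

my @hits = matching_lines($tmp_name, qr/alpha/);
print @hits;
```

Whether this is faster than an external egrep is a separate question that only measurement can answer.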

> The find command has the advantage
> that I can search for files of a certain date rather easily. Again:
> could that be done more rapidly by Perl's looking at the file's mod
> time?


Probably not more rapidly, no.

What is the total CPU usage? What is the relative usage of each process
(perl, find, sed)?
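
One rough way to get at the first half of that question from inside the script is Perl's built-in times function, which reports CPU seconds for the current process and for its waited-for children. The sketch below is only an illustration: cpu_report is a made-up name and 'echo hello' stands in for the real find pipeline. It separates perl from the child processes as a group, but it cannot split find from sed; for that an external tool such as time(1) or top would be needed.

```perl
use strict;
use warnings;

# Hypothetical helper: run a shell command through backticks and report
# the CPU seconds used by this perl process and by its children.
sub cpu_report {
    my ($cmd) = @_;
    my @before = times;    # (user, system, child user, child system)
    my @out    = `$cmd`;   # backticks wait for the child to finish
    my @after  = times;
    my %cpu = (
        perl_user    => $after[0] - $before[0],
        perl_system  => $after[1] - $before[1],
        child_user   => $after[2] - $before[2],
        child_system => $after[3] - $before[3],
        lines        => scalar @out,
    );
    return \%cpu;
}

# 'echo hello' is a placeholder; substitute the real command to measure it.
my $r = cpu_report('echo hello');
printf "%-13s %s\n", $_, $r->{$_} for sort keys %$r;
```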

Xho

Todd W

2006-12-22, 4:03 am


"Petyr David" <phynkel@gmail.com> wrote in message
news:1166757223.858558.144370@48g2000cwx.googlegroups.com...
> [snip full quoted content]


The conventional way of doing what you are proposing is to somehow build an
index of the files. Your index interface then gives you pointers to results
when a search is performed. If the data changes regularly, you also have to
regularly reindex your files.
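
To make the indexing idea concrete, here is a toy sketch of an inverted index (word to file names). It is not how htdig or any CPAN module works internally; build_index, lookup and the sample documents are invented for illustration, and a real index would be built from files on disk and stored between runs.

```perl
use strict;
use warnings;

# Build a word => { file => 1 } index from a hash of file name => text.
sub build_index {
    my ($docs) = @_;
    my %index;
    for my $file (keys %$docs) {
        my @words = $docs->{$file} =~ /\w+/g;
        $index{ lc($_) }{$file} = 1 for @words;
    }
    return \%index;
}

# Look a word up in the index without rescanning any file.
sub lookup {
    my ($index, $word) = @_;
    my @files = sort keys %{ $index->{ lc($word) } || {} };
    return @files;
}

# Made-up documents standing in for real file contents.
my $index = build_index({
    'a.txt' => 'Error: disk full',
    'b.txt' => 'no error here',
    'c.txt' => 'all good',
});
print join(', ', lookup($index, 'error')), "\n";
```

The cost moves from search time to index time, which is why the index has to be rebuilt when the files change.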

I've been using htdig in some form or another to accomplish what you
suggest.

Your post, though, caused me to take another look on CPAN for relevant
modules as I was sure the state of this technology has improved since I
decided to use htdig (several years ago). The following module looks very
promising:

http://search.cpan.org/~dpavlin/Search-Estraier-0.08/

I think I'm going to give it a try as my next search engine. Here's another
one that looks interesting:

http://search.cpan.org/~snkwatt/Search-FreeText-0.05/

I found these modules by going to:

http://search.cpan.org/search?query=search&mode=all

Enjoy,

Todd W.


Petyr David

2006-12-22, 8:02 am


I haven't yet done any measuring of the CPU usage for the processes,
but will look into that -TX. I just heard yesterday (the day of my
post) that the application was bogging down. When I do my testing, I'm
working with live, production data, but typically limit my search to
one of three patterns and do it on only one or two directories. I want
to get my results back quickly. The users of this app apparently make
heavy use of this and are looking for the "needle in a haystack".


Petyr David

2006-12-22, 8:02 am

I will also review those URLs. Creating an app that did indexing of the
files did not come up as this script came from a far simpler one that
merely found files matching the single pattern and printed a link to
the file. I also don't have the time to make this a full time job.
Something was needed quick and dirty and that's what they got : -)

TX


Eric Schwartz

2006-12-22, 7:03 pm

"Petyr David" <phynkel@gmail.com> writes:
> Basically: the script uses perl's system command to run a long winded
> "find" command which is piped to sed to correct patterns that match
> HTML markers.


You are unclear here, which is why we generally ask you to post
example code. In fact, it's really kinda hard to say anything for
sure because you didn't. I'm not sure, for instance, if you pipe the
output of find to sed, or if you iterate over the list of files
returned by find and run sed on the contents of those files. I'm
guessing the former, but it's just a guess. If you want people to be
able to help you the best way possible, you probably don't want to
make them guess.

> The matching lines are then shoved into an array.


Which lines? Are you talking about contents of the files, or names of
files? Now I think you're talking about contents. It would help if
you were more clear.

> The elements of the array are moved into a hash for the purpose of
> sorting the file names.


Er, now I think you're talking about file names.

> Then file names and matching lines are printed.


Now I have no idea. What are you actually doing? Can you please show
some code?

> Q: Can I speed things by eliminating the sed command and letting Perl
> filter and modify the matching patterns? If so, how much of a
> performance gain?


Honestly, rather than asking us, you should ask Perl. The answer to
"how do I speed things up?" is profile profile profile! Until you
profile, you don't know what will help.

'perldoc -q profile' mentions the Devel::DProf module, and you can use
'perldoc Devel::DProf' to find out more about it. You'll also want to
learn about the Benchmark module ('perldoc Benchmark'), which will
help you compare two different ways of doing the same thing to find
out which is faster.
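
As a small illustration of the Benchmark module, the sketch below compares two ways of counting lines that contain a fixed string. The data and the two candidate subs are made up; only cmpthese itself comes from the module. Swap in the two approaches you actually care about.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic data standing in for lines returned by a search.
my @lines = map { "line $_ with <b>markup</b> in it\n" } 1 .. 1000;

# Candidate 1: a regex match per line.
sub count_regex {
    my $n = grep { /markup/ } @lines;
    return $n;
}

# Candidate 2: a plain substring search per line.
sub count_index {
    my $n = grep { index($_, 'markup') >= 0 } @lines;
    return $n;
}

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, {
    regex => \&count_regex,
    index => \&count_index,
});
```

The table it prints shows iterations per second and the relative difference; the numbers will vary from machine to machine.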

> Is using Perl's grep to search through every file for the pattern
> faster than using the find command?


Wait, are you on Windows? It's been a very long time, but I vaguely
recall that the Windows 'find' command searches in files, whereas the
Unix one mostly just looks at file names and metadata.

> The find command has the advantage that I can search for files of a
> certain date rather easily. Again: could that be done more rapidly
> by Perl's looking at the file's mod time?


Those questions really depend on such a large number of things,
including your system's OS, configuration, load for other tasks, etc,
that it's almost impossible for anyone to tell you for certain.
Honestly, even if somebody were to give you an answer here, I wouldn't
believe them-- they may be telling you what worked for them, but it
might not be the same for you. Profile, then optimize the
worst-performing part, then profile again, optimize what's left, and
repeat. Take care that in optimizing one part you don't make another
slower-- but that's all part of the art, really.

> Any thoughts or suggestions would be appreciated


Enjoy. But next time, please post some code, so we can actually tell
what you're doing. Making people guess and make stuff up is
frustrating for us, because we can't tell if we're guessing right, or
going completely off the deep end. I hope I was helpful anyway.

-=Eric
Ric

2006-12-22, 7:03 pm

Petyr David schrieb:
> I will also review those URLS. Creating an app that did indexing of the
> files did not come up as this script came from a far simpler one that
> merely found files matching the single pattern and printed a link to
> the file. I also don't have the time to make this a full time job.
> Something was needed quick and dirty and that's what they got : -)


I just took a quick look at your problem description, not sure what your
needs are, but have you considered using a desktop search engine to do
the work for you?

http://beagle-project.org/Searching_Data




Michele Dondi

2006-12-23, 4:40 am

On 21 Dec 2006 19:13:43 -0800, "Petyr David" <phynkel@gmail.com>
wrote:

>Basically: the script uses perl's system command to run a long winded
                                   ^^^^^^
>"find" command which is piped to sed to correct patterns that match
>HTML markers. The matching lines are then shoved into an array. The
                                                          ^^^^^

Wait a minute! There's something fundamental missing: system() won't
return "lines", so how are matching lines shoved into an *array*?
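
A short sketch of the difference, assuming a Unix-like shell where printf is available (the command itself is just a placeholder): system() returns the wait status and lets the child write to STDOUT directly, while backticks capture the output and, in list context, return one element per line.

```perl
use strict;
use warnings;

# Placeholder command that writes two lines to STDOUT.
my $cmd = q{printf 'first\nsecond\n'};

# system() does not capture anything; the return value is the wait
# status, which is 0 when the command succeeded.
my $status = system($cmd);
print "system returned $status\n";

# Backticks capture STDOUT; in list context each line is one element.
my @lines = `$cmd`;
print "backticks returned ", scalar(@lines), " lines\n";
```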


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

Petyr David

2006-12-24, 7:03 pm

You're correct: I use Perl's backticks to take output from a command
that looks similar to this to populate an array:

my @filepatterns=`find $subdir -mtime -$days -type f -exec egrep $pattern {} \; |sed "s/somepattern/diffpattern/"`;

I like using the find command because I can also control how many days
to go back in my search. I will also check Devel::DProf.


Petyr David

2006-12-24, 7:03 pm

This little app is web based and is going against files on a Red Hat
Server's NFS file system. I suppose I could use Samba ...

Michele Dondi

2006-12-24, 7:03 pm

On 24 Dec 2006 13:54:10 -0800, "Petyr David" <phynkel@gmail.com>
wrote:

>You're correct: I use Perl's back tics to take output from a command
>that looks similar to this to populate an array:
>
>my @filepatterns=`find $subdir -mtime -$days -type f -exec egrep $pattern {} \; |sed "s/somepattern/diffpattern/"`;


That can be fine under certain circumstances. Certainly you're using
perl as a shell script. As a general rule, though, a shell is fine for
shell scripts and perl is fine for Perl scripts. In this case what you
are doing can be done perfectly well and easily enough with File::Find
(or one of its cousins) and Perl builtins. Chances are that F::F in
and of itself may be slightly slower than egrep, which is a compiled
and supposedly efficient program. However you will avoid the overhead
of launching a process for every examined file, so things may well
balance out in the end...
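
A rough sketch of what that could look like, under several assumptions: search_tree is a made-up helper name, the substitution of < with &lt; merely stands in for whatever the sed step really does, and the temporary directory replaces the real search tree. It uses File::Find to walk the tree, -M for the age check, and a plain regex match in place of egrep, all in one process.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: walk $dir, keep plain files modified within the
# last $days days, and collect the lines matching $re.
sub search_tree {
    my ($dir, $days, $re) = @_;
    my %hits;    # file name => { mtime => epoch, lines => [ ... ] }
    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            return unless -f $path;
            my $age_days = -M $path;
            return if $age_days > $days;
            open my $fh, '<', $path or return;
            while (my $line = <$fh>) {
                next unless $line =~ $re;
                $line =~ s/</&lt;/g;    # placeholder for the sed rewrite
                push @{ $hits{$path}{lines} }, $line;
            }
            close $fh;
            $hits{$path}{mtime} = (stat $path)[9] if $hits{$path};
        },
    }, $dir);
    return \%hits;
}

# Demonstration on a temporary directory with one made-up file.
my $dir = tempdir(CLEANUP => 1);
open my $out, '>', "$dir/page.html" or die "Cannot write test file: $!";
print {$out} "<p>needle</p>\nnothing\n";
close $out;

my $hits = search_tree($dir, 7, qr/needle/);
for my $file (sort { $hits->{$b}{mtime} <=> $hits->{$a}{mtime} } keys %$hits) {
    print "$file: $_" for @{ $hits->{$file}{lines} };
}
```

Whether this beats the find/egrep/sed pipeline on a given NFS tree is something to benchmark rather than assume.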

>I like using the find command because I can also control how many days
>to go back in my search. I will also check Devl::DProf


You can do that in perl as well. You may want to read

perldoc -f -X
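
For instance, the -M file test gives a file's age in days, measured from the time the script started back to the last modification. The sketch below uses a made-up helper, modified_within, and a temporary file that is backdated with utime purely for demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: true if $path was modified within the last $days days.
# -M returns undef when the file cannot be examined.
sub modified_within {
    my ($path, $days) = @_;
    my $age = -M $path;
    return defined $age && $age <= $days;
}

my ($fh, $name) = tempfile(UNLINK => 1);
close $fh;
print "new file is recent\n" if modified_within($name, 3);

# Backdate the file by 10 days, then test again.
my $ten_days_ago = time - 10 * 24 * 60 * 60;
utime $ten_days_ago, $ten_days_ago, $name or die "utime failed: $!";
print "backdated file is too old\n" unless modified_within($name, 3);
```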
>On Dec 22, 12:15 am, Eric Schwartz <emsch...@pobox.com> wrote:
[snip full quoted content]

*PLEASE* do not top-post!


Michele