Code Comments
Programming Forum and web based access to our favorite programming groups.I'm a C/C++ programmer, with a perl script I'm trying to optimize. I
don't know enough about perl syntax and string/buffer handling
functions to feel comfortable doing this on my own. I imagine adding
some 'regular expressions' would be ideal, in place of my while()
loop. If anyone wouldn't mind giving me a hand with this, I'd really
appreciate it, and I also figure at the very least... by posting this
code it might help someone else looking for a solution like this (even
if my code is really slow).
I have a file on my web-site that users can download using this
script, and only by using this script. The script injects some
information in to the file while sending it, since I need different
info for each user and didn't really want to mess with temp files or
anything like that. This script works great, but it takes several
minutes to download a 5 MB file, which I can download directly (bypass
the script) in about 30 seconds.
$hostname is set to whatever I need to inject by some code above this
part, and $search is the data I need to replace with $hostname (they
are always the same length, so it never changes the size of the
downloaded file). The loop at the end is what I imagine needs to be
faster. I'm not sure if it would be safe or possible to read in the
entire file in one block, then just work on that buffer... would
probably use up too much memory on the server if I did that.
# Open the input file for reading
open IN, "< $infile"
or die "Can't open $infile: $!";
# Our EXE is a binary file
binmode IN;
# Get the file info
my @info = stat(IN);
my $length = $info[7];
my $blksize = 1;
# Send the HTTP header
print "Content-type: application/octet-stream\n";
print "Content-Length: $length\n";
print "Content-Disposition: attachment; filename=\"webinst.exe\"\n";
print "\n";
# Make sure that the output is written in
# binary mode and buffered
binmode STDOUT;
select STDOUT; $|=1;
# This is used to track our progress while searching for $search
string.
my $sx = 0;
# Read each chunk of the input file
# and pass it to the browser
# This will replace each occurance of '$search' with '$hostname'
# as it sends the file.
my $buffer;
while (!eof(IN))
{
read(IN, $buffer, $blksize);
my $ch1 = substr($buffer,0,1);
my $ch2 = substr($search,$sx,1);
if(uc($ch1) eq uc($ch2))
{
$sx++;
if($sx eq length($search))
{
$sx = 0;
print $hostname;
}
}
else
{
if($sx > 0)
{
print substr($search,0,$sx);
$sx = 0;
}
print $buffer;
}
}
close IN;
exit 0;
Post Follow-up to this messageeselk@surfbest.net wrote:
> $hostname is set to whatever I need to inject by some code above this
> part, and $search is the data I need to replace with $hostname (they
> are always the same length, so it never changes the size of the
> downloaded file). The loop at the end is what I imagine needs to be
> faster. I'm not sure if it would be safe or possible to read in the
> entire file in one block, then just work on that buffer... would
> probably use up too much memory on the server if I did that.
>
Do you want to replace _every_ occurrence of $search with $hostname?
> # Open the input file for reading
> open IN, "< $infile"
> or die "Can't open $infile: $!";
>
> # Our EXE is a binary file
> binmode IN;
>
> # Get the file info
> my @info = stat(IN);
> my $length = $info[7];
> my $blksize = 1;
>
> # Send the HTTP header
> print "Content-type: application/octet-stream\n";
> print "Content-Length: $length\n";
> print "Content-Disposition: attachment; filename=\"webinst.exe\"\n";
> print "\n";
>
> # Make sure that the output is written in
> # binary mode and buffered
> binmode STDOUT;
> select STDOUT; $|=1;
Don't do this. STDOUT is already selected. You don't have to select it unl
ess you are using formats.
Don't set $|. Most of the time this slows things down, not speed them up.
>
> # This is used to track our progress while searching for $search
> string.
> my $sx = 0;
>
> # Read each chunk of the input file
> # and pass it to the browser
> # This will replace each occurance of '$search' with '$hostname'
> # as it sends the file.
> my $buffer;
> while (!eof(IN))
> {
> read(IN, $buffer, $blksize);
$buffer =~ s{$search}{$hostname}g;
> print $buffer;
> }
> }
>
> close IN;
> exit 0;
>
>
--
Just my 0.00000002 million dollars worth,
Shawn
"For the things we have to learn before we can do them, we learn by doing th
em."
Aristotle
Post Follow-up to this messageeselk@surfbest.net wrote:
> I'm a C/C++ programmer, with a perl script I'm trying to optimize. I
> don't know enough about perl syntax and string/buffer handling
> functions to feel comfortable doing this on my own.
Perl's open()/read() functions are equivalent to C's fopen()/fread() and so
the IO is buffered. Perl's sysopen()/sysread() functions are equivalent to
C's open()/read() and so the IO is not buffered.
perldoc perlopentut
perldoc -f open
perldoc -f read
perldoc -f sysopen
perldoc -f sysread
> I imagine adding
> some 'regular expressions' would be ideal, in place of my while()
> loop. If anyone wouldn't mind giving me a hand with this, I'd really
> appreciate it, and I also figure at the very least... by posting this
> code it might help someone else looking for a solution like this (even
> if my code is really slow).
>
> I have a file on my web-site that users can download using this
> script, and only by using this script. The script injects some
> information in to the file while sending it, since I need different
> info for each user and didn't really want to mess with temp files or
> anything like that. This script works great, but it takes several
> minutes to download a 5 MB file, which I can download directly (bypass
> the script) in about 30 seconds.
>
> $hostname is set to whatever I need to inject by some code above this
> part, and $search is the data I need to replace with $hostname (they
> are always the same length, so it never changes the size of the
> downloaded file). The loop at the end is what I imagine needs to be
> faster. I'm not sure if it would be safe or possible to read in the
> entire file in one block, then just work on that buffer... would
> probably use up too much memory on the server if I did that.
>
> # Open the input file for reading
> open IN, "< $infile"
> or die "Can't open $infile: $!";
You should probably use the three argument open (and you can skip the separa
te
binmode().)
open IN, '<:raw', $infile or die "Can't open $infile: $!";
> # Our EXE is a binary file
> binmode IN;
>
> # Get the file info
> my @info = stat(IN);
> my $length = $info[7];
You don't need the @info array, just assign the eighth element directly:
my $length = ( stat IN )[ 7 ];
Or just use the -s file test operator:
my $length = -s IN;
> my $blksize = 1;
You are setting your read $buffer to one byte!? Even in C with buffered IO
reading one byte at a time is usually a bad idea.
> # Send the HTTP header
> print "Content-type: application/octet-stream\n";
> print "Content-Length: $length\n";
> print "Content-Disposition: attachment; filename=\"webinst.exe\"\n";
> print "\n";
>
> # Make sure that the output is written in
> # binary mode and buffered
> binmode STDOUT;
> select STDOUT; $|=1;
>
> # This is used to track our progress while searching for $search
> string.
> my $sx = 0;
>
> # Read each chunk of the input file
> # and pass it to the browser
> # This will replace each occurance of '$search' with '$hostname'
> # as it sends the file.
> my $buffer;
> while (!eof(IN))
> {
> read(IN, $buffer, $blksize);
In perl you almost *never* need to use the eof() function. That is usually
written as:
while ( read IN, $buffer, $blksize )
{
But even that doesn't report any errors that read() may encounter.
while ( my $read = read IN, $buffer, $blksize )
{
defined $read or die "Cannot read from '$infile' $!";
> my $ch1 = substr($buffer,0,1);
Since $buffer only contains one byte there is no point in using substr().
> my $ch2 = substr($search,$sx,1);
>
> if(uc($ch1) eq uc($ch2))
> {
> $sx++;
> if($sx eq length($search))
$sx and length($search) are numbers so you should use the correct comparison
operator:
if ( $sx == length( $search ) )
> {
> $sx = 0;
> print $hostname;
> }
> }
> else
> {
> if($sx > 0)
> {
> print substr($search,0,$sx);
> $sx = 0;
> }
> print $buffer;
> }
> }
>
> close IN;
> exit 0;
If possible you should try reading the entire file into a scalar and use the
substitution operator or at least use a buffer larger than one byte.
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
Post Follow-up to this messageJohn W. Krahn wrote:
>
> In perl you almost *never* need to use the eof() function. That is
> usually written as:
>
> while ( read IN, $buffer, $blksize )
> {
>
> But even that doesn't report any errors that read() may encounter.
>
> while ( my $read = read IN, $buffer, $blksize )
> {
> defined $read or die "Cannot read from '$infile' $!";
If $read is undefined then while test will already have failed.
my $read;
while ( $read = read IN, $buffer, $blksize ) {
:
}
defined $read or die "Read from '$infile' failed: $!";
Rob
Post Follow-up to this messageRob Dixon wrote:
> John W. Krahn wrote:
>
> If $read is undefined then while test will already have failed.
>
> my $read;
> while ( $read = read IN, $buffer, $blksize ) {
> :
> }
>
> defined $read or die "Read from '$infile' failed: $!";
Thanks Rob. Sorry for the mistake. :-)
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
Post Follow-up to this messageThanks to all who replied. I figured reading (and writing) one byte at a time was the bottleneck, but mostly I'm not familiar with perl's buffer handling functions/methods... I just knew enough to compare one byte at a time at least (and I guess even that wasn't optimal since I was using "eq" instead of "=="). Anyway, I'll have to do some reading I guess, probably not a bad language to "get comfortable" with anyway... either that or I'll redo it in C (could do that in 5 minutes), but then I have to mess with Unix compilers and all that mess since it must run on a Unix server (but at least I only need it to run on one specific server).
Post Follow-up to this message
Show a Printable Version
Email This Page to Someone!
Receive updates to this thread
Powered by vBulletin
Copyright 2000-2006 Jelsoft Enterprises Limited.