Home > Archive > PERL Beginners > October 2006 > Querying very big database
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
Querying very big database
|
|
| Toddy Prawiraharjo 2006-09-28, 6:57 pm |
| Hello all,
I am not sure if i'm inside the correct mailing list, but it should be
simple for you guys. Recently i started a simple perl program to catch
all syslog from my netscreen firewall, and put them into a database, so
later can do some analysing on it (sort of AWStats). the database itself
is pretty big, with 600k entry (for 3 w s of the running firewall)
worth 80 megs in mysql. My question is, during the analysing my perl
script giving very slow processing. I know the query to mysql itself is
pretty quick, less than 2 seconds to return about 40k to 60k result to
perl to be analysed
Here it goes:
<snipped>
$query = "SELECT src, rcvd FROM no_name WHERE start_date >=
'$fromdate' AND start_date <= '$todate';";
$sth = $dbh->prepare($query);
$sth -> execute() || &errorhere("SQL Error: $DBI::errstr");
print "$query<br>";
my %src = ();
my ($totalsent, $totalrcvd, $srcchecked);
while(@row = $sth->fetchrow_array){
$srcchecked = 0;
while($source, $total = each (%src)){
if ($source eq "$row[0]"){
$srcchecked = 1;
$src{$source} = $src{$source} + $row[1];
}
}
if ($srcchecked != 1){
print "$row[0]<br>";
$src{$row[0]} = $row[1];
}
$totalrcvd = $totalsent + $row[1];
#$totalrcvd = $totalrcvd + $row[2];
}
</snipped>
The while loop to do analysing on the data take more than 15 minutes,
and that only to a query for 1 day long records($fromdate-$todate)
So, if I want perl to give me faster result, what's the solution? Did I
make fundamentally wrong approach?
Thanks in advance for any response
Toddy Prawiraharjo
| |
| Rob Dixon 2006-09-28, 6:57 pm |
| Toddy Prawiraharjo wrote:
> Hello all,
>
> I am not sure if i'm inside the correct mailing list, but it should be
> simple for you guys. Recently i started a simple perl program to catch
> all syslog from my netscreen firewall, and put them into a database, so
> later can do some analysing on it (sort of AWStats). the database itself
> is pretty big, with 600k entry (for 3 w s of the running firewall)
> worth 80 megs in mysql. My question is, during the analysing my perl
> script giving very slow processing. I know the query to mysql itself is
> pretty quick, less than 2 seconds to return about 40k to 60k result to
> perl to be analysed
>
> Here it goes:
> <snipped>
> $query = "SELECT src, rcvd FROM no_name WHERE start_date >=
> '$fromdate' AND start_date <= '$todate';";
> $sth = $dbh->prepare($query);
> $sth -> execute() || &errorhere("SQL Error: $DBI::errstr");
> print "$query<br>";
>
> my %src = ();
> my ($totalsent, $totalrcvd, $srcchecked);
>
> while(@row = $sth->fetchrow_array){
> $srcchecked = 0;
> while($source, $total = each (%src)){
> if ($source eq "$row[0]"){
> $srcchecked = 1;
> $src{$source} = $src{$source} + $row[1];
> }
> }
> if ($srcchecked != 1){
> print "$row[0]<br>";
> $src{$row[0]} = $row[1];
> }
> $totalrcvd = $totalsent + $row[1];
> #$totalrcvd = $totalrcvd + $row[2];
> }
> </snipped>
>
> The while loop to do analysing on the data take more than 15 minutes,
> and that only to a query for 1 day long records($fromdate-$todate)
> So, if I want perl to give me faster result, what's the solution? Did I
> make fundamentally wrong approach?
You've missed the whole point of hashes, which is that they will access a data
value directly from a key. What you've written is similar to
my $i = 0;
my $value;
foreach my $v (@array) {
if ($i++ == $n) {
$value = $v;
}
}
instead of
$value = $array[$n];
Try this while loop instead of the one you have and see if you get an
improvement:
while (@row = $sth->fetchrow_array) {
my $key = $row[0];
print "$key<br>" unless exists $src{$key};
$src{$key} += $row[1];
$totalsent += $row[1];
#$totalrcvd += $row[2];
}
HTH,
Rob
| |
| Toddy Prawiraharjo 2006-09-28, 6:57 pm |
|
THX! I always knew it's my n00b scripting that caused the problem. It
now down to 4 lines inside while loop, and I'm flying! The processing
down from ~15 minutes to 4 secs! But, for longer date range (2 w s
time limit) it clocked at 25 wallclock secs (with processing ~350k mysql
return entries). Any more way to process this beast any faster? Any good
reading or reference about this? Just afraid, if this rolled to
production level, the report for months activities will take minutes to
make!
<snipped>
$query = "SELECT src, sent, rcvd FROM no_name WHERE start_date
>= '$fromdate' AND start_date <= '$todate';";
$sth = $dbh->prepare($query);
$sth -> execute() || &errorhere("SQL Error: $DBI::errstr");
#print "$query<br>";
my %src = ();
my ($totalsent, $totalrcvd);
while(@row = $sth->fetchrow_array){
my $key = $row[0];
$src{$key} += $row[2];
$totalsent = $totalsent + $row[1];
$totalrcvd = $totalrcvd + $row[2];
}
</snipped>
Cheers,
Toddy Prawiraharjo
-----Original Message-----
From: Rob Dixon [mailto:rob.dixon@350.com]
Sent: Friday, 29 September 2006 10:04 AM
To: beginners@perl.org
Subject: Re: Querying very big database
Toddy Prawiraharjo wrote:
> Hello all,
>
> I am not sure if i'm inside the correct mailing list, but it should be
> simple for you guys. Recently i started a simple perl program to catch
> all syslog from my netscreen firewall, and put them into a database,
so
> later can do some analysing on it (sort of AWStats). the database
itself
> is pretty big, with 600k entry (for 3 w s of the running firewall)
> worth 80 megs in mysql. My question is, during the analysing my perl
> script giving very slow processing. I know the query to mysql itself
is
> pretty quick, less than 2 seconds to return about 40k to 60k result to
> perl to be analysed
>
> Here it goes:
> <snipped>
> $query = "SELECT src, rcvd FROM no_name WHERE start_date >=
> '$fromdate' AND start_date <= '$todate';";
> $sth = $dbh->prepare($query);
> $sth -> execute() || &errorhere("SQL Error: $DBI::errstr");
> print "$query<br>";
>
> my %src = ();
> my ($totalsent, $totalrcvd, $srcchecked);
>
> while(@row = $sth->fetchrow_array){
> $srcchecked = 0;
> while($source, $total = each (%src)){
> if ($source eq "$row[0]"){
> $srcchecked = 1;
> $src{$source} = $src{$source} +
$row[1];
> }
> }
> if ($srcchecked != 1){
> print "$row[0]<br>";
> $src{$row[0]} = $row[1];
> }
> $totalrcvd = $totalsent + $row[1];
> #$totalrcvd = $totalrcvd + $row[2];
> }
> </snipped>
>
> The while loop to do analysing on the data take more than 15 minutes,
> and that only to a query for 1 day long records($fromdate-$todate)
> So, if I want perl to give me faster result, what's the solution? Did
I
> make fundamentally wrong approach?
You've missed the whole point of hashes, which is that they will access
a data
value directly from a key. What you've written is similar to
my $i = 0;
my $value;
foreach my $v (@array) {
if ($i++ == $n) {
$value = $v;
}
}
instead of
$value = $array[$n];
Try this while loop instead of the one you have and see if you get an
improvement:
while (@row = $sth->fetchrow_array) {
my $key = $row[0];
print "$key<br>" unless exists $src{$key};
$src{$key} += $row[1];
$totalsent += $row[1];
#$totalrcvd += $row[2];
}
HTH,
Rob
--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
<http://learn.perl.org/> <http://learn.perl.org/first-response>
| |
| nobull67@gmail.com 2006-09-29, 3:57 am |
|
Toddy Prawiraharjo wrote:
> THX! I always knew it's my n00b scripting that caused the problem. It
> now down to 4 lines inside while loop, and I'm flying! The processing
> down from ~15 minutes to 4 secs! But, for longer date range (2 w s
> time limit) it clocked at 25 wallclock secs (with processing ~350k mysql
> return entries). Any more way to process this beast any faster?
Yes, do more of it in SQL. Perl is good but the whole point of an SQL
database is that simple data preprocessing can be done close to the dB.
> Any good reading or reference about this?
There are loads of books (and on-line tutorials) on SQL at all levels.
As with Perl books some are no doubt not worth the shelf space they
occupy. I'm affraid I have no recommendations.
> <snipped>
> $query = "SELECT src, sent, rcvd FROM no_name WHERE start_date
You forgot the my. Is there a reason you don't use placeholders?
[color=darkred]
> $sth = $dbh->prepare($query);
You forgot the my. You forgot to check for errors.
> $sth -> execute() || &errorhere("SQL Error: $DBI::errstr");
Why is that & there?
> my %src = ();
There is no need to explicitly initialise hashes as empty, they start
that way anyhow.
> my ($totalsent, $totalrcvd);
>
> while(@row = $sth->fetchrow_array){
You forgot the my.
> my $key = $row[0];
> $src{$key} += $row[2];
> $totalsent = $totalsent + $row[1];
> $totalrcvd = $totalrcvd + $row[2];
You are allowed to use the += operator more than once in the same
program.
Named scalar variables rather than numeric subscripts into an array can
make your code more readable.
> }
> </snipped>
> Cheers,
Did you perhaps "forget" to put "use strict" and "use warnings" at the
top of your code?
# Untested!
my $query = "SELECT src, SUM(sent), SUM(rcvd) FROM no_name
WHERE start_date >= ? AND start_date <= ? GROUP BY src";
my $sth = $dbh->prepare($query) or errorhere("SQL Error:
$DBI::errstr");
$sth->execute($fromdate,$todate) or errorhere("SQL Error:
$DBI::errstr");
my (%src,$totalsent, $totalrcvd);
while(my ($key,$sent,$rcvd) = $sth->fetchrow_array){
$src{$key} += $rcvd;
$totalsent += $sent;
$totalrcvd += $rcvd;
}
| |
| Rob Dixon 2006-09-29, 3:57 am |
| Toddy Prawiraharjo wrote:
>
> From: Rob Dixon [mailto:rob.dixon@350.com]
> Sent: Friday, 29 September 2006 10:04 AM
> To: beginners@perl.org
> Subject: Re: Querying very big database
>
> THX! I always knew it's my n00b scripting that caused the problem. It
> now down to 4 lines inside while loop, and I'm flying! The processing
> down from ~15 minutes to 4 secs! But, for longer date range (2 w s
> time limit) it clocked at 25 wallclock secs (with processing ~350k mysql
> return entries). Any more way to process this beast any faster? Any good
> reading or reference about this? Just afraid, if this rolled to
> production level, the report for months activities will take minutes to
> make!
>
> <snipped>
> $query = "SELECT src, sent, rcvd FROM no_name WHERE start_date
>
> $sth = $dbh->prepare($query);
> $sth -> execute() || &errorhere("SQL Error: $DBI::errstr");
> #print "$query<br>";
>
> my %src = ();
> my ($totalsent, $totalrcvd);
>
> while(@row = $sth->fetchrow_array){
> my $key = $row[0];
> $src{$key} += $row[2];
> $totalsent = $totalsent + $row[1];
> $totalrcvd = $totalrcvd + $row[2];
> }
> </snipped>
Hi Toddy
(Please bottom-post your replies. One day everybody in the world will know to do
this and perl.beginners will be a happy, shiny place!)
Let the database engine do it for you!
my $query = qq(
SELECT src, SUM(sent) AS totalsent, SUM(rcvd) AS totalrcvd
FROM no_name
WHERE start_date >= '$fromdate' AND start_date <= '$todate'
GROUP BY src
);
my $sth = $dbh->prepare($query);
$sth->execute() || errorhere("SQL Error: $DBI::errstr");
my $src = $sth->fetchall_hashref('src');
foreach my $srcval(keys %$src) {
$totalsent += $src->{$srcval}{totalsent};
$totalrcvd += $src->{$srcval}{totalrcvd};
}
(I couldn't easily test this, although it is syntax checked, so beware)
Now, $src is a reference to a hash similar to your original %src, but has
another hash reference as its values, so you can extract
my $totalsent = $src{src1}{totalsent};
and so on.
HTH,
Rob
| |
| Dr.Ruud 2006-09-29, 7:57 am |
| "Toddy Prawiraharjo" schreef:
Don't top-post. Don't quote text that is no longer relevant, especially
intros and signatures.
> [SQL-emulation in Perl]
> Any more way to process this beast any faster?
> Any good reading or reference about this? Just afraid, if this rolled
> to production level, the report for months activities will take
> minutes to make!
>
> <snipped>
> $query = "SELECT src, sent, rcvd FROM no_name WHERE start_date
> $sth = $dbh->prepare($query);
> $sth -> execute() || &errorhere("SQL Error: $DBI::errstr");
> #print "$query<br>";
>
> my %src = ();
> my ($totalsent, $totalrcvd);
>
> while(@row = $sth->fetchrow_array){
> my $key = $row[0];
> $src{$key} += $row[2];
> $totalsent = $totalsent + $row[1];
> $totalrcvd = $totalrcvd + $row[2];
> }
> </snipped>
You are doing things in Perl that you should do in SQL. Look for GROUP
BY and SUM.
Also check out DBI's prepare().
(Don't call your database "very big" unless some important process takes
more than 24 hours.)
--
Affijn, Ruud
"Gewoon is een tijger."
| |
| Toddy Prawiraharjo 2006-10-03, 4:00 am |
| > Toddy Prawiraharjo wrote:
> but it should be
> program to catch all
> database, so later
> database itself is pretty
> firewall) worth 80 megs in
> script giving very slow
> quick, less than 2
> start_date >=
> $DBI::errstr");
> $src{$source} + $row[1];
> than 15 minutes, and
> records($fromdate-$todate) So, if I
> Did I make
> they will access
> if you get an
> problem. It
> processing
> range (2 w s
> processing ~350k mysql
> faster? Any good
> take minutes to
> WHERE start_date
>
> Hi Toddy
>
> (Please bottom-post your replies. One day everybody in the
> world will know to do
> this and perl.beginners will be a happy, shiny place!)
>
> Let the database engine do it for you!
>
> my $query = qq(
> SELECT src, SUM(sent) AS totalsent, SUM(rcvd) AS totalrcvd
> FROM no_name
> WHERE start_date >= '$fromdate' AND start_date <= '$todate'
> GROUP BY src
> );
>
> my $sth = $dbh->prepare($query);
> $sth->execute() || errorhere("SQL Error: $DBI::errstr");
>
> my $src = $sth->fetchall_hashref('src');
>
> foreach my $srcval(keys %$src) {
> $totalsent += $src->{$srcval}{totalsent};
> $totalrcvd += $src->{$srcval}{totalrcvd};
> }
>
>
> (I couldn't easily test this, although it is syntax checked,
> so beware)
>
> Now, $src is a reference to a hash similar to your original
> %src, but has
> another hash reference as its values, so you can extract
>
> my $totalsent = $src{src1}{totalsent};
>
> and so on.
>
> HTH,
Thanks all!
After many testing, benchmarking, and configs on which part should perl
or sql do, I can clocked that query to under 5 secs for almost 2 months
of available data. Learned the hard-way that perl's must not do
everything by itself.
Ps. Still looking for reference on how awstats (or zabbix or cacti) do
their reporting on many big databases in such a quick time.
Cheers,
Toddy Prawiraharjo
|
|
|
|
|