Home > Archive > PERL Beginners > September 2007 > database insert algorithm
You are viewing an archived Text-only version of the thread.
To view this thread in it's original format and/or if you want to reply to
this thread please [click here]
| Author |
database insert algorithm
|
|
|
| Hello,
I am looking for a proper, fastest and most reasonable way to insert
data from pretty big file (~1,000,000 lines) to database. I am using
Win32::ODBC (ActiveState Perl) module to connect with Access/MSSQL
database and inserting line after line. I was wondering if there is a
better way to do it...
Maybe create hash with part of data (maybe all of it - what are the
limitations ?)
What is other way to do it instead 'INSERT INTO...' statement after
reading each line ?
Thanks for any help in advance...
Luke
| |
| Dr.Ruud 2007-09-23, 6:59 pm |
| Luke schreef:
> I am looking for a proper, fastest and most reasonable way to insert
> data from pretty big file (~1,000,000 lines) to database.
Make the file have a format as needed by the data import tool of the
database system.
Often a CSV format is supported.
See also bcp:
http://technet.microsoft.com/en-us/...y/ms162802.aspx
--
Affijn, Ruud
"Gewoon is een tijger."
| |
| Nobull67@Gmail.Com 2007-09-23, 6:59 pm |
| On 22 Sep, 02:58, vieviu...@gmail.com (Luke) wrote:
> Hello,
> I am looking for a proper, fastest and most reasonable way to insert
> data from pretty big file (~1,000,000 lines) to database. I am using
> Win32::ODBC (ActiveState Perl) module to connect with Access/MSSQL
> database and inserting line after line. I was wondering if there is a
> better way to do it...
> Maybe create hash with part of data (maybe all of it - what are the
> limitations ?)
> What is other way to do it instead 'INSERT INTO...' statement after
> reading each line ?
DBI has an execute_array method that can allow DBD drivers to optimize
such operations.
Unfortunately AFAIK none of the DBD drivers I've encountered do any
significant optimisation.
For efficient bulk inserts I usually fall back on writing a file and
using the underlying database's bulk insert tool. This, of course,
does not give portability between databases.
| |
| Jenda Krynicky 2007-09-24, 7:59 am |
| From: Luke <vieviurka@gmail.com>
> Hello,
> I am looking for a proper, fastest and most reasonable way to insert
> data from pretty big file (~1,000,000 lines) to database. I am using
> Win32::ODBC (ActiveState Perl) module to connect with Access/MSSQL
> database and inserting line after line. I was wondering if there is a
> better way to do it...
> Maybe create hash with part of data (maybe all of it - what are the
> limitations ?)
> What is other way to do it instead 'INSERT INTO...' statement after
> reading each line ?
I you stick to Perl, you should use DBI and DBD::ODBC, not
Win32::ODBC. You should use ->prepare() and ->execute(). And you
should turn the AutoCommit off and commit only every 100 or 1000
rows.
Jenda
===== Jenda@Krynicky.cz === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery
| |
| Rob Coops 2007-09-24, 7:59 am |
| First of all, you should try and use the database native tools for loading
tables this size...
If it has to be perl which does happen sometimes, you should definetly
switch of autocommit. Try doing a commit every 50000 lines or even every
100000 lines (ask your friendly DBA how big the transaction log and rollback
segment are and if they will support the 100k or if maybe 50k or 25k would
be better), the bigger the block the less the overhead. Also ask you DBA to
consider table locks and so on when doing an insert this big...
Then you should if at all posible prepare the insert statment so you just
have to give the variables before doing the execute, this will save a lot on
the overhead.
Then of course if you have the memory try and shove as much of the variables
in memory while the commit is being executed, you might be able to use two
threads one to read a chunk and one to shove the stuff in the database,
while the other is reading a new chunk.
Regards,
Rob
On 9/22/07, Luke <vieviurka@gmail.com> wrote:
>
> Hello,
> I am looking for a proper, fastest and most reasonable way to insert
> data from pretty big file (~1,000,000 lines) to database. I am using
> Win32::ODBC (ActiveState Perl) module to connect with Access/MSSQL
> database and inserting line after line. I was wondering if there is a
> better way to do it...
> Maybe create hash with part of data (maybe all of it - what are the
> limitations ?)
> What is other way to do it instead 'INSERT INTO...' statement after
> reading each line ?
>
> Thanks for any help in advance...
>
> Luke
>
>
> --
> To unsubscribe, e-mail: beginners-unsubscribe@perl.org
> For additional commands, e-mail: beginners-help@perl.org
> http://learn.perl.org/
>
>
>
| |
| Matthew Whipple 2007-09-24, 7:01 pm |
| I haven't dealt with MS databases in several years but some of this
stuff will likely still apply. First a couple notes on the underlying
databases, if you're running Access be sure to compact the database
after the insert particularly if this is going to be a repeated process,
in addition to the gained performance there may be a file size
limitation and the file can swell incredibly during a large insert If
you're running MS SQL then make sure that the cluster index on the table
isn't set up in a way that the insert will require a lot of rearranging
of data.
If you can take the table off line during the insert the best
solution would be to use Perl to dump the data into a standard format
(if it isn't already) and use the databases native importing tools. If
you are going to be inserting with the DB online then depending on the
quantity of data and how much the DB is accessed, then record locking
during inserts/updates can create a serious lag. My suggested solution
for this scenario would be to create long INSERT statements which will
add multiple rows at a time...most SQL databases support some form of
this with something along the lines of multiple sets of values in
parentheses but the specific command format varies. You can tailor the
number of lines to suit your need, balancing the extra speed of the
fewer, larger database transactions with momentarily freeing up the
system. The smaller inserts can also cause problems if they're not
getting processed by the DB fast enough and end up backed up. I think
MS SQL at least also allows for the setting of precedence for certain
SQL statements or processes, and creating a stored procedure may provide
an additional performance boost.
Luke wrote:
> Hello,
> I am looking for a proper, fastest and most reasonable way to insert
> data from pretty big file (~1,000,000 lines) to database. I am using
> Win32::ODBC (ActiveState Perl) module to connect with Access/MSSQL
> database and inserting line after line. I was wondering if there is a
> better way to do it...
> Maybe create hash with part of data (maybe all of it - what are the
> limitations ?)
> What is other way to do it instead 'INSERT INTO...' statement after
> reading each line ?
>
> Thanks for any help in advance...
>
> Luke
>
>
>
|
|
|
|
|