Posted to derby-user@db.apache.org by pu...@verizon.net on 2008/11/25 22:33:27 UTC
faster inserts in big tables
I have 10's of thousands of rows to add to a table, possibly 100's of
thousands, and I'm wondering if there's anything else I can do to speed
this up. The table could end up having a couple of million rows.
This is what I've done so far:
* Using a PreparedStatement that gets reused with each insert.
* Set locking level to TABLE for that table.
* Turned off autocommit.
* Set the connection to READ_COMMIT.
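For reference, the setup in that list can be sketched in JDBC roughly as follows (table name, column names, and batch size are hypothetical; this is a sketch of the described settings, not the poster's actual code):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class BulkLoader {

    // Builds the parameterized INSERT text; pure, so it is easy to verify.
    static String insertSql(String table, String... cols) {
        StringBuilder sb = new StringBuilder("INSERT INTO " + table + " (");
        for (int i = 0; i < cols.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(cols[i]);
        }
        sb.append(") VALUES (");
        for (int i = 0; i < cols.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append("?");
        }
        return sb.append(")").toString();
    }

    // Applies the settings from the list above, then streams rows in batches
    // through one reused PreparedStatement inside a single transaction.
    static void load(Connection conn, List<Object[]> rows) throws SQLException {
        conn.setAutoCommit(false);
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        try (Statement s = conn.createStatement()) {
            // Table-level lock; held until commit, so committing mid-load
            // would release it.
            s.execute("LOCK TABLE MY_TABLE IN EXCLUSIVE MODE");
        }
        try (PreparedStatement ps = conn.prepareStatement(
                insertSql("MY_TABLE", "ID1", "ID2", "PAYLOAD"))) {
            int n = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]);
                }
                ps.addBatch();
                if (++n % 1000 == 0) {
                    ps.executeBatch(); // flush, but keep the transaction open
                }
            }
            ps.executeBatch();
        }
        conn.commit();
    }
}
```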
In addition to that, I'm also setting these system parameters, though
not
necessarily to improve insert performance:
* derby.system.durability=test
* derby.storage.pageSize=32768
The table has one odd feature: The last column is a VARCHAR(32672) FOR
BIT DATA. I've tried setting the length to something smaller, but it
didn't really seem to matter.
The primary key is an auto generated int with another 2-column index on
two BIGINT columns. Something I found interesting is that the inserts
seem to go 2x faster if I have the 2-column index in place than if I
have just the primary-key index.
I'm running
Derby 10.2.2
JRE 1.6.0_07
Windows XP SP2
Is there anything else I can do to speed up row inserts?
Thanks,
Brian
Re: faster inserts in big tables
Posted by Rick Hillegas <Ri...@Sun.COM>.
Knut Anders Hatlen wrote:
> Rick Hillegas <Ri...@Sun.COM> writes:
>
>
>> Knut Anders Hatlen wrote:
>>
>>> Rick Hillegas <Ri...@Sun.COM> writes:
>>>
>>>
>>>
>>>> Hi Brian,
>>>>
>>>> In a previous response Peter Ondruška noted that you could use bulk
>>>> import to speed up your inserts if your source data lived in a
>>>> properly formatted file.
>>>>
>>>> Even if your source data does not live in a file, you can still get
>>>> bulk import speed (and the benefits of your generated key) by using
>>>> table functions--provided that you upgrade to 10.4. If you wrap your
>>>> source data in a table function, then you can bulk insert your data as
>>>> follows:
>>>>
>>>> insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N )
>>>> select * from table( MY_TABLE_FUNCTION() ) s
>>>>
>>>>
>>> The internal SQL syntax allows you to add an optimizer override here
>>> (--DERBY-PROPERTIES insertMode=bulkInsert) which I think speeds it up
>>> even more, though I'm not sure exactly what it buys you. This override
>>> is used in the import code and in the SYSCS_UTIL.SYSCS_BULK_INSERT
>>> system procedure, but you're not allowed to use it in your own SQL
>>> queries, it seems. The (undocumented) SYSCS_BULK_INSERT procedure does
>>> exactly what Rick's example does, except that it also adds the optimizer
>>> override.
>>>
>>> The procedure only works on old-style table functions, though. Does
>>> anyone know what performance gains one can expect by using
>>> SYSCS_BULK_INSERT instead of INSERT INTO ... SELECT FROM? Would it be
>>> worth the effort to create a similar mechanism for the new-style table
>>> functions?
>>>
>>>
>>>
>> Hi Knut,
>>
>> It appears to me that this setting (insertMode=bulkInsert) is
>> inspected when binding an INSERT statement. If the bind() logic finds
>> this setting, then it escalates the lock mode for the target table to
>> be a table level lock. Other than that, I cannot find any other use of
>> this setting. Comments around this code, however, indicate that there
>> might once have been other implications and that this setting, which
>> is now unconditionally accepted, used to be silently rejected in
>> certain situations, including insertion into a synchronized client
>> database, insertion into a table which has triggers, and deferred mode
>> insertion (see the comments on InsertNode.verifyBulkInsert).
>>
>> It appears to me that the benefit now conferred by this setting can be
>> achieved by setting the lock mode on the table to be table level. So
>> you would preface the INSERT statement above with the following
>> statement:
>>
>> lock MY_TABLE in exclusive mode
>>
>> That, at least, is what I have gleaned from a code inspection.
>>
>
> Thanks Rick,
>
> Did you also look at the code in InsertResultSet.open()? It looks like
> it has some special handling of triggers, foreign keys and checking of
> constraints when running in bulk insert mode.
>
>
Hi Knut,
Thanks for finding that. It looks like InsertResultSet optimizes the
performance of bulk inserts as follows:
1) Claims to turn off row logging. That is, performs the insert via
TransactionController.createAndLoadConglomerate() and
recreateAndLoadConglomerate(). Perhaps you could take a look at those
methods and verify what they really do.
2) Bulk-enforces foreign keys defined on the table. This is an optimized
zippering together of the indexes which support a foreign/primary key
pair. As I look at this code, however, it seems to me that you may
actually get worse performance if you are bulk inserting (not replacing)
a small number of rows into a table which already has a lot of rows in it.
There is some additional trickiness dealing with the interaction of
BEFORE triggers and CHECK constraints, but (famous last words) it appears
to me that the processing is still row-by-row in the language layer so I
don't see any optimization involved here.
I guess the bottom line is that with bulk insert, you get better
performance because logging is disabled. The bulk enforcement of foreign
keys may improve or degrade your performance.
So those are a few more clues.
Cheers,
-Rick
Re: faster inserts in big tables
Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Rick Hillegas <Ri...@Sun.COM> writes:
> Knut Anders Hatlen wrote:
>> Rick Hillegas <Ri...@Sun.COM> writes:
>>
>>
>>> Hi Brian,
>>>
>>> In a previous response Peter Ondruška noted that you could use bulk
>>> import to speed up your inserts if your source data lived in a
>>> properly formatted file.
>>>
>>> Even if your source data does not live in a file, you can still get
>>> bulk import speed (and the benefits of your generated key) by using
>>> table functions--provided that you upgrade to 10.4. If you wrap your
>>> source data in a table function, then you can bulk insert your data as
>>> follows:
>>>
>>> insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N )
>>> select * from table( MY_TABLE_FUNCTION() ) s
>>>
>>
>> The internal SQL syntax allows you to add an optimizer override here
>> (--DERBY-PROPERTIES insertMode=bulkInsert) which I think speeds it up
>> even more, though I'm not sure exactly what it buys you. This override
>> is used in the import code and in the SYSCS_UTIL.SYSCS_BULK_INSERT
>> system procedure, but you're not allowed to use it in your own SQL
>> queries, it seems. The (undocumented) SYSCS_BULK_INSERT procedure does
>> exactly what Rick's example does, except that it also adds the optimizer
>> override.
>>
>> The procedure only works on old-style table functions, though. Does
>> anyone know what performance gains one can expect by using
>> SYSCS_BULK_INSERT instead of INSERT INTO ... SELECT FROM? Would it be
>> worth the effort to create a similar mechanism for the new-style table
>> functions?
>>
>>
> Hi Knut,
>
> It appears to me that this setting (insertMode=bulkInsert) is
> inspected when binding an INSERT statement. If the bind() logic finds
> this setting, then it escalates the lock mode for the target table to
> be a table level lock. Other than that, I cannot find any other use of
> this setting. Comments around this code, however, indicate that there
> might once have been other implications and that this setting, which
> is now unconditionally accepted, used to be silently rejected in
> certain situations, including insertion into a synchronized client
> database, insertion into a table which has triggers, and deferred mode
> insertion (see the comments on InsertNode.verifyBulkInsert).
>
> It appears to me that the benefit now conferred by this setting can be
> achieved by setting the lock mode on the table to be table level. So
> you would preface the INSERT statement above with the following
> statement:
>
> lock MY_TABLE in exclusive mode
>
> That, at least, is what I have gleaned from a code inspection.
Thanks Rick,
Did you also look at the code in InsertResultSet.open()? It looks like
it has some special handling of triggers, foreign keys and checking of
constraints when running in bulk insert mode.
--
Knut Anders
Re: faster inserts in big tables
Posted by Rick Hillegas <Ri...@Sun.COM>.
Knut Anders Hatlen wrote:
> Rick Hillegas <Ri...@Sun.COM> writes:
>
>
>> Hi Brian,
>>
>> In a previous response Peter Ondruška noted that you could use bulk
>> import to speed up your inserts if your source data lived in a
>> properly formatted file.
>>
>> Even if your source data does not live in a file, you can still get
>> bulk import speed (and the benefits of your generated key) by using
>> table functions--provided that you upgrade to 10.4. If you wrap your
>> source data in a table function, then you can bulk insert your data as
>> follows:
>>
>> insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N )
>> select * from table( MY_TABLE_FUNCTION() ) s
>>
>
> The internal SQL syntax allows you to add an optimizer override here
> (--DERBY-PROPERTIES insertMode=bulkInsert) which I think speeds it up
> even more, though I'm not sure exactly what it buys you. This override
> is used in the import code and in the SYSCS_UTIL.SYSCS_BULK_INSERT
> system procedure, but you're not allowed to use it in your own SQL
> queries, it seems. The (undocumented) SYSCS_BULK_INSERT procedure does
> exactly what Rick's example does, except that it also adds the optimizer
> override.
>
> The procedure only works on old-style table functions, though. Does
> anyone know what performance gains one can expect by using
> SYSCS_BULK_INSERT instead of INSERT INTO ... SELECT FROM? Would it be
> worth the effort to create a similar mechanism for the new-style table
> functions?
>
>
Hi Knut,
It appears to me that this setting (insertMode=bulkInsert) is inspected
when binding an INSERT statement. If the bind() logic finds this
setting, then it escalates the lock mode for the target table to be a
table level lock. Other than that, I cannot find any other use of this
setting. Comments around this code, however, indicate that there might
once have been other implications and that this setting, which is now
unconditionally accepted, used to be silently rejected in certain
situations, including insertion into a synchronized client database,
insertion into a table which has triggers, and deferred mode insertion
(see the comments on InsertNode.verifyBulkInsert).
It appears to me that the benefit now conferred by this setting can be
achieved by setting the lock mode on the table to be table level. So you
would preface the INSERT statement above with the following statement:
lock MY_TABLE in exclusive mode
That, at least, is what I have gleaned from a code inspection.
Regards,
-Rick
Re: faster inserts in big tables
Posted by Knut Anders Hatlen <Kn...@Sun.COM>.
Rick Hillegas <Ri...@Sun.COM> writes:
> Hi Brian,
>
> In a previous response Peter Ondruška noted that you could use bulk
> import to speed up your inserts if your source data lived in a
> properly formatted file.
>
> Even if your source data does not live in a file, you can still get
> bulk import speed (and the benefits of your generated key) by using
> table functions--provided that you upgrade to 10.4. If you wrap your
> source data in a table function, then you can bulk insert your data as
> follows:
>
> insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N )
> select * from table( MY_TABLE_FUNCTION() ) s
The internal SQL syntax allows you to add an optimizer override here
(--DERBY-PROPERTIES insertMode=bulkInsert) which I think speeds it up
even more, though I'm not sure exactly what it buys you. This override
is used in the import code and in the SYSCS_UTIL.SYSCS_BULK_INSERT
system procedure, but you're not allowed to use it in your own SQL
queries, it seems. The (undocumented) SYSCS_BULK_INSERT procedure does
exactly what Rick's example does, except that it also adds the optimizer
override.
The procedure only works on old-style table functions, though. Does
anyone know what performance gains one can expect by using
SYSCS_BULK_INSERT instead of INSERT INTO ... SELECT FROM? Would it be
worth the effort to create a similar mechanism for the new-style table
functions?
--
Knut Anders
Re: faster inserts in big tables
Posted by Rick Hillegas <Ri...@Sun.COM>.
Hi Brian,
In a previous response Peter Ondruška noted that you could use bulk
import to speed up your inserts if your source data lived in a properly
formatted file.
Even if your source data does not live in a file, you can still get bulk
import speed (and the benefits of your generated key) by using table
functions--provided that you upgrade to 10.4. If you wrap your source
data in a table function, then you can bulk insert your data as follows:
insert into MY_TABLE( NON_GEN_COL1, ... NON_GEN_COL_N )
select * from table( MY_TABLE_FUNCTION() ) s
For more information on table functions, see the white paper at
http://developers.sun.com/javadb/reference/whitepapers/index.jsp
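Concretely, the wrapping might look like this. The class, function, and column names below are hypothetical, and the DDL follows the table function syntax that white paper describes; treat it as a sketch, not a drop-in implementation:

```java
import java.sql.ResultSet;

public class SourceRows {
    /*
     * Register the function once, e.g. from ij:
     *
     *   CREATE FUNCTION MY_TABLE_FUNCTION()
     *   RETURNS TABLE (ID1 BIGINT, ID2 BIGINT,
     *                  PAYLOAD VARCHAR (32672) FOR BIT DATA)
     *   LANGUAGE JAVA PARAMETER STYLE DERBY_JDBC_RESULT_SET
     *   READS SQL DATA
     *   EXTERNAL NAME 'SourceRows.read'
     *
     * Derby then calls read() and streams its rows into the
     * INSERT ... SELECT shown above.
     */
    public static ResultSet read() {
        // Return a ResultSet whose columns match the RETURNS TABLE shape,
        // e.g. by iterating over the in-memory source data. Sketch only:
        throw new UnsupportedOperationException("wrap your source data here");
    }
}
```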
Hope this helps,
-Rick
>
> publicayers wrote:
>
>> I have 10's of thousands of rows to add to a table, possibly 100's of
>> thousands, and I'm wondering if there's anything else I can do to speed
>> this up. The table could end up having a couple of million rows.
>>
>> This is what I've done so far:
>> * Using a PreparedStatement that gets reused with each insert.
>> * Set locking level to TABLE for that table.
>> * Turned off autocommit.
>> * Set the connection to READ_COMMIT.
>>
>> In addition to that, I'm also setting these system parameters, though
>> not
>> necessarily to improve insert performance:
>> * derby.system.durability=test
>> * derby.storage.pageSize=32768
>>
>> The table has one odd feature: The last column is a VARCHAR(32672) FOR
>> BIT DATA. I've tried setting the length to something smaller, but it
>> didn't really seem to matter.
>>
>> The primary key is an auto generated int with another 2-column index on
>> two BIGINT columns. Something I found interesting is that the inserts
>> seem to go 2x faster if I have the 2-column index in place than if I
>> have just the primary-key index.
>>
>> I'm running
>> Derby 10.2.2
>> JRE 1.6.0_07
>> Windows XP SP2
>>
>> Is there anything else I can do to speed up row inserts?
>>
>> Thanks,
>> Brian
>>
>>
>>
>
>
Re: faster inserts in big tables
Posted by Bocete <ko...@gmail.com>.
You could make a helper class that will bundle several inserts into a single
query.
pseudo-code:
class InsertQueryBundle {
    void insertRow(String values) throws SQLException {
        if (valuesQueue.isFull())
            flushValuesQueue();
        valuesQueue.add(values);
    }

    void flushValuesQueue() throws SQLException {
        StringBuilder builder = new StringBuilder(
                "INSERT INTO " + tableName + " (" + columns + ") VALUES ");
        boolean first = true;
        String values;
        while ((values = valuesQueue.poll()) != null) {
            if (!first) builder.append(", ");
            builder.append("(").append(values).append(")");
            first = false;
        }
        statement.executeUpdate(builder.toString());
    }
}
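Since Derby accepts multi-row VALUES lists, the core of that bundling idea can be reduced to a pure helper like the one below. Names are hypothetical, and real code should prefer a PreparedStatement with parameter markers over string concatenation to avoid SQL injection:

```java
import java.util.List;

public class InsertBundler {
    // Builds "INSERT INTO t (c1, c2) VALUES (a, b), (c, d)" for one batch.
    // Each entry in rows is assumed to be a pre-escaped SQL literal list.
    static String bundle(String table, String columns, List<String> rows) {
        StringBuilder sql = new StringBuilder(
                "INSERT INTO " + table + " (" + columns + ") VALUES ");
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) sql.append(", ");
            sql.append("(").append(rows.get(i)).append(")");
        }
        return sql.toString();
    }
}
```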
publicayers wrote:
>
> I have 10's of thousands of rows to add to a table, possibly 100's of
> thousands, and I'm wondering if there's anything else I can do to speed
> this up. The table could end up having a couple of million rows.
>
> This is what I've done so far:
> * Using a PreparedStatement that gets reused with each insert.
> * Set locking level to TABLE for that table.
> * Turned off autocommit.
> * Set the connection to READ_COMMIT.
>
> In addition to that, I'm also setting these system parameters, though
> not
> necessarily to improve insert performance:
> * derby.system.durability=test
> * derby.storage.pageSize=32768
>
> The table has one odd feature: The last column is a VARCHAR(32672) FOR
> BIT DATA. I've tried setting the length to something smaller, but it
> didn't really seem to matter.
>
> The primary key is an auto generated int with another 2-column index on
> two BIGINT columns. Something I found interesting is that the inserts
> seem to go 2x faster if I have the 2-column index in place than if I
> have just the primary-key index.
>
> I'm running
> Derby 10.2.2
> JRE 1.6.0_07
> Windows XP SP2
>
> Is there anything else I can do to speed up row inserts?
>
> Thanks,
> Brian
>
>
--
View this message in context: http://www.nabble.com/faster-inserts-in-big-tables-tp20690051p20957504.html
Sent from the Apache Derby Users mailing list archive at Nabble.com.
Re: faster inserts in big tables
Posted by Peter Ondruška <pe...@gmail.com>.
Go with the latest Derby. If you load data from an external file, use bulk
import. Using a large log file may also help.
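The bulk import mentioned here is the documented SYSCS_UTIL.SYSCS_IMPORT_TABLE procedure. A minimal JDBC sketch (schema, table, and file path are hypothetical):

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

public class ImportRunner {
    // Seven arguments: schema, table, file, column delimiter,
    // character delimiter, codeset, replace flag.
    static final String IMPORT_CALL =
            "CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE(?, ?, ?, ?, ?, ?, ?)";

    // Appends rows from a comma-delimited file into MY_TABLE.
    static void importFile(Connection conn, String path) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(IMPORT_CALL)) {
            cs.setString(1, "APP");      // schema
            cs.setString(2, "MY_TABLE"); // table
            cs.setString(3, path);       // data file
            cs.setString(4, null);       // column delimiter (null = comma)
            cs.setString(5, null);       // char delimiter (null = double quote)
            cs.setString(6, null);       // codeset (null = platform default)
            cs.setInt(7, 0);             // 0 = append, non-zero = replace
            cs.execute();
        }
    }
}
```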
2008/11/25, publicayers@verizon.net <pu...@verizon.net>:
> I have 10's of thousands of rows to add to a table, possibly 100's of
> thousands, and I'm wondering if there's anything else I can do to speed
> this up. The table could end up having a couple of million rows.
>
> This is what I've done so far:
> * Using a PreparedStatement that gets reused with each insert.
> * Set locking level to TABLE for that table.
> * Turned off autocommit.
> * Set the connection to READ_COMMIT.
>
> In addition to that, I'm also setting these system parameters, though
> not
> necessarily to improve insert performance:
> * derby.system.durability=test
> * derby.storage.pageSize=32768
>
> The table has one odd feature: The last column is a VARCHAR(32672) FOR
> BIT DATA. I've tried setting the length to something smaller, but it
> didn't really seem to matter.
>
> The primary key is an auto generated int with another 2-column index on
> two BIGINT columns. Something I found interesting is that the inserts
> seem to go 2x faster if I have the 2-column index in place than if I
> have just the primary-key index.
>
> I'm running
> Derby 10.2.2
> JRE 1.6.0_07
> Windows XP SP2
>
> Is there anything else I can do to speed up row inserts?
>
> Thanks,
> Brian
>