You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@trafodion.apache.org by "Song, Hao-Lin" <ha...@esgyn.cn> on 2018/04/28 02:50:26 UTC

Question about trafodion insert

Hi all

I have a question about trafodion insert that if trafodion executes query ‘insert into table values(i),(i+1)…,(i+100)’ N/100 times faster than executing query ‘insert into table values(i) N times. So I made a simple test:
I used jdbcT4 to execute query ‘insert into song values(i)’ 10K times (i from 1 to 10K, 10k rows)and it cost about 89 seconds. Then I executed query ‘insert into song values(i),(I +1),…,(i+100)’ 100 times(total of 10,000 rows) and it cost about 8.6 seconds.
I am not sure if it only because simple 1 needs to send buffer to mxosrvr 10k times. But I am more concerned about if query ‘insert into table values(i),(),(),(I + N)’ will be executed faster in sql engine than executing ‘insert into table values(i)’ N times.

Besides, how will the engine handle query ‘insert into table values(a),(b),(c)’? Will engine use execute plan cache?

Best,
宋昊霖 (Haolin(Leo) Song)


答复: Question about trafodion insert

Posted by "Song, Hao-Lin" <ha...@esgyn.cn>.
Hi Anoop,

Thanks very much for your explain!

Best,
宋昊霖 (Haolin(Leo) Song)

-----邮件原件-----
发件人: Anoop Sharma <an...@esgyn.com> 
发送时间: 2018年4月28日 12:46
收件人: dev@trafodion.apache.org
主题: RE: Question about trafodion insert

For each insert statement, sql engine need to:
- compile the query
- begin transaction
- send data to hbase server
- commit transaction.

For simple queries like an insert, most of the cost is in the 3rd step where values are sent to hbase server.
This is a process hop from mxosrvr to hbase region server.

When the insert query has multiple values, then all those values are buffered up and sent to hbase.
That will reduce the number of process sends that have to be done between mxosrvr and region server.

This is the same improvement that is seen if values are sent from client to server using rowsets.
In that mode, all rows that are included in the rowset are shipped in one call from client to server.
That will make it run much faster compared to executing the query with one row.
(there are limits of rowset size and buffer size sent to hbase. Values are internally buffered up to the max buffer/rowset size).

Compile time query cache will cache the insert statement.
 Which means that the statement ' insert into song values(i)' when issued multiple times will use the cached plan.
But that will not reduce the buffer being sent from mxosrvr to region server (step #3 above). That will be done whenever an insert stmt is issued.

In addition to use of rowsets or multiple values in one insert, there are other clauses that can be done to make inserts run faster. Those are use of upsert to avoid duplicate detection, use of 'upsert using load' to eliminate transaction being started/committed. Some of them depend on application need and its usage.

another note: using multiple values in one insert statement (insert ... values (i), (i+1)...) is ok if the number of values are 'small' but it is better to use rowsets if large number of values are to be inserted.

anoop

-----Original Message-----
From: Song, Hao-Lin <ha...@esgyn.cn>
Sent: Friday, April 27, 2018 7:50 PM
To: dev@trafodion.apache.org
Subject: Question about trafodion insert

Hi all

I have a question about trafodion insert that if trafodion executes query ‘insert into table values(i),(i+1)…,(i+100)’ N/100 times faster than executing query ‘insert into table values(i) N times. So I made a simple test:
I used jdbcT4 to execute query ‘insert into song values(i)’ 10K times (i from 1 to 10K, 10k rows)and it cost about 89 seconds. Then I executed query ‘insert into song values(i),(I +1),…,(i+100)’ 100 times(total of 10,000 rows) and it cost about 8.6 seconds.
I am not sure if it only because simple 1 needs to send buffer to mxosrvr 10k times. But I am more concerned about if query ‘insert into table values(i),(),(),(I + N)’ will be executed faster in sql engine than executing ‘insert into table values(i)’ N times.

Besides, how will the engine handle query ‘insert into table values(a),(b),(c)’? Will engine use execute plan cache?

Best,
宋昊霖 (Haolin(Leo) Song)


RE: Question about trafodion insert

Posted by Anoop Sharma <an...@esgyn.com>.
For each insert statement, sql engine need to:
- compile the query
- begin transaction
- send data to hbase server
- commit transaction.

For simple queries like an insert, most of the cost is in the 3rd step where values are sent to hbase server.
This is a process hop from mxosrvr to hbase region server.

When the insert query has multiple values, then all those values are buffered up and sent to hbase.
That will reduce the number of process sends that have to be done between mxosrvr and region server.

This is the same improvement that is seen if values are sent from client to server using rowsets.
In that mode, all rows that are included in the rowset are shipped in one call from client to server.
That will make it run much faster compared to executing the query with one row.
(there are limits of rowset size and buffer size sent to hbase. Values are internally buffered up to
the max buffer/rowset size).

Compile time query cache will cache the insert statement.
 Which means that the statement ' insert into song values(i)' when issued multiple times will use
the cached plan.
But that will not reduce the buffer being sent from mxosrvr to region server (step #3 above). That will
be done whenever an insert stmt is issued.

In addition to use of rowsets or multiple values in one insert, there are other clauses that can be done to make inserts 
run faster. Those are use of upsert to avoid duplicate detection, use of 'upsert using load' to eliminate transaction
being started/committed. Some of them depend on application need and its usage.

another note: using multiple values in one insert statement (insert ... values (i), (i+1)...) is ok if the number
of values are 'small' but it is better to use rowsets if large number of values are to be inserted.

anoop

-----Original Message-----
From: Song, Hao-Lin <ha...@esgyn.cn> 
Sent: Friday, April 27, 2018 7:50 PM
To: dev@trafodion.apache.org
Subject: Question about trafodion insert

Hi all

I have a question about trafodion insert that if trafodion executes query ‘insert into table values(i),(i+1)…,(i+100)’ N/100 times faster than executing query ‘insert into table values(i) N times. So I made a simple test:
I used jdbcT4 to execute query ‘insert into song values(i)’ 10K times (i from 1 to 10K, 10k rows)and it cost about 89 seconds. Then I executed query ‘insert into song values(i),(I +1),…,(i+100)’ 100 times(total of 10,000 rows) and it cost about 8.6 seconds.
I am not sure if it only because simple 1 needs to send buffer to mxosrvr 10k times. But I am more concerned about if query ‘insert into table values(i),(),(),(I + N)’ will be executed faster in sql engine than executing ‘insert into table values(i)’ N times.

Besides, how will the engine handle query ‘insert into table values(a),(b),(c)’? Will engine use execute plan cache?

Best,
宋昊霖 (Haolin(Leo) Song)