You are viewing a plain text version of this content. The canonical link for it is here.
Posted to oak-issues@jackrabbit.apache.org by "Philipp Suter (JIRA)" <ji...@apache.org> on 2016/02/19 15:58:19 UTC

[jira] [Updated] (OAK-4028) Use UPSERT (i.e. MERGE) for RDB to limit roundtrips on writing

     [ https://issues.apache.org/jira/browse/OAK-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philipp Suter updated OAK-4028:
-------------------------------
    Description: 
All databases offer some kind of UPSERT functionality. Either as MERGE [] or INSERT .. ON CONFLICT UPDATE. Such statements could be used to prevent an additional roundtrip in case an INSERT was not successful and needed an actual UPDATE.

Problem: Distinguish if the DB actually did an INSERT or UPDATE for each statement. This is done differently for each supported database but possible, except most likely for Oracle:

MySQL: http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
- 1 for insert, 2 for update, 0 for update but unchanged
	
MS SQL: https://msdn.microsoft.com/en-us/library/bb510625.aspx
- See example 'C' 
- Use the OUTPUT variable $action
- Needs some testing with JDBC
	
Oracle: https://community.oracle.com/thread/3803485
- Don't use merge

DB2: http://www-01.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/com.ibm.db2z11.doc.apsg/src/tpc/db2z_selectvaluesmerge.dita?lang=en
- SELECT FROM MERGE
- use a column to identify if insert or update was done, e.g. createdDate or similar (Modified?)
- Needs some testing with JDBC

Postgres: http://www.postgresql.org/docs/current/static/sql-insert.html
- INSERT ... ON CONFLICT UPDATE ... RETURNING ...
- Similar to DB2: Use a DB field to separate freshly inserted values from updated ones

Ideally this is first tested with one database, e.g. MySQL due to the very convenient syntax in MySQL. If a difference in performance can be measured it could be implemented across the board.

  was:
All databases offer some kind of UPSERT functionality. Either as MERGE [] or INSERT .. ON CONFLICT UPDATE. Such statements could be used to prevent an additional roundtrip in case an INSERT was not successful and needed an actual UPDATE.

Problem: Distinguish if the DB actually did an INSERT or UPDATE for each statement. This is done differently for each supported database but possible, except for Oracle:

MySQL: 
http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
- 1 for insert, 2 for update, 0 for update but unchanged
	
MS SQL:
https://msdn.microsoft.com/en-us/library/bb510625.aspx
- See example 'C' 
- Use the OUTPUT variable $action
- Needs some testing with JDBC
	
Oracle:
https://community.oracle.com/thread/3803485
- Don't use merge

DB2:
http://www-01.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/com.ibm.db2z11.doc.apsg/src/tpc/db2z_selectvaluesmerge.dita?lang=en
- SELECT FROM MERGE
- use a column to identify if insert or update was done, e.g. createdDate or similar (Modified?)
- Needs some testing with JDBC

Postgres
http://www.postgresql.org/docs/current/static/sql-insert.html
- INSERT ... ON CONFLICT UPDATE ... RETURNING ...
- Similar to DB2: Use a DB field to separate freshly inserte values from updated ones

Ideally this is first tested with one database, e.g. MySQL due to the very convenient syntax in MySQL. If a difference in performance can be measured it could be implemented across the board.


> Use UPSERT (i.e. MERGE) for RDB to limit roundtrips on writing
> --------------------------------------------------------------
>
>                 Key: OAK-4028
>                 URL: https://issues.apache.org/jira/browse/OAK-4028
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: documentmk, rdbmk
>            Reporter: Philipp Suter
>
> All databases offer some kind of UPSERT functionality. Either as MERGE [] or INSERT .. ON CONFLICT UPDATE. Such statements could be used to prevent an additional roundtrip in case an INSERT was not successful and needed an actual UPDATE.
> Problem: Distinguish if the DB actually did an INSERT or UPDATE for each statement. This is done differently for each supported database but possible, except most likely for Oracle:
> MySQL: http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> - 1 for insert, 2 for update, 0 for update but unchanged
> 	
> MS SQL: https://msdn.microsoft.com/en-us/library/bb510625.aspx
> - See example 'C' 
> - Use the OUTPUT variable $action
> - Needs some testing with JDBC
> 	
> Oracle: https://community.oracle.com/thread/3803485
> - Don't use merge
> DB2: http://www-01.ibm.com/support/knowledgecenter/SSEPEK_11.0.0/com.ibm.db2z11.doc.apsg/src/tpc/db2z_selectvaluesmerge.dita?lang=en
> - SELECT FROM MERGE
> - use a column to identify if insert or update was done, e.g. createdDate or similar (Modified?)
> - Needs some testing with JDBC
> Postgres: http://www.postgresql.org/docs/current/static/sql-insert.html
> - INSERT ... ON CONFLICT UPDATE ... RETURNING ...
> - Similar to DB2: Use a DB field to separate freshly inserted values from updated ones
> Ideally this is first tested with one database, e.g. MySQL due to the very convenient syntax in MySQL. If a difference in performance can be measured it could be implemented across the board.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)