Posted to issues@kudu.apache.org by "Brock Noland (JIRA)" <ji...@apache.org> on 2018/12/04 22:40:00 UTC

[jira] [Commented] (KUDU-1563) Add support for INSERT IGNORE

    [ https://issues.apache.org/jira/browse/KUDU-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709366#comment-16709366 ] 

Brock Noland commented on KUDU-1563:
------------------------------------

Hey all,

I have a use case that would really benefit from {{INSERT IGNORE DUPLICATE KEY}}, since we will have duplicates at a ratio of 3x, so I am trying to revive this work.

I am not sold on creating an extremely generic approach to server-side error ignoring because I think it'll be really easy to abuse. I feel like Kudu contributors should have some control over when ignoring errors is allowed so we understand and validate the use case.

Furthermore, {{INSERT IGNORE ALL ERRORS}} won't work for my use case because we are generating so many duplicates precisely because we are so concerned about data loss.

Therefore I am suggesting we add a session-level property that allows the user to ignore certain server-side errors for {{INSERT IGNORE}}, {{UPDATE IGNORE}}, and {{DELETE IGNORE}} operations. Below is a lightly edited summary from [~adar] of my proposal:

* Move forward with a new operation {{INSERT IGNORE}}, with the understanding that {{UPDATE IGNORE}} and {{DELETE IGNORE}} would be good additions in the future. Together they comprise a new set of write operations that may ignore certain errors.
* Document that {{INSERT IGNORE}} isn't just about duplicate primary keys; the precise set of errors ignored by all of these new write operations is configurable.
* Add new {{KuduSession}} properties that control the set of errors ignored by write operations. This set will initially just be "duplicate primary key on insert". The properties should be combinable (i.e. I should be able to ignore duplicate primary keys AND missing partitions), but the granularity will be session-level, not operation-level.
* Default to no errors ignored, so that the user is forced to configure the precise set they want to ignore.
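
To make the intended semantics concrete, here is a rough sketch in plain Java. It is only a model of the proposal, not Kudu code: {{IgnoreSketch}}, {{Session}}, {{OpType}}, {{IgnorableError}}, {{setIgnoredErrors}} and {{isReported}} are all hypothetical names invented for illustration, and the real filtering would happen on the server rather than in the client. It assumes the ignored set applies only to the new {{IGNORE}} operations and starts out empty.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model only: none of these types exist in the Kudu client API.
final class IgnoreSketch {

    /** Write operation kinds; only the IGNORE variant consults the ignored set. */
    enum OpType { INSERT, INSERT_IGNORE }

    /** Row errors a session could be configured to ignore (illustrative names). */
    enum IgnorableError { DUPLICATE_PRIMARY_KEY, MISSING_PARTITION }

    /** Stand-in for a session; ignores nothing until explicitly configured. */
    static final class Session {
        private final Set<IgnorableError> ignored = EnumSet.noneOf(IgnorableError.class);

        /** Replaces the session-wide ignored set; several errors may be combined. */
        void setIgnoredErrors(Set<IgnorableError> errors) {
            ignored.clear();
            ignored.addAll(errors);
        }

        /** Returns true if the error would still be surfaced to the caller. */
        boolean isReported(OpType op, IgnorableError error) {
            return !(op == OpType.INSERT_IGNORE && ignored.contains(error));
        }
    }

    public static void main(String[] args) {
        Session session = new Session();

        // Default: no errors ignored, so even INSERT IGNORE reports the duplicate.
        System.out.println(session.isReported(
                OpType.INSERT_IGNORE, IgnorableError.DUPLICATE_PRIMARY_KEY));

        // Combine two ignorable errors at session granularity.
        session.setIgnoredErrors(EnumSet.of(
                IgnorableError.DUPLICATE_PRIMARY_KEY, IgnorableError.MISSING_PARTITION));

        // INSERT IGNORE now swallows the duplicate; a plain INSERT still reports it.
        System.out.println(session.isReported(
                OpType.INSERT_IGNORE, IgnorableError.DUPLICATE_PRIMARY_KEY));
        System.out.println(session.isReported(
                OpType.INSERT, IgnorableError.DUPLICATE_PRIMARY_KEY));
    }
}
```

The point of the empty default is that switching to {{INSERT IGNORE}} alone changes nothing; the user has to opt in to each error they want ignored.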


> Add support for INSERT IGNORE
> -----------------------------
>
>                 Key: KUDU-1563
>                 URL: https://issues.apache.org/jira/browse/KUDU-1563
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: Dan Burkert
>            Assignee: Brock Noland
>            Priority: Major
>              Labels: newbie
>
> The Java client currently has an [option to ignore duplicate row key errors|https://kudu.apache.org/apidocs/org/kududb/client/AsyncKuduSession.html#setIgnoreAllDuplicateRows-boolean-], which is implemented by filtering the errors on the client side.  If we are going to continue to support this feature (and the consensus seems to be that we probably should), we should promote it to a first-class operation type that is handled on the server side.  This would give a modest perf. improvement since fewer errors are returned, and it would allow INSERT IGNORE ops to be mixed in the same batch as other INSERT, DELETE, UPSERT, etc. ops.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)