Posted to issues@kudu.apache.org by "Will Berkeley (JIRA)" <ji...@apache.org> on 2019/05/02 23:50:00 UTC

[jira] [Created] (KUDU-2809) Incremental backup / diff scan does not handle rows that are inserted and deleted between two incrementals correctly

Will Berkeley created KUDU-2809:
-----------------------------------

             Summary: Incremental backup / diff scan does not handle rows that are inserted and deleted between two incrementals correctly
                 Key: KUDU-2809
                 URL: https://issues.apache.org/jira/browse/KUDU-2809
             Project: Kudu
          Issue Type: Bug
          Components: backup
    Affects Versions: 1.9.0
            Reporter: Will Berkeley


I did the following sequence of operations:

# Insert 100 million rows
# Update 1 out of every 11 rows
# Make a full backup
# Insert 100 million more rows, after the original rows in keyspace
# Delete 1 out of every 23 rows
# Make an incremental backup

The restore job failed to apply the incremental backup, with an error like:

{noformat}
java.lang.RuntimeException: failed to write 1000 rows from DataFrame to Kudu; sample errors:
{noformat}

Due to another bug, there are no sample errors in the message, but after hacking around that bug, I found that the incremental contained a row with a DELETE action for a key that is not present in the full backup. That's because the row was inserted in step 4 and deleted in step 5, both between the full backup and the incremental backup, so the restored table never contains the key that the DELETE targets.
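The failure mode can be illustrated with a toy model. This is only a sketch: plain dicts and tuples stand in for Kudu tables and backup rows, and the names `naive_diff_scan` and `strict_restore` are hypothetical, not Kudu APIs. It assumes the diff scan reports one action per key based on the last operation in the interval, and that the restore treats a DELETE of a missing key as an error.

```python
# Toy model only: dicts and tuples stand in for Kudu tables and backup rows.

def naive_diff_scan(ops):
    """Collapse the ops applied between two backups into one action per key.

    A key whose last op is a delete is emitted as DELETE, even when the key
    was also inserted inside the same interval.
    """
    last_action = {}
    for action, key in ops:
        last_action[key] = "DELETE" if action == "delete" else "UPSERT"
    return sorted(last_action.items())


def strict_restore(full_backup, incremental):
    """Apply an incremental on top of a full backup.

    A DELETE for a key that is not in the table is recorded as an error.
    """
    table = dict(full_backup)
    errors = []
    for key, action in incremental:
        if action == "UPSERT":
            table[key] = "row"
        elif key in table:
            del table[key]
        else:
            errors.append(f"key {key} not found")
    return table, errors


full_backup = {1: "row", 2: "row"}  # table contents at the full backup
# Between backups: keys 3 and 4 are inserted, then keys 2 and 4 are deleted.
ops_between = [("insert", 3), ("insert", 4), ("delete", 2), ("delete", 4)]

incremental = naive_diff_scan(ops_between)
table, errors = strict_restore(full_backup, incremental)
print(incremental)
print(errors)
```

Key 4 plays the role of a row inserted in step 4 and deleted in step 5: the incremental carries a DELETE for it, but the restored full backup has no such key.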

We could fix this by
# Making diff scan not return a DELETE for such a row
# Implementing and using DELETE IGNORE in the restore job
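Both options can be sketched against a toy model, under the assumption that dicts and tuples stand in for Kudu tables and backup rows. The function names below are hypothetical and do not correspond to existing Kudu APIs; in particular, `restore_delete_ignore` only imitates the semantics that a DELETE IGNORE operation would presumably have.

```python
# Toy model only: dicts and tuples stand in for Kudu tables and backup rows.

def diff_scan_skip_transient(keys_at_start, ops):
    """Option 1: omit the DELETE for keys that did not exist at interval start."""
    last_action = {}
    for action, key in ops:
        last_action[key] = "DELETE" if action == "delete" else "UPSERT"
    return sorted(
        (key, action)
        for key, action in last_action.items()
        if not (action == "DELETE" and key not in keys_at_start)
    )


def restore_delete_ignore(full_backup, incremental):
    """Option 2: treat a DELETE of a missing key as a no-op."""
    table = dict(full_backup)
    for key, action in incremental:
        if action == "UPSERT":
            table[key] = "row"
        else:
            table.pop(key, None)  # a missing key is silently ignored
    return table


full_backup = {1: "row", 2: "row"}
# Key 4 is inserted and deleted between the two backups.
ops_between = [("insert", 3), ("insert", 4), ("delete", 2), ("delete", 4)]

# Option 1: the incremental no longer mentions key 4 at all.
fixed_incremental = diff_scan_skip_transient(set(full_backup), ops_between)

# Option 2: the incremental still has the stray DELETE, but restore tolerates it.
stray_incremental = [(2, "DELETE"), (3, "UPSERT"), (4, "DELETE")]
restored = restore_delete_ignore(full_backup, stray_incremental)
```

In this model both options end with the same restored table. The first keeps the incremental free of rows that never need to be applied; the second leaves the backup data unchanged and makes the restore tolerant of it.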


