You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Mike Percy (Code Review)" <ge...@cloudera.org> on 2017/05/01 20:54:53 UTC

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Mike Percy has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6732/1//COMMIT_MSG
Commit Message:

PS1, Line 9: This patch adds an integration test to ensure that we don't see data
           : loss on tablet copy failure. Tablet copy failure is triggered a couple
           : of different ways at the source server side.
> If you unrevert the patch that Todd reverted, does this test fail in the de
Turns out it was flaky, now it fails reliably


http://gerrit.cloudera.org:8080/#/c/6732/1/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 969:   const char* kTableA = "table_a";
> Move this down to L976.
Done


Line 985:   const char* kTableB = "table_b";
> Move down.
Done


Line 1000:   // Shut down replica with only [A].
> Maybe before doing this, assert that you got the replica distribution that 
Done; added to LoadTable()


Line 1022:     // The early timeout flag will also cause the tablet copy to fail, but
> ASSERT that failure_flag is the other flag.
Done


http://gerrit.cloudera.org:8080/#/c/6732/1/src/kudu/tserver/tablet_copy_service.cc
File src/kudu/tserver/tablet_copy_service.cc:

Line 69: TAG_FLAG(tablet_copy_early_session_timeout_prob, unsafe);
> Tag it hidden too. The rest as well (instead of unsafe), if they're suppose
No need; unsafe implies hidden.


Line 162:   // For testing: Close the session prematurely if unsafe gflag is set but
> Is there a particular reason why this must happen here (after the session h
I want this to simulate an expiring session, which means the client should successfully start a session on its side as well.

The reason this is here instead of in the copy data path is so we can probabilistically fail a certain proportion of tablet copy sessions instead of making it dependent on the amount of data being copied.


Line 288:     CHECK_OK(DoEndTabletCopySessionUnlocked(session_id, &app_error));
> Should probably be WARN_NOT_OK.
It can't fail in the current implementation but doesn't hurt to be defensive against future changes; done.


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes