You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Mike Percy (Code Review)" <ge...@cloudera.org> on 2017/05/01 20:54:53 UTC
[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure
Mike Percy has posted comments on this change.
Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................
Patch Set 1:
(8 comments)
http://gerrit.cloudera.org:8080/#/c/6732/1//COMMIT_MSG
Commit Message:
PS1, Line 9: This patch adds an integration test to ensure that we don't see data
: loss on tablet copy failure. Tablet copy failure is triggered a couple
: of different ways at the source server side.
> If you unrevert the patch that Todd reverted, does this test fail in the de
Turns out it was flaky, now it fails reliably
http://gerrit.cloudera.org:8080/#/c/6732/1/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:
Line 969: const char* kTableA = "table_a";
> Move this down to L976.
Done
Line 985: const char* kTableB = "table_b";
> Move down.
Done
Line 1000: // Shut down replica with only [A].
> Maybe before doing this, assert that you got the replica distribution that
Done; added to LoadTable()
Line 1022: // The early timeout flag will also cause the tablet copy to fail, but
> ASSERT that failure_flag is the other flag.
Done
http://gerrit.cloudera.org:8080/#/c/6732/1/src/kudu/tserver/tablet_copy_service.cc
File src/kudu/tserver/tablet_copy_service.cc:
Line 69: TAG_FLAG(tablet_copy_early_session_timeout_prob, unsafe);
> Tag it hidden too. The rest as well (instead of unsafe), if they're suppose
No need; unsafe implies hidden.
Line 162: // For testing: Close the session prematurely if unsafe gflag is set but
> Is there a particular reason why this must happen here (after the session h
I want this to simulate an expiring session, which means the client should successfully start a session on its side as well.
The reason this is here instead of in the copy data path is so we can probabilistically fail a certain proportion of tablet copy sessions instead of making it dependent on the amount of data being copied.
Line 288: CHECK_OK(DoEndTabletCopySessionUnlocked(session_id, &app_error));
> Should probably be WARN_NOT_OK.
It can't fail in the current implementation but doesn't hurt to be defensive against future changes; done.
--
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes