You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Mike Percy (Code Review)" <ge...@cloudera.org> on 2017/04/25 22:52:56 UTC

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Mike Percy has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6732

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................

tablet copy: Ensure no data loss on tablet copy failure

This patch adds an integration test to ensure that we don't see data
loss on tablet copy failure. Tablet copy failure is triggered a couple
of different ways at the source server side.

This also implements the ThreadSafeRandom::NextDoubleFraction() method,
which existed in the Random class but was missing from the thread-safe
version.

Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster-itest-base.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/integration-tests/ts_itest-base.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_service.cc
M src/kudu/tserver/tablet_copy_service.h
M src/kudu/util/random.h
13 files changed, 242 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/6732/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6732/5/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 965:       WARN_NOT_OK(s, "Unable to list tablets on TS " + entry.first);
> The warning is still here; I thought you were going to remove it?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6732

to look at the new patch set (#3).

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................

tablet copy: Ensure no data loss on tablet copy failure

This patch adds an integration test to ensure that we don't see data
loss on tablet copy failure. Tablet copy failure is triggered a couple
of different ways at the source server side.

This also implements the ThreadSafeRandom::NextDoubleFraction() method,
which existed in the Random class but was missing from the thread-safe
version.

Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster-itest-base.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/integration-tests/ts_itest-base.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_service.cc
M src/kudu/tserver/tablet_copy_service.h
M src/kudu/util/random.h
13 files changed, 301 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/6732/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6732/2/src/kudu/integration-tests/cluster_itest_util.cc
File src/kudu/integration-tests/cluster_itest_util.cc:

PS2, Line 239: kDeadline - MonoTime::Now()
> Is it MonoDelta::FromSeconds(30) effectively?  If so, then kStartTime and k
Done


PS2, Line 242: const ListTabletServersResponsePB_Entry&
> const auto&  ?
Done


http://gerrit.cloudera.org:8080/#/c/6732/2/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 936:   // ensure the specified number of blocks are persisted on one of the replicas.
> IIUC, you're using the minimum block count as a proxy for "has at least one
Yeah but I'd like to force more than 1 block to disk to increase the likelihood of overlap.


Line 953:       WARN_NOT_OK(s, "Unable to list tablets on TS " + entry.first);
> What's the purpose of the warning, exactly? If this is an unexpected case, 
Hmm. It's expected so I guess I can remove it


Line 973:     LOG(INFO) << "Blocks diff: " << blocks_diff;
> That's quite a bit of logging. Maybe KLOG_EVERY_N() or one of its variants 
Changed to KLOG_EVERY_N_SECS()


PS2, Line 987: 
             : INSTANTIATE_TEST_CASE_P(FaultFlags, BadTabletCopyITest,
             :                         ::testing::ValuesIn(tablet_copy_failure_flags));
             : 
             : // This test uses CountBlocksUnderManagement() which only works with the
             : // LogBlockManager.
             : #ifndef __APPLE__
             : 
             : // Ensure that a tablet copy failure results in no orphaned blocks and no data loss.
             : TEST_P(BadTabletCopyITest, TestBadCopy) {
> Would it compile if __APPLE__ is defined?  I would expect the testcase inst
There is a comment above the #ifdef to explain why it's there. However I can move it around the whole test case, sure.


Line 1008:   string failure_flag = GetParam();
> nit: const string& ?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6732/1//COMMIT_MSG
Commit Message:

PS1, Line 9: This patch adds an integration test to ensure that we don't see data
           : loss on tablet copy failure. Tablet copy failure is triggered a couple
           : of different ways at the source server side.
If you unrevert the patch that Todd reverted, does this test fail in the desired manner?


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6732/2/src/kudu/integration-tests/cluster_itest_util.cc
File src/kudu/integration-tests/cluster_itest_util.cc:

PS2, Line 239: kDeadline - MonoTime::Now()
Is it MonoDelta::FromSeconds(30) effectively?  If so, then kStartTime and kDeadLine are not needed, if I'm not mistaken.


PS2, Line 242: const ListTabletServersResponsePB_Entry&
const auto&  ?


http://gerrit.cloudera.org:8080/#/c/6732/2/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

PS2, Line 987: 
             : INSTANTIATE_TEST_CASE_P(FaultFlags, BadTabletCopyITest,
             :                         ::testing::ValuesIn(tablet_copy_failure_flags));
             : 
             : // This test uses CountBlocksUnderManagement() which only works with the
             : // LogBlockManager.
             : #ifndef __APPLE__
             : 
             : // Ensure that a tablet copy failure results in no orphaned blocks and no data loss.
             : TEST_P(BadTabletCopyITest, TestBadCopy) {
Would it compile if __APPLE__ is defined?  I would expect the testcase instantiation to be under the #ifdef as well to be compilable.

Also, why is it disabled on OS X?  It would be nice to have a comment explaining this a bit.


Line 1008:   string failure_flag = GetParam();
nit: const string& ?


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6732/4/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

PS4, Line 982: // This test uses CountBlocksUnderManagement() which only works with the
             : // LogBlockManager.
             : #ifndef __APPLE__
Since the BadTabletCopyITest class is not used anywhere else but under this #ifndef, maybe put the class definition under that ifndef as well?


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6732

to look at the new patch set (#6).

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................

tablet copy: Ensure no data loss on tablet copy failure

This patch adds an integration test to ensure that we don't see data
loss on tablet copy failure. Tablet copy failure is triggered a couple
of different ways at the source server side.

This also implements the ThreadSafeRandom::NextDoubleFraction() method,
which existed in the Random class but was missing from the thread-safe
version.

Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster-itest-base.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/integration-tests/ts_itest-base.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_service.cc
M src/kudu/tserver/tablet_copy_service.h
M src/kudu/util/random.h
13 files changed, 299 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/6732/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/6732/1//COMMIT_MSG
Commit Message:

PS1, Line 9: This patch adds an integration test to ensure that we don't see data
           : loss on tablet copy failure. Tablet copy failure is triggered a couple
           : of different ways at the source server side.
> If you unrevert the patch that Todd reverted, does this test fail in the de
Turns out it was flaky, now it fails reliably


http://gerrit.cloudera.org:8080/#/c/6732/1/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 969:   const char* kTableA = "table_a";
> Move this down to L976.
Done


Line 985:   const char* kTableB = "table_b";
> Move down.
Done


Line 1000:   // Shut down replica with only [A].
> Maybe before doing this, assert that you got the replica distribution that 
Done; added to LoadTable()


Line 1022:     // The early timeout flag will also cause the tablet copy to fail, but
> ASSERT that failure_flag is the other flag.
Done


http://gerrit.cloudera.org:8080/#/c/6732/1/src/kudu/tserver/tablet_copy_service.cc
File src/kudu/tserver/tablet_copy_service.cc:

Line 69: TAG_FLAG(tablet_copy_early_session_timeout_prob, unsafe);
> Tag it hidden too. The rest as well (instead of unsafe), if they're suppose
No need; unsafe implies hidden.


Line 162:   // For testing: Close the session prematurely if unsafe gflag is set but
> Is there a particular reason why this must happen here (after the session h
I want this to simulate an expiring session, which means the client should successfully start a session on its side as well.

The reason this is here instead of in the copy data path is so we can probabilistically fail a certain proportion of tablet copy sessions instead of making it dependent on the amount of data being copied.


Line 288:     CHECK_OK(DoEndTabletCopySessionUnlocked(session_id, &app_error));
> Should probably be WARN_NOT_OK.
It can't fail in the current implementation but doesn't hurt to be defensive against future changes; done.


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6732

to look at the new patch set (#2).

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................

tablet copy: Ensure no data loss on tablet copy failure

This patch adds an integration test to ensure that we don't see data
loss on tablet copy failure. Tablet copy failure is triggered a couple
of different ways at the source server side.

This also implements the ThreadSafeRandom::NextDoubleFraction() method,
which existed in the Random class but was missing from the thread-safe
version.

Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster-itest-base.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/integration-tests/ts_itest-base.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_service.cc
M src/kudu/tserver/tablet_copy_service.h
M src/kudu/util/random.h
13 files changed, 299 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/6732/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6732/5/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 965:       WARN_NOT_OK(s, "Unable to list tablets on TS " + entry.first);
The warning is still here; I thought you were going to remove it?


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6732/2/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 953:       WARN_NOT_OK(s, "Unable to list tablets on TS " + entry.first);
> Hmm. It's expected so I guess I can remove it
Still there as of PS3.


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6732/2/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 936:   // ensure the specified number of blocks are persisted on one of the replicas.
IIUC, you're using the minimum block count as a proxy for "has at least one block flushed to disk?" Maybe you can use "bytes_flushed" instead? That's not LBM-specific, and so you won't need the __APPLE__ guard.


Line 953:       WARN_NOT_OK(s, "Unable to list tablets on TS " + entry.first);
What's the purpose of the warning, exactly? If this is an unexpected case, perhaps we should ASSERT_OK()? If it's expected, why warn?


Line 973:     LOG(INFO) << "Blocks diff: " << blocks_diff;
That's quite a bit of logging. Maybe KLOG_EVERY_N() or one of its variants would be useful here?


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


tablet copy: Ensure no data loss on tablet copy failure

This patch adds an integration test to ensure that we don't see data
loss on tablet copy failure. Tablet copy failure is triggered a couple
of different ways at the source server side.

This also implements the ThreadSafeRandom::NextDoubleFraction() method,
which existed in the Random class but was missing from the thread-safe
version.

Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Reviewed-on: http://gerrit.cloudera.org:8080/6732
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster-itest-base.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/integration-tests/ts_itest-base.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_service.cc
M src/kudu/tserver/tablet_copy_service.h
M src/kudu/util/random.h
13 files changed, 299 insertions(+), 26 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6732

to look at the new patch set (#5).

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................

tablet copy: Ensure no data loss on tablet copy failure

This patch adds an integration test to ensure that we don't see data
loss on tablet copy failure. Tablet copy failure is triggered a couple
of different ways at the source server side.

This also implements the ThreadSafeRandom::NextDoubleFraction() method,
which existed in the Random class but was missing from the thread-safe
version.

Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
---
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/create-table-stress-test.cc
M src/kudu/integration-tests/delete_tablet-itest.cc
M src/kudu/integration-tests/external_mini_cluster-itest-base.cc
M src/kudu/integration-tests/tablet_copy-itest.cc
M src/kudu/integration-tests/ts_itest-base.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tserver/tablet_copy_service.cc
M src/kudu/tserver/tablet_copy_service.h
M src/kudu/util/random.h
13 files changed, 300 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/6732/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6732/4/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

PS4, Line 982: // This test uses CountBlocksUnderManagement() which only works with the
             : // LogBlockManager.
             : #ifndef __APPLE__
> Since the BadTabletCopyITest class is not used anywhere else but under this
oops, sure


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6732/2/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

PS2, Line 987: 
             : INSTANTIATE_TEST_CASE_P(FaultFlags, BadTabletCopyITest,
             :                         ::testing::ValuesIn(tablet_copy_failure_flags));
             : 
             : // This test uses CountBlocksUnderManagement() which only works with the
             : // LogBlockManager.
             : #ifndef __APPLE__
             : 
             : // Ensure that a tablet copy failure results in no orphaned blocks and no data loss.
             : TEST_P(BadTabletCopyITest, TestBadCopy) {
> There is a comment above the #ifdef to explain why it's there. However I ca
Uh-oh, somehow I missed that, sorry.

Yep, if that test is not to be run on OS X, may be it's worth if-deffing everything else around related to it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] tablet copy: Ensure no data loss on tablet copy failure

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: tablet copy: Ensure no data loss on tablet copy failure
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6732/1/src/kudu/integration-tests/tablet_copy-itest.cc
File src/kudu/integration-tests/tablet_copy-itest.cc:

Line 969:   const char* kTableA = "table_a";
Move this down to L976.


Line 985:   const char* kTableB = "table_b";
Move down.


Line 1000:   // Shut down replica with only [A].
Maybe before doing this, assert that you got the replica distribution that you wanted?


Line 1022:     // The early timeout flag will also cause the tablet copy to fail, but
ASSERT that failure_flag is the other flag.


http://gerrit.cloudera.org:8080/#/c/6732/1/src/kudu/tserver/tablet_copy_service.cc
File src/kudu/tserver/tablet_copy_service.cc:

Line 69: TAG_FLAG(tablet_copy_early_session_timeout_prob, unsafe);
Tag it hidden too. The rest as well (instead of unsafe), if they're supposed to be testing-only.


Line 162:   // For testing: Close the session prematurely if unsafe gflag is set but
Is there a particular reason why this must happen here (after the session has been created) vs. above, where responding with a failure doesn't require tearing down a session? Plus responding earlier with RPC_RETURN_NOT_OK means app_error will actually make it back to the client.


Line 288:     CHECK_OK(DoEndTabletCopySessionUnlocked(session_id, &app_error));
Should probably be WARN_NOT_OK.


-- 
To view, visit http://gerrit.cloudera.org:8080/6732
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie57590e2473e09a6836bb02d5fccb7689614f8e7
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes