You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2017/09/07 09:42:09 UTC

[kudu-CR] Fix flakiness of ts tablet manager itest TestFailedTabletsAreReplaced

Andrew Wong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7993

Change subject: Fix flakiness of ts_tablet_manager_itest TestFailedTabletsAreReplaced
......................................................................

Fix flakiness of ts_tablet_manager_itest TestFailedTabletsAreReplaced

TestFailedTabletsAreReplaced manually fails the replica after only
verifying that the tablet exists, with no regard for its state. This can
cause the replica's bootstrap process to fail a check:
F0907 00:05:46.153576  2697 tablet_replica.cc:173] Check failed: BOOTSTRAPPING == state_ (0 vs. 2)

This is a test-only race where the replica successfully goes through the
bootstrap process, the tablet is failed in test, and the
TabletReplica::Start() is called on the replica, which requires its
state to be BOOTSTRAPPING. This is not an issue seen in production, as
bootstrapping is normally only run if the replica is not failed, but it
did result in 6/1000 failures when run in release mode with
--stress_cpu_thres=32.

To fix this, the replica is failed only after the it is verified to be
running. In doing so, the number of failures went from 6/1000 to 0/1000.

Change-Id: I93b41c8196397ea5af42ed9e2aa47e967f7a520e
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
1 file changed, 14 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/7993/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I93b41c8196397ea5af42ed9e2aa47e967f7a520e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] Fix flakiness of ts tablet manager itest TestFailedTabletsAreReplaced

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change.

Change subject: Fix flakiness of ts_tablet_manager_itest TestFailedTabletsAreReplaced
......................................................................


Patch Set 2: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/7993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93b41c8196397ea5af42ed9e2aa47e967f7a520e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: No

[kudu-CR] Fix flakiness of ts tablet manager itest TestFailedTabletsAreReplaced

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: Fix flakiness of ts_tablet_manager_itest TestFailedTabletsAreReplaced
......................................................................


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93b41c8196397ea5af42ed9e2aa47e967f7a520e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: No

[kudu-CR] Fix flakiness of ts tablet manager itest TestFailedTabletsAreReplaced

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7993

to look at the new patch set (#2).

Change subject: Fix flakiness of ts_tablet_manager_itest TestFailedTabletsAreReplaced
......................................................................

Fix flakiness of ts_tablet_manager_itest TestFailedTabletsAreReplaced

TestFailedTabletsAreReplaced manually fails the replica after only
verifying that the tablet exists, with no regard for its state. This can
cause the replica's bootstrap process to fail a check:
F0907 00:05:46.153576  2697 tablet_replica.cc:173] Check failed: BOOTSTRAPPING == state_ (0 vs. 2)

This is a test-only race where the replica successfully goes through the
bootstrap process, the tablet is failed in test, and
TabletReplica::Start() is called on the replica, which requires its
state to be BOOTSTRAPPING. This is not an issue seen in production, as
bootstrapping is normally only run if the replica is not failed, but it
did result in 6/1000 failures when run in release mode with
--stress_cpu_thres=32.

To fix this, the replica is failed only after it is verified to be
running. In doing so, the number of failures went from 6/1000 to 0/1000.

Change-Id: I93b41c8196397ea5af42ed9e2aa47e967f7a520e
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
1 file changed, 16 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/7993/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I93b41c8196397ea5af42ed9e2aa47e967f7a520e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] Fix flakiness of ts tablet manager itest TestFailedTabletsAreReplaced

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged.

Change subject: Fix flakiness of ts_tablet_manager_itest TestFailedTabletsAreReplaced
......................................................................


Fix flakiness of ts_tablet_manager_itest TestFailedTabletsAreReplaced

TestFailedTabletsAreReplaced manually fails the replica after only
verifying that the tablet exists, with no regard for its state. This can
cause the replica's bootstrap process to fail a check:
F0907 00:05:46.153576  2697 tablet_replica.cc:173] Check failed: BOOTSTRAPPING == state_ (0 vs. 2)

This is a test-only race where the replica successfully goes through the
bootstrap process, the tablet is failed in test, and
TabletReplica::Start() is called on the replica, which requires its
state to be BOOTSTRAPPING. This is not an issue seen in production, as
bootstrapping is normally only run if the replica is not failed, but it
did result in 6/1000 failures when run in release mode with
--stress_cpu_thres=32.

To fix this, the replica is failed only after it is verified to be
running. In doing so, the number of failures went from 6/1000 to 0/1000.

Change-Id: I93b41c8196397ea5af42ed9e2aa47e967f7a520e
Reviewed-on: http://gerrit.cloudera.org:8080/7993
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Reviewed-by: Hao Hao <ha...@cloudera.com>
Reviewed-by: Mike Percy <mp...@apache.org>
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
1 file changed, 16 insertions(+), 7 deletions(-)

Approvals:
  Hao Hao: Looks good to me, but someone else must approve
  Mike Percy: Looks good to me, approved
  Adar Dembo: Looks good to me, but someone else must approve
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I93b41c8196397ea5af42ed9e2aa47e967f7a520e
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] Fix flakiness of ts tablet manager itest TestFailedTabletsAreReplaced

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change.

Change subject: Fix flakiness of ts_tablet_manager_itest TestFailedTabletsAreReplaced
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7993/1//COMMIT_MSG
Commit Message:

PS1, Line 15: 
> Nit: drop this
Done


PS1, Line 22: it 
> Nit: drop
Done


http://gerrit.cloudera.org:8080/#/c/7993/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc
File src/kudu/integration-tests/ts_tablet_manager-itest.cc:

Line 146:     AssertEventually([&]{
> When using this form of AssertEventually you need to manually call NO_PENDI
Done


Line 150:     }, MonoDelta::FromSeconds(60));
> Is it important to override the default timeout value of 30s?
I ran into 3/1000 failures when it was 30s (couldn't get to RUNNING because it was still bootstrapping or copying, not the flake failure).


Line 153:   wait_until_running();
> Nit: add a blank line just before this.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93b41c8196397ea5af42ed9e2aa47e967f7a520e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Fix flakiness of ts tablet manager itest TestFailedTabletsAreReplaced

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: Fix flakiness of ts_tablet_manager_itest TestFailedTabletsAreReplaced
......................................................................


Patch Set 2: Code-Review+1

Will defer to Mike.

-- 
To view, visit http://gerrit.cloudera.org:8080/7993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93b41c8196397ea5af42ed9e2aa47e967f7a520e
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: No

[kudu-CR] Fix flakiness of ts tablet manager itest TestFailedTabletsAreReplaced

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: Fix flakiness of ts_tablet_manager_itest TestFailedTabletsAreReplaced
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7993/1//COMMIT_MSG
Commit Message:

PS1, Line 15: the
Nit: drop this


PS1, Line 22: the
Nit: drop


http://gerrit.cloudera.org:8080/#/c/7993/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc
File src/kudu/integration-tests/ts_tablet_manager-itest.cc:

Line 146:     AssertEventually([&]{
When using this form of AssertEventually you need to manually call NO_PENDING_FATALS() after it.


Line 150:     }, MonoDelta::FromSeconds(60));
Is it important to override the default timeout value of 30s?


Line 153:   {
Nit: add a blank line just before this.


-- 
To view, visit http://gerrit.cloudera.org:8080/7993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I93b41c8196397ea5af42ed9e2aa47e967f7a520e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-HasComments: Yes