Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2017/11/28 06:49:47 UTC

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8664 )


Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................

KUDU-1097: 'gone-and-back tablet server' test scenario

Added a new test scenario for the new 3-4-3 re-replication
scheme.  The scenario addresses the situation when a tablet
server has not been running for some time, a bit over the
FLAGS_follower_unavailable_considered_failed_sec interval,
and then it comes back before the newly added non-voter replicas
are promoted.  As a result, the original voter replicas from
the tablet server should stay, but the newly added non-voter replicas
should be evicted.
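
The following toy model makes the expected behaviour concrete. It is a sketch under explicit assumptions and is not code from Kudu:
- The types (`Replica`, `Decision`) and the `Decide` function are hypothetical.
- A single `healthy` flag stands in for the leader not having heard from a replica for longer than `follower_unavailable_considered_failed_sec`.
- The eviction order is inferred from this description, not taken from the catalog manager's actual policy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of one tablet's Raft config (hypothetical types, not Kudu's protobufs).
enum class MemberType { VOTER, NON_VOTER };

struct Replica {
  std::string uuid;
  MemberType type;
  // Assumption: false once the leader has not heard from the replica for
  // longer than the follower_unavailable_considered_failed_sec interval.
  bool healthy;
};

struct Decision {
  enum class Kind { NONE, ADD_NON_VOTER, EVICT };
  Kind kind;
  std::string uuid;  // set only for EVICT
};

// One step of a simplified 3-4-3 policy (illustrative only): add a non-voter
// when a voter has failed, wait while it catches up, then trim the config
// back to the replication factor once enough voters are healthy.
Decision Decide(const std::vector<Replica>& config, int replication_factor) {
  assert(replication_factor > 0);
  int healthy_voters = 0;
  for (const auto& r : config) {
    if (r.type == MemberType::VOTER && r.healthy) ++healthy_voters;
  }
  const int size = static_cast<int>(config.size());
  if (size > replication_factor) {
    // A failed non-voter is never worth keeping.
    for (const auto& r : config) {
      if (r.type == MemberType::NON_VOTER && !r.healthy) {
        return {Decision::Kind::EVICT, r.uuid};
      }
    }
    if (healthy_voters >= replication_factor) {
      // Enough healthy voters: evict the failed voter (normal 3-4-3 end state).
      for (const auto& r : config) {
        if (!r.healthy) return {Decision::Kind::EVICT, r.uuid};
      }
      // Nobody failed: the server came back, so the extra non-voter is surplus.
      for (const auto& r : config) {
        if (r.type == MemberType::NON_VOTER) return {Decision::Kind::EVICT, r.uuid};
      }
    }
    // Otherwise wait, e.g. for a pending non-voter to be promoted.
    return {Decision::Kind::NONE, ""};
  }
  if (healthy_voters < replication_factor) return {Decision::Kind::ADD_NON_VOTER, ""};
  return {Decision::Kind::NONE, ""};
}
```

Here is a three-replica tablet walked through the scenario in this model:
1. With C down, `Decide` asks for a non-voter.
2. While non-voter D is still catching up, it waits.
3. If C comes back before D is promoted, D is the replica evicted. Had D been promoted while C was still down, C would have been evicted instead.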

Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
---
M src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
1 file changed, 95 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/8664/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8664
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8664 )

Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................


Patch Set 2:

(2 comments)

looks good

http://gerrit.cloudera.org:8080/#/c/8664/2/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
File src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc:

http://gerrit.cloudera.org:8080/#/c/8664/2/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1059
PS2, Line 1059: The catalog
              : // manager should spawn non-voter replicas to replace the non-responsive
              : // replicas, but as soon as the tablet server is back while the newly added
              : // non-voter replicas are still copying data, the catalog manager should detect
              : // the excess of replicas and evict the newly added non-voter replicas.
I think the explanation for this part is better in the commit message.


http://gerrit.cloudera.org:8080/#/c/8664/2/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1144
PS2, Line 1144:   NO_FATALS(cluster_->AssertNoCrashes());
Before exiting we should do something to ensure that the right replica got evicted, like:

  ASSERT_OK(GetConsensusState(ts, tablet_id, kTimeout, &cstate));
  ASSERT_TRUE(IsRaftConfigMember(ts_with_replica->uuid(), cstate.committed_config()));
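
The shape of the check suggested above can be shown outside the Kudu tree with toy types. This sketch is hypothetical: `Config`, `Peer`, `IsMember`, and `HasExactlyVoters` are illustrative stand-ins, not Kudu's `RaftConfigPB` or `IsRaftConfigMember`. Besides confirming that the returning server is still a member, it shows a stricter variant that also confirms the extra non-voter is gone.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for a committed Raft config (hypothetical; not Kudu's protobufs).
enum class MemberType { VOTER, NON_VOTER };
struct Peer {
  std::string uuid;
  MemberType type;
};
struct Config {
  std::vector<Peer> peers;
};

// True if a peer with the given UUID appears in the config, whatever its type.
bool IsMember(const std::string& uuid, const Config& config) {
  return std::any_of(config.peers.begin(), config.peers.end(),
                     [&](const Peer& p) { return p.uuid == uuid; });
}

// True if the config holds exactly the expected voters and nothing else,
// i.e. the returning server's replica was kept and the non-voter evicted.
bool HasExactlyVoters(const Config& config, const std::set<std::string>& expected) {
  std::set<std::string> voters;
  for (const auto& p : config.peers) {
    if (p.type != MemberType::VOTER) return false;  // a leftover non-voter
    voters.insert(p.uuid);
  }
  return voters == expected && config.peers.size() == expected.size();
}
```

With these toy types, the config {A, B, C} passes both checks. The config {A, B, C, D (non-voter)} still contains C, but it fails the stricter check because D was not evicted.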



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Tue, 28 Nov 2017 21:24:05 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8664 )

Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8664/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8664/2//COMMIT_MSG@10
PS2, Line 10: The scenario addresses the situation when a tablet server has not been
            : running for some time (e.g., a bit over the time interval specified by
            : the 'follower_unavailable_considered_failed_sec' flag), and then it
            : comes back before the newly added non-voter replicas are promoted.
            : As a result, the original voter replicas from the tablet server should
            : stay, but the newly added non-voter replicas should be evicted.
This is a great explanation of the test. Would you mind putting this part of the description in the test comment?



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Tue, 28 Nov 2017 21:12:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8664 )

Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8664/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8664/2//COMMIT_MSG@10
PS2, Line 10: The scenario addresses the situation when a tablet server has not been
            : running for some time (e.g., a bit over the time interval specified by
            : the 'follower_unavailable_considered_failed_sec' flag), and then it
            : comes back before the newly added non-voter replicas are promoted.
            : As a result, the original voter replicas from the tablet server should
            : stay, but the newly added non-voter replicas should be evicted.
> This is a great explanation of the test. Would you mind putting this part o
Sure, why not.  I'll replace the current comment with this part.



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Tue, 28 Nov 2017 21:15:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8664 )

Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................

KUDU-1097: 'gone-and-back tablet server' test scenario

Added a new test scenario for the new 3-4-3 re-replication scheme.
The scenario addresses the situation when a tablet server has not been
running for some time (e.g., a bit over the time interval specified by
the 'follower_unavailable_considered_failed_sec' flag), and then it
comes back before the newly added non-voter replicas are promoted.
As a result, the original voter replicas from the tablet server should
stay, but the newly added non-voter replicas should be evicted.

Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Reviewed-on: http://gerrit.cloudera.org:8080/8664
Tested-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Mike Percy <mp...@apache.org>
---
M src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
1 file changed, 122 insertions(+), 3 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Mike Percy: Looks good to me, approved

-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8664

to look at the new patch set (#3).

Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................

KUDU-1097: 'gone-and-back tablet server' test scenario

Added a new test scenario for the new 3-4-3 re-replication scheme.
The scenario addresses the situation when a tablet server has not been
running for some time (e.g., a bit over the time interval specified by
the 'follower_unavailable_considered_failed_sec' flag), and then it
comes back before the newly added non-voter replicas are promoted.
As a result, the original voter replicas from the tablet server should
stay, but the newly added non-voter replicas should be evicted.

Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
---
M src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
1 file changed, 121 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/8664/3
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8664

to look at the new patch set (#2).

Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................

KUDU-1097: 'gone-and-back tablet server' test scenario

Added a new test scenario for the new 3-4-3 re-replication scheme.
The scenario addresses the situation when a tablet server has not been
running for some time (e.g., a bit over the time interval specified by
the 'follower_unavailable_considered_failed_sec' flag), and then it
comes back before the newly added non-voter replicas are promoted.
As a result, the original voter replicas from the tablet server should
stay, but the newly added non-voter replicas should be evicted.

Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
---
M src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
1 file changed, 107 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/8664/2
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8664 )

Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................


Patch Set 4: Verified+1

Unrelated flake in OpenReadonlyFsITest.TestWriteAndVerify due to NTP error:

F1128 22:56:33.242112 22843 master_main.cc:74] Check failed: _s.ok() Bad status: Service unavailable: Cannot initialize clock: Error reading clock. Clock considered unsynchronized
*** Check failure stack trace: ***                                              
    @     0x7efcf4e9362d  google::LogMessage::Fail() at ??:0                    
    @     0x7efcf4e9564c  google::LogMessage::SendToLog() at ??:0               
    @     0x7efcf4e93189  google::LogMessage::Flush() at ??:0                   
    @     0x7efcf4e95fdf  google::LogMessageFatal::~LogMessageFatal() at ??:0   
    @           0x404fdc  kudu::master::MasterMain() at ??:0                    
    @           0x4053ab  main at ??:0                                          
    @     0x7efcf45c5f45  __libc_start_main at ??:0                             
    @           0x404a49  (unknown) at ??:0                                     
    @              (nil)  (unknown)                                             
/home/jenkins-slave/workspace/kudu-master/3/src/kudu/integration-tests/open-readonly-fs-itest.cc:84: Failure
Failed


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Tue, 28 Nov 2017 23:17:18 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8664 )

Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................


Patch Set 4: Code-Review+2


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 29 Nov 2017 01:15:13 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/8664 )

Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8664

to look at the new patch set (#4).

Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................

KUDU-1097: 'gone-and-back tablet server' test scenario

Added a new test scenario for the new 3-4-3 re-replication scheme.
The scenario addresses the situation when a tablet server has not been
running for some time (e.g., a bit over the time interval specified by
the 'follower_unavailable_considered_failed_sec' flag), and then it
comes back before the newly added non-voter replicas are promoted.
As a result, the original voter replicas from the tablet server should
stay, but the newly added non-voter replicas should be evicted.

Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
---
M src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
1 file changed, 122 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/8664/4
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] KUDU-1097: 'gone-and-back tablet server' test scenario

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8664 )

Change subject: KUDU-1097: 'gone-and-back tablet server' test scenario
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8664/2/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc
File src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc:

http://gerrit.cloudera.org:8080/#/c/8664/2/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1059
PS2, Line 1059: The catalog
              : // manager should spawn non-voter replicas to replace the non-responsive
              : // replicas, but as soon as the tablet server is back while the newly added
              : // non-voter replicas are still copying data, the catalog manager should detect
              : // the excess of replicas and evict the newly added non-voter replicas.
> I think the explanation for this part is better in the commit message.
Replaced.


http://gerrit.cloudera.org:8080/#/c/8664/2/src/kudu/integration-tests/raft_consensus_nonvoter-itest.cc@1144
PS2, Line 1144:   NO_FATALS(cluster_->AssertNoCrashes());
> Before exiting we should do something to ensure that the right replica got 
Ah, sure, thanks!  I just verified that once based on the logs, but it's crucial to automate that part as well.



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I35eb6a0c7de5bfef962b5e96857c3f9c85a1a7b0
Gerrit-Change-Number: 8664
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Tue, 28 Nov 2017 22:28:26 +0000
Gerrit-HasComments: Yes