You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2018/07/25 03:29:20 UTC

[kudu-CR] KUDU-2511 fix for SingleReplicasStayOrMove scenario

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11046


Change subject: KUDU-2511 fix for SingleReplicasStayOrMove scenario
......................................................................

KUDU-2511 fix for SingleReplicasStayOrMove scenario

The RebalancerAndSingleReplicaTablets.SingleReplicasStayOrMove
scenario had a rare flake: sometimes after the restart of tablet
servers that were shutdown to create unbalanced tablet distribution,
ksck result would be inconsistent because it fetched information
on tablet servers and tables, but failed to fetch consensus info
for corresponding tablets.  In that case, the input for the
rebalancing algorithm would contain no table/tablet information,
so the rebalancer would not move a single replica.

The issue with the consistency of the ksck results will be addressed
in a separate patch.

Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
---
M src/kudu/tools/kudu-admin-test.cc
1 file changed, 19 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/11046/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] KUDU-2511 fix for SingleReplicasStayOrMove scenario

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11046 )

Change subject: KUDU-2511 fix for SingleReplicasStayOrMove scenario
......................................................................


Patch Set 1: Verified+1

Unrelated flake in:
  org.apache.kudu.backup.TestKuduBackup.Random Backup and Restore


-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Jul 2018 05:11:27 +0000
Gerrit-HasComments: No

[kudu-CR] [kudu-admin-test] improvements on error reporting

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/11046 )

Change subject: [kudu-admin-test] improvements on error reporting
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] [kudu-admin-test] improvements on error reporting

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11046 )

Change subject: [kudu-admin-test] improvements on error reporting
......................................................................


Patch Set 2: Verified+1

Unrelated flakes:
  org.apache.kudu.backup.TestKuduBackup.Random Backup and Restore   org.apache.kudu.client.TestSecurity.testImportExportAuthenticationCredentials


-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Thu, 26 Jul 2018 19:46:58 +0000
Gerrit-HasComments: No

[kudu-CR] [kudu-admin-test] improvements on error reporting

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11046 )

Change subject: [kudu-admin-test] improvements on error reporting
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Mon, 30 Jul 2018 03:54:50 +0000
Gerrit-HasComments: No

[kudu-CR] [kudu-admin-test] improvements on error reporting

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11046

to look at the new patch set (#2).

Change subject: [kudu-admin-test] improvements on error reporting
......................................................................

[kudu-admin-test] improvements on error reporting

This changelist introduces improvements on logging in case
if an assertion triggers in the rebalancer-related scenarios.

I found it's hard to troubleshoot the flakiness reported lately
in the RebalancerAndSingleReplicaTablets.SingleReplicasStayOrMove
scenario.  With hindsight, it turned out that improved logging
on assertions would help a lot.

Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
---
M src/kudu/tools/kudu-admin-test.cc
1 file changed, 50 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/11046/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] [kudu-admin-test] improvements on error reporting

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11046 )

Change subject: [kudu-admin-test] improvements on error reporting
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11046/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11046/1//COMMIT_MSG@12
PS1, Line 12: ksck result would be inconsistent because it fetched information
            : on tablet servers and tables, but failed to fetch consensus info
            : for corresponding tablets
> nit: Mind clarifying what ksck's results being "inconsistent" means? IIUC, 
I'll do that in a separate changelist.  After additional research I found that this test 100% fails if running with Kudu having version like 1.8.0-RC1-SNAPSHOT or alike.

I decided to turn this changelist on improvement logging in case of assertion triggers during those tests, and post a separate changelist to fix that version parsing-related issue.



-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Thu, 26 Jul 2018 17:52:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] [kudu-admin-test] improvements on error reporting

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/11046 )

Change subject: [kudu-admin-test] improvements on error reporting
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] [kudu-admin-test] improvements on error reporting

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11046 )

Change subject: [kudu-admin-test] improvements on error reporting
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11046/2/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11046/2/src/kudu/tools/kudu-admin-test.cc@1336
PS2, Line 1336: if (out) {
              :     str << " stdout: " << *out;
              :   }
              :   if (err) {
              :     str << " stderr: " << *err;
              :   }
> Seems like these are always non-null, in which case maybe use `const string
That's a good observation -- done.

The expected output from the rebalancer here is multi-line, so adding std::endl makes sense, yes.



-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Mon, 30 Jul 2018 03:19:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2511 fix for SingleReplicasStayOrMove scenario

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/11046 )

Change subject: KUDU-2511 fix for SingleReplicasStayOrMove scenario
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] KUDU-2511 fix for SingleReplicasStayOrMove scenario

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11046 )

Change subject: KUDU-2511 fix for SingleReplicasStayOrMove scenario
......................................................................


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11046/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11046/1//COMMIT_MSG@12
PS1, Line 12: ksck result would be inconsistent because it fetched information
            : on tablet servers and tables, but failed to fetch consensus info
            : for corresponding tablets
nit: Mind clarifying what ksck's results being "inconsistent" means? IIUC, this means the table information exists, but the tablets for it do not yet exist (e.g. because they haven't yet been reported, or the tablets are bootstrapping). And since the rebalancing algorithm requires both table and tablet information to make moves, such unreported tablets are not moved.

That sort of explanation might also make sense in the test itself.



-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 25 Jul 2018 18:56:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] [kudu-admin-test] improvements on error reporting

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11046 )

Change subject: [kudu-admin-test] improvements on error reporting
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11046/2/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11046/2/src/kudu/tools/kudu-admin-test.cc@1336
PS2, Line 1336: if (out) {
              :     str << " stdout: " << *out;
              :   }
              :   if (err) {
              :     str << " stderr: " << *err;
              :   }
Seems like these are always non-null, in which case maybe use `const string&` and always attach them to `str`?
Also how does the output look? Maybe would it help to separate with some newlines or do extra formatting?



-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Sat, 28 Jul 2018 17:49:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] [kudu-admin-test] improvements on error reporting

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11046 )

Change subject: [kudu-admin-test] improvements on error reporting
......................................................................

[kudu-admin-test] improvements on error reporting

This changelist introduces improvements on logging in case
if an assertion triggers in the rebalancer-related scenarios.

I found it's hard to troubleshoot the flakiness reported lately
in the RebalancerAndSingleReplicaTablets.SingleReplicasStayOrMove
scenario.  With hindsight, it turned out that improved logging
on assertions would help a lot.

Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Reviewed-on: http://gerrit.cloudera.org:8080/11046
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/tools/kudu-admin-test.cc
1 file changed, 47 insertions(+), 33 deletions(-)

Approvals:
  Andrew Wong: Looks good to me, approved
  Alexey Serbin: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] [kudu-admin-test] improvements on error reporting

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11046

to look at the new patch set (#3).

Change subject: [kudu-admin-test] improvements on error reporting
......................................................................

[kudu-admin-test] improvements on error reporting

This changelist introduces improvements on logging in case
if an assertion triggers in the rebalancer-related scenarios.

I found it's hard to troubleshoot the flakiness reported lately
in the RebalancerAndSingleReplicaTablets.SingleReplicasStayOrMove
scenario.  With hindsight, it turned out that improved logging
on assertions would help a lot.

Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
---
M src/kudu/tools/kudu-admin-test.cc
1 file changed, 47 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/11046/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] [kudu-admin-test] improvements on error reporting

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/11046 )

Change subject: [kudu-admin-test] improvements on error reporting
......................................................................


Patch Set 3: Verified+1

Unrelated flakes in:

    kudu.tests.test_scantoken.TestScanToken.test_double_pred
    kudu.tests.test_scantoken.TestScanToken.test_unixtime_micros_pred
    org.apache.kudu.backup.TestKuduBackup.Backup and Restore Multiple Tables
    TlsSocketTest.TestRecvFailure
    org.apache.kudu.backup.TestKuduBackup.Random Backup and Restore


-- 
To view, visit http://gerrit.cloudera.org:8080/11046
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfa74d5a78e89bf7ff1a0e914384999466460145
Gerrit-Change-Number: 11046
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Mon, 30 Jul 2018 15:06:42 +0000
Gerrit-HasComments: No