Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2016/06/23 02:55:01 UTC
[kudu-CR](branch-0.9.x) KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Todd Lipcon has uploaded a new change for review.
http://gerrit.cloudera.org:8080/3464
Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
......................................................................
KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
This fixes a bug seen in a recent YCSB stress test I ran, in which I was
accidentally writing tens of thousands of duplicate keys per second. After
a tablet server restarted, it failed to come up because of a pending commit
that referred to no mutated stores (e.g. because all of its operations were
duplicate-key inserts, which fail without mutating anything).
This patch tweaks the logic of the bootstrap safety check for pending
commits. A commit with no mutated stores trivially has "no active stores",
but that is not the same as having "only inactive stores": the two
conditions differ exactly when a commit has no stores at all.
The patch adds a new targeted test in tablet_bootstrap-test as well as
a more end-to-end test in ts_recovery-itest. Both reproduced the bug
reliably before this patch.
Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11
Reviewed-on: http://gerrit.cloudera.org:8080/3321
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
(cherry picked from commit 6894438a406a635dc8a8f3bd77862294163cc7fb)
---
M src/kudu/consensus/log-test-base.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_bootstrap.cc
7 files changed, 148 insertions(+), 23 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/3464/1
--
To view, visit http://gerrit.cloudera.org:8080/3464
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-0.9.x
Gerrit-Owner: Todd Lipcon <to...@apache.org>
[kudu-CR](branch-0.9.x) KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.
Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
......................................................................
KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11
Reviewed-on: http://gerrit.cloudera.org:8080/3321
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
(cherry picked from commit 6894438a406a635dc8a8f3bd77862294163cc7fb)
Reviewed-on: http://gerrit.cloudera.org:8080/3464
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/consensus/log-test-base.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_bootstrap.cc
7 files changed, 148 insertions(+), 23 deletions(-)
Approvals:
Todd Lipcon: Looks good to me, approved
Kudu Jenkins: Verified
--
Gerrit-MessageType: merged
Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: branch-0.9.x
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR](branch-0.9.x) KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello David Ribeiro Alves, Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/3464
to look at the new patch set (#2).
Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
......................................................................
KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11
Reviewed-on: http://gerrit.cloudera.org:8080/3321
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
(cherry picked from commit 6894438a406a635dc8a8f3bd77862294163cc7fb)
---
M src/kudu/consensus/log-test-base.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_bootstrap.cc
7 files changed, 148 insertions(+), 23 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/3464/2
--
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: branch-0.9.x
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
[kudu-CR](branch-0.9.x) KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
......................................................................
Patch Set 2:
Build Started http://104.196.14.100/job/kudu-gerrit/1963/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: branch-0.9.x
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.
Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
......................................................................
Patch Set 1:
Build Started http://104.196.14.100/job/kudu-gerrit/1955/
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-0.9.x
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
......................................................................
Patch Set 2: Code-Review+2
--
Gerrit-MessageType: comment
Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: branch-0.9.x
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No