You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org> on 2017/06/22 16:44:14 UTC

[kudu-CR] WIP: Fix SIGSEGV in ksck

David Ribeiro Alves has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7261

Change subject: WIP: Fix SIGSEGV in ksck
......................................................................

WIP: Fix SIGSEGV in ksck

We were hitting the a segfault when trying to run ksck on
a cluster where some tablet servers are missing on restart.
The reason is that we're sorting a vector of replicas by
their tserver uuid even if the tserver was not found, causing
a segmentation fault when trying to access the uuid.

Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
WIP: as we're missing a test that covers this case, obviously.
---
M src/kudu/tools/ksck.cc
1 file changed, 4 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/7261/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>

[kudu-CR] WIP: Fix SIGSEGV in ksck

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has uploaded a new patch set (#2).

Change subject: WIP: Fix SIGSEGV in ksck
......................................................................

WIP: Fix SIGSEGV in ksck

We were hitting the a segfault when trying to run ksck on
a cluster where some tablet servers are missing on restart.
The reason is that we're sorting a vector of replicas by
their tserver uuid even if the tserver was not found, causing
a segmentation fault when trying to access the uuid.

Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
WIP: as we're missing a test that covers this case, obviously.
---
M src/kudu/tools/ksck.cc
1 file changed, 4 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/7261/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] WIP: Fix SIGSEGV in ksck

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: WIP: Fix SIGSEGV in ksck
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7261/3//COMMIT_MSG
Commit Message:

PS3, Line 9: the 
nit: drop


http://gerrit.cloudera.org:8080/#/c/7261/3/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

PS3, Line 655:   for (const shared_ptr<KsckTabletReplica>& replica : tablet->replicas()) {
             :     replica_infos.emplace_back();
             :     auto* repl_info = &replica_infos.back();
             :     repl_info->replica = replica.get();
             :     VLOG(1) << Substitute("A replica of tablet $0 is on live tablet server $1",
             :                           tablet->id(), replica->ts_uuid());
             : 
             :     // Check for agreement on tablet assignment and state between the master
             :     // and the tablet server.
             :     auto ts = FindPointeeOrNull(cluster_->tablet_servers(), replica->ts_uuid());
BTW, what about entries which might be in cluster_->tablet_servers(), but not present in tablet->replicas()?

Or this cannot be the case at all?


-- 
To view, visit http://gerrit.cloudera.org:8080/7261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] Fix SIGSEGV in ksck

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change.

Change subject: Fix SIGSEGV in ksck
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7261/3//COMMIT_MSG
Commit Message:

PS3, Line 9: the 
> nit: drop
Done. Actually I rewrote this a bit.


http://gerrit.cloudera.org:8080/#/c/7261/3/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

PS3, Line 655:   for (const shared_ptr<KsckTabletReplica>& replica : tablet->replicas()) {
             :     replica_infos.emplace_back();
             :     auto* repl_info = &replica_infos.back();
             :     repl_info->replica = replica.get();
             :     VLOG(1) << Substitute("A replica of tablet $0 is on live tablet server $1",
             :                           tablet->id(), replica->ts_uuid());
             : 
             :     // Check for agreement on tablet assignment and state between the master
             :     // and the tablet server.
             :     auto ts = FindPointeeOrNull(cluster_->tablet_servers(), replica->ts_uuid());
> BTW, what about entries which might be in cluster_->tablet_servers(), but n
It's the common case in a real cluster-- most tablet servers won't be hosting a replica of a given tablet.

If there are tablet replicas known to some replica of the tablet but not to the master, then they should be exposed by the consensus config comparisons.


-- 
To view, visit http://gerrit.cloudera.org:8080/7261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: Yes

[kudu-CR] Fix SIGSEGV in ksck

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: Fix SIGSEGV in ksck
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-HasComments: No

[kudu-CR] Fix SIGSEGV in ksck

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7261

to look at the new patch set (#4).

Change subject: Fix SIGSEGV in ksck
......................................................................

Fix SIGSEGV in ksck

ksck will segfault when some tablet servers that host tablet
replicas are missing. This happens, for example, if the master
is still restarting and has not yet fully populated its list of
live tablet servers.

The root cause is that a vector of replicas is being sorted by
tserver uuid obtained from the master even if the master is not
aware of the tablet server was not found, causing a segmentation
fault when trying to access the uuid. The fix just checks for a
missing tserver reference and sorts such replicas first.

The included test segfaults without the fix.

Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
2 files changed, 17 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/7261/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Fix SIGSEGV in ksck

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has submitted this change and it was merged.

Change subject: Fix SIGSEGV in ksck
......................................................................


Fix SIGSEGV in ksck

ksck will segfault when some tablet servers that host tablet
replicas are missing. This happens, for example, if the master
is still restarting and has not yet fully populated its list of
live tablet servers.

The root cause is that a vector of replicas is being sorted by
tserver uuid obtained from the master even if the master is not
aware of the tablet server was not found, causing a segmentation
fault when trying to access the uuid. The fix just checks for a
missing tserver reference and sorts such replicas first.

The included test segfaults without the fix.

Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
Reviewed-on: http://gerrit.cloudera.org:8080/7261
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <da...@gmail.com>
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
2 files changed, 17 insertions(+), 2 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7261
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I66ff69bc3aa567863b61ee8e686fc13c81db6fdf
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>