You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@kudu.apache.org by "Alexey Serbin (JIRA)" <ji...@apache.org> on 2018/07/19 18:30:00 UTC

[jira] [Created] (KUDU-2509) In some rare senarios, tserver may crash with SIGSEGV while boostrapping tablets

Alexey Serbin created KUDU-2509:
-----------------------------------

             Summary: In some rare senarios, tserver may crash with SIGSEGV while boostrapping tablets
                 Key: KUDU-2509
                 URL: https://issues.apache.org/jira/browse/KUDU-2509
             Project: Kudu
          Issue Type: Bug
          Components: tserver
    Affects Versions: 1.7.1, 1.7.0, 1.6.0, 1.5.0, 1.4.0, 1.3.1, 1.3.0, 1.2.0, 1.1.0, 1.0.1, 1.0.0, 0.9.1, 0.9.0, 0.8.0, 0.7.1, 0.7.0
            Reporter: Alexey Serbin
            Assignee: Alexey Serbin


As it's seen from the code snippet from {{src/kudu/tablet/tablet_bootstrap.cc}}, the {{TabletBootstrap::HandleCommitMessage()}} can return non-OK status while applying pending commits via {{ApplyCommitMessage()}}, when {{commit_entry}} is already deallocated after prior call to {{ApplyCommitMessage()}}:

{code:java}
  OpId last_applied = commit_entry->commit().commited_op_id();
  RETURN_NOT_OK(ApplyCommitMessage(state, commit_entry));
  delete commit_entry;

  auto iter = state->pending_commits.begin();
  while (iter != state->pending_commits.end()) {
    if ((*iter).first == last_applied.index() + 1) {
      gscoped_ptr<LogEntryPB> buffered_commit_entry((*iter).second);
      state->pending_commits.erase(iter++);
      last_applied = buffered_commit_entry->commit().commited_op_id();
      RETURN_NOT_OK(ApplyCommitMessage(state, buffered_commit_entry.get()));
      continue;
    }
    break;
  }

  return Status::OK();
{code}

That violates the contract of the {{TabletBootstrap::HandleCommitMessage()}}, so the following code does use-after-free and can get SIGSEGV while calling {{DebugInfo()}} to get more information on the {{entry}}:

{code:java}
      s = HandleEntry(&state, entry.get());
      if (!s.ok()) {
        DumpReplayStateToLog(state);
        RETURN_NOT_OK_PREPEND(s, DebugInfo(tablet_->tablet_id(),
                                           segment->header().sequence_number(),
                                           entry_count, segment->path(),
                                           *entry));
      }
{code}

The stack trace in the core file looks like the following:
{noformat}
#0  0x0000000001e343a0 in GetDescriptor (this=0x7ff9f4a5e8c0, message=..., generator=0x7ff9f4a5e7e0)  
    at /usr/src/debug/kudu-1.6.0-cdh5.14.0/thirdparty/src/protobuf-3.4.1/src/google/protobuf/message.h:332
#1  google::protobuf::TextFormat::Printer::Print (this=0x7ff9f4a5e8c0, message=..., generator=0x7ff9f4a5e7e0)
    at /usr/src/debug/kudu-1.6.0-cdh5.14.0/thirdparty/src/protobuf-3.4.1/src/google/protobuf/text_format.cc:1836
#2  0x0000000001e3460c in google::protobuf::TextFormat::Printer::Print (this=Unhandled dwarf expression opcode 0xf3
)                                                                               
    at /usr/src/debug/kudu-1.6.0-cdh5.14.0/thirdparty/src/protobuf-3.4.1/src/google/protobuf/text_format.cc:1759
#3  0x0000000001e346ad in google::protobuf::TextFormat::Printer::PrintToString (this=0x7ff9f4a5e8c0, message=..., output=0x7ff9f4a5eaf0)
    at /usr/src/debug/kudu-1.6.0-cdh5.14.0/thirdparty/src/protobuf-3.4.1/src/google/protobuf/text_format.cc:1742
#4  0x0000000001c83a95 in kudu::pb_util::SecureShortDebugString (msg=...)       
    at /usr/src/debug/kudu-1.6.0-cdh5.14.0/src/kudu/util/pb_util.cc:603         
#5  0x00000000009fe50b in DebugInfo (this=Unhandled dwarf expression opcode 0xf3)
    at /usr/src/debug/kudu-1.6.0-cdh5.14.0/src/kudu/tablet/tablet_bootstrap.cc:468
#6  kudu::tablet::TabletBootstrap::PlaySegments (this=Unhandled dwarf expression opcode 0xf3)
    at /usr/src/debug/kudu-1.6.0-cdh5.14.0/src/kudu/tablet/tablet_bootstrap.cc:1177
#7  0x00000000009ffc4b in kudu::tablet::TabletBootstrap::RunBootstrap (this=0x7ff9f4a5f510, rebuilt_tablet=0x7ff9f4a5f8a0, rebuilt_log=0x7ff9f4a5f870, consensus_info=0x7ff9f4a5f9d0)
    at /usr/src/debug/kudu-1.6.0-cdh5.14.0/src/kudu/tablet/tablet_bootstrap.cc:586
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)