Posted to issues@kudu.apache.org by "Dan Burkert (JIRA)" <ji...@apache.org> on 2018/06/20 23:39:00 UTC

[jira] [Commented] (KUDU-2088) UpdateReplica accesses stack object after it is destroyed

    [ https://issues.apache.org/jira/browse/KUDU-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16518722#comment-16518722 ] 

Dan Burkert commented on KUDU-2088:
-----------------------------------

https://github.com/apache/kudu/commit/c38631097a466f50209e211218c6668789f4b445

> UpdateReplica accesses stack object after it is destroyed
> ---------------------------------------------------------
>
>                 Key: KUDU-2088
>                 URL: https://issues.apache.org/jira/browse/KUDU-2088
>             Project: Kudu
>          Issue Type: Bug
>          Components: consensus
>    Affects Versions: 1.4.0
>            Reporter: Adar Dembo
>            Assignee: Adar Dembo
>            Priority: Major
>             Fix For: 1.5.0
>
>
> {{RaftConsensus::UpdateReplica()}} has this bit of code in it:
> {code}
>     // 5 - We wait for the writes to be durable.
>     // Note that this is safe because dist consensus now only supports a single outstanding
>     // request at a time and this way we can allow commits to proceed while we wait.
>     TRACE("Waiting on the replicates to finish logging");
>     TRACE_EVENT0("consensus", "Wait for log");
>     Status s;
>     do {
>       s = log_synchronizer.WaitFor(
>           MonoDelta::FromMilliseconds(FLAGS_raft_heartbeat_interval_ms));
>       // If just waiting for our log append to finish lets snooze the timer.
>       // We don't want to fire leader election because we're waiting on our own log.
>       if (s.IsTimedOut()) {
>         RETURN_NOT_OK(SnoozeFailureDetector());
>       }
>     } while (s.IsTimedOut());
>     RETURN_NOT_OK(s);
> {code}
> {{log_synchronizer}} is a stack-allocated {{Synchronizer}}. A reference to it is passed into an asynchronous log append function. The purpose of this code is to wait for that asynchronous function to finish while periodically snoozing the failure detector.
> However, if {{SnoozeFailureDetector()}} returns an error, {{RETURN_NOT_OK}} exits the function early and destroys {{log_synchronizer}}. The asynchronous log append function still holds its reference, so a crash can follow when it later accesses the destroyed object. Here's one such crash stack trace:
> {noformat}
> F0801 02:58:43.488010 13715 mutex.cc:76] Check failed: rv == 0 || rv == 16 . Invalid argument. Owner tid: 0; Self tid: 128; To collect the owner stack trace, enable the flag --debug_mutex_collect_stacktrace
> *** Check failure stack trace: ***
>     @     0x7f843b5d22fd  google::LogMessage::Fail() at ??:0
>     @     0x7f843b5d41bd  google::LogMessage::SendToLog() at ??:0
>     @     0x7f843b5d1e39  google::LogMessage::Flush() at ??:0
>     @     0x7f843b5d4c5f  google::LogMessageFatal::~LogMessageFatal() at ??:0
>     @     0x7f843c49dc46  kudu::Mutex::TryAcquire() at ??:0
>     @     0x7f843c49dcd1  kudu::Mutex::Acquire() at ??:0
>     @     0x7f8444243290  kudu::MutexLock::MutexLock() at ??:0
>     @     0x7f8444274d02  kudu::CountDownLatch::CountDown() at ??:0
>     @     0x7f8444274dd1  kudu::CountDownLatch::CountDown() at ??:0
>     @     0x7f84428c4d4f  kudu::Synchronizer::StatusCB() at ??:0
>     @     0x7f84428cf73e  kudu::internal::RunnableAdapter<>::Run() at ??:0
>     @     0x7f84428ce716  kudu::internal::InvokeHelper<>::MakeItSo() at ??:0
>     @     0x7f84428ccf37  kudu::internal::Invoker<>::Run() at ??:0
>     @     0x7f8442879e6f  kudu::Callback<>::Run() at ??:0
>     @     0x7f844286e28e  kudu::consensus::PeerMessageQueue::LocalPeerAppendFinished() at ??:0
>     @     0x7f8442882275  kudu::internal::RunnableAdapter<>::Run() at ??:0
>     @     0x7f8442880649  kudu::internal::InvokeHelper<>::MakeItSo() at ??:0
>     @     0x7f844287e87f  kudu::internal::Invoker<>::Run() at ??:0
>     @     0x7f8442879e6f  kudu::Callback<>::Run() at ??:0
>     @     0x7f8442891eec  kudu::consensus::LogCache::LogCallback() at ??:0
>     @     0x7f8442897e94  kudu::internal::RunnableAdapter<>::Run() at ??:0
>     @     0x7f844289797d  kudu::internal::InvokeHelper<>::MakeItSo() at ??:0
>     @     0x7f8442896fff  kudu::internal::Invoker<>::Run() at ??:0
>     @     0x7f8442879e6f  kudu::Callback<>::Run() at ??:0
>     @     0x7f844250fa1f  kudu::log::Log::AppendThread::HandleGroup() at ??:0
>     @     0x7f844250ee5c  kudu::log::Log::AppendThread::DoWork() at ??:0
>     @     0x7f8442527c81  kudu::internal::RunnableAdapter<>::Run() at ??:0
>     @     0x7f8442526773  kudu::internal::InvokeHelper<>::MakeItSo() at ??:0
>     @     0x7f844252475a  kudu::internal::Invoker<>::Run() at ??:0
>     @     0x7f844288b654  kudu::Callback<>::Run() at ??:0
>     @     0x7f843c4e9a10  kudu::ClosureRunnable::Run() at ??:0
>     @     0x7f843c4e8a73  kudu::ThreadPool::DispatchThread() at ??:0
> {noformat}
> A simple fix would be to treat failures in {{SnoozeFailureDetector()}} as non-fatal and stay in the do-while loop.
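
A minimal sketch of the fix suggested in the description, not the code from the linked commit. It assumes simplified stand-ins for Kudu's Status and Synchronizer (the real classes have richer interfaces); WaitForLogAppend and RunDemo are hypothetical names introduced only for illustration. A failed snooze is counted instead of returned, so the stack-allocated synchronizer outlives the asynchronous callback.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Simplified stand-in for kudu::Status (assumption, not the real class).
struct Status {
  enum Code { kOk, kTimedOut, kError };
  Status() : code(kOk) {}
  explicit Status(Code c) : code(c) {}
  bool ok() const { return code == kOk; }
  bool IsTimedOut() const { return code == kTimedOut; }
  Code code;
};

// Simplified stand-in for kudu::Synchronizer: a one-shot latch with a Status.
class Synchronizer {
 public:
  void StatusCB(const Status& s) {
    std::lock_guard<std::mutex> l(mu_);
    status_ = s;
    done_ = true;
    cv_.notify_all();
  }
  Status WaitFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(mu_);
    if (!cv_.wait_for(l, timeout, [this] { return done_; })) {
      return Status(Status::kTimedOut);
    }
    return status_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
  Status status_;
};

// Hypothetical helper: waits until the append signals `sync`. A failed snooze
// is counted but never causes an early return, so the caller cannot destroy
// `sync` while the append callback still refers to it.
Status WaitForLogAppend(Synchronizer* sync,
                        std::chrono::milliseconds interval,
                        const std::function<Status()>& snooze,
                        std::atomic<int>* snooze_failures) {
  assert(sync != nullptr);
  Status s;
  do {
    s = sync->WaitFor(interval);
    if (s.IsTimedOut()) {
      Status snooze_status = snooze();
      if (!snooze_status.ok()) {
        snooze_failures->fetch_add(1);  // Real code would log a warning here.
      }
    }
  } while (s.IsTimedOut());
  return s;
}

// Simulates the UpdateReplica() scenario: the Synchronizer lives on this stack
// frame, every snooze fails, and a background thread (standing in for the log
// append thread) signals only after two failed snoozes have been observed.
bool RunDemo(std::atomic<int>* snooze_failures) {
  Synchronizer log_synchronizer;
  std::thread appender([&log_synchronizer, snooze_failures] {
    while (snooze_failures->load() < 2) {
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    log_synchronizer.StatusCB(Status());
  });
  Status s = WaitForLogAppend(
      &log_synchronizer, std::chrono::milliseconds(5),
      [] { return Status(Status::kError); }, snooze_failures);
  appender.join();
  return s.ok();
}
```

With the original RETURN_NOT_OK in place of the counter, the first failed snooze would unwind the frame that owns log_synchronizer while the appender still held a reference to it. An alternative design would be to give the callback shared ownership of the synchronizer, at the cost of a heap allocation.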


