Posted to issues@kudu.apache.org by "Adar Dembo (Jira)" <ji...@apache.org> on 2019/09/18 23:04:00 UTC

[jira] [Assigned] (KUDU-2946) Waiting not allowed when destructing a service pool

     [ https://issues.apache.org/jira/browse/KUDU-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adar Dembo reassigned KUDU-2946:
--------------------------------

    Assignee: Adar Dembo

> Waiting not allowed when destructing a service pool
> ---------------------------------------------------
>
>                 Key: KUDU-2946
>                 URL: https://issues.apache.org/jira/browse/KUDU-2946
>             Project: Kudu
>          Issue Type: Bug
>          Components: rpc
>            Reporter: Andrew Wong
>            Assignee: Adar Dembo
>            Priority: Major
>
> I have a precommit that failed in TabletServerTest.TestStatus with the following stack trace:
> {code}
> W0918 22:12:07.614951 30074 reactor.cc:670] Failed to create an outbound connection to 255.255.255.255:1 because connect() failed: Network error: connect(2) error: Network is unreachable (error 101)
> W0918 22:12:07.615072 30137 heartbeater.cc:357] Failed 3 heartbeats in a row: no longer allowing fast heartbeat attempts.
> I0918 22:12:07.614991 30138 consensus_queue.cc:206] T ffffffffffffffffffffffffffffffff P 4a511a2ea982499e8681b544635aaef9 [LEADER]: Queue going to LEADER mode. State: All replicated index: 0, Majority replicated index: 74, Committed index: 74, Last appended: 74.74, Last appended by leader: 74, Current term: 75, Majority size: 1, State: 0, Mode: LEADER, active raft config: opid_index: -1 peers { permanent_uuid: "4a511a2ea982499e8681b544635aaef9" member_type: VOTER last_known_addr { host: "127.0.0.1" port: 37531 } }
> I0918 22:12:07.615285 30141 maintenance_manager.cc:271] Maintenance manager is disabled. Stopping thread.
> I0918 22:12:07.615489 24505 tablet_server.cc:152] TabletServer@127.0.0.1:37531 shutting down...
> F0918 22:12:07.617386 30074 thread_restrictions.cc:79] Check failed: LoadTLS()->wait_allowed Waiting is not allowed to be used on this thread to prevent server-wide latency aberrations and deadlocks. Thread 30074 (name: "rpc reactor", category: "reactor")
> *** Check failure stack trace: ***
> *** Aborted at 1568844727 (unix time) try "date -d @1568844727" if you are using GNU date ***
> PC: @     0x7fba67e74c37 gsignal
> *** SIGABRT (@0x3e800005fb9) received by PID 24505 (TID 0x7fba5f109700) from PID 24505; stack trace: ***
> I0918 22:12:07.626417 24505 ts_tablet_manager.cc:1159] Shutting down tablet manager...
> I0918 22:12:07.626611 24505 tablet_replica.cc:273] T ffffffffffffffffffffffffffffffff P 4a511a2ea982499e8681b544635aaef9: stopping tablet replica
> I0918 22:12:07.626811 24505 raft_consensus.cc:2147] T ffffffffffffffffffffffffffffffff P 4a511a2ea982499e8681b544635aaef9 [term 75 LEADER]: Raft consensus shutting down.
> I0918 22:12:07.626994 24505 raft_consensus.cc:2174] T ffffffffffffffffffffffffffffffff P 4a511a2ea982499e8681b544635aaef9 [term 75 FOLLOWER]: Raft consensus is shut down!
>     @     0x7fba737ed330 (unknown) at ??:0
>     @     0x7fba67e74c37 gsignal at ??:0
>     @     0x7fba67e78028 abort at ??:0
>     @     0x7fba6b94ae09 google::logging_fail() at ??:0
>     @     0x7fba6b94c62d google::LogMessage::Fail() at ??:0
>     @     0x7fba6b94e64c google::LogMessage::SendToLog() at ??:0
>     @     0x7fba6b94c189 google::LogMessage::Flush() at ??:0
>     @     0x7fba6b94efdf google::LogMessageFatal::~LogMessageFatal() at ??:0
>     @     0x7fba6cbdb786 kudu::ThreadRestrictions::AssertWaitAllowed() at ??:0
>     @           0x74059f kudu::CountDownLatch::WaitUntil() at /home/jenkins-slave/workspace/kudu-master/0/src/kudu/util/countdown_latch.h:81
>     @           0x70c85a kudu::CountDownLatch::WaitFor() at /home/jenkins-slave/workspace/kudu-master/0/src/kudu/util/countdown_latch.h:94
>     @     0x7fba6cb9bb28 kudu::ThreadJoiner::Join() at ??:0
>     @     0x7fba6fc15cec kudu::rpc::ServicePool::Shutdown() at ??:0
>     @     0x7fba6fc14604 kudu::rpc::ServicePool::~ServicePool() at ??:0
>     @     0x7fba6fc14736 kudu::rpc::ServicePool::~ServicePool() at ??:0
>     @     0x7fba78bd8f9f scoped_refptr<>::~scoped_refptr() at ??:0
>     @     0x7fba6fb4fa56 kudu::rpc::Messenger::QueueInboundCall() at ??:0
>     @     0x7fba6fb1cb5b kudu::rpc::Connection::HandleIncomingCall() at ??:0
>     @     0x7fba6fb1b431 kudu::rpc::Connection::ReadHandler() at ??:0
>     @     0x7fba6ac9e606 ev_invoke_pending at ??:0
>     @     0x7fba6fb8593a kudu::rpc::ReactorThread::InvokePendingCb() at ??:0
>     @     0x7fba6ac9f4f8 ev_run at ??:0
>     @     0x7fba6fb85c89 kudu::rpc::ReactorThread::RunThread() at ??:0
>     @     0x7fba6fb9d503 boost::_bi::bind_t<>::operator()() at ??:0
>     @     0x7fba6fb750fc boost::function0<>::operator()() at ??:0
>     @     0x7fba6cb9e3cb kudu::Thread::SuperviseThread() at ??:0
>     @     0x7fba737e5184 start_thread at ??:0
>     @     0x7fba67f3bffd clone at ??:0
> {code}
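> We appear to be waiting on the destruction of the ServicePool. The rpc reactor thread drops the last scoped_refptr to it inside Messenger::QueueInboundCall(), so ~ServicePool() runs ServicePool::Shutdown() on the reactor, and the ThreadJoiner::Join() there trips ThreadRestrictions::AssertWaitAllowed(). This might be related to https://github.com/apache/kudu/commit/0ecc2c7715505fa6d5a03f8ef967a1a96d4f55d5, which recently adjusted some locking in the Messenger.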
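
The failure chain in the trace can be modeled with a small, self-contained C++ sketch. Everything in it is hypothetical. `HypotheticalServicePool`, `AssertWaitAllowed()`, `tls_wait_allowed`, `g_wait_violations`, and `DropLastRefOnReactor()` are illustrative stand-ins, not Kudu's real classes or functions, and `std::shared_ptr` stands in for Kudu's `scoped_refptr`. The point it shows is that whichever thread releases the last reference runs the destructor. A destructor that joins a thread therefore performs a blocking wait on that thread, which in this case is a "reactor" thread that forbids waiting.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical stand-in for kudu::ThreadRestrictions: a per-thread flag that
// records whether blocking waits are permitted. Every thread starts allowed.
thread_local bool tls_wait_allowed = true;

// Number of forbidden waits observed. Kudu's real check is a fatal CHECK in
// ThreadRestrictions::AssertWaitAllowed(); this sketch counts violations
// instead, because aborting (or throwing from a destructor) would end the test.
std::atomic<int> g_wait_violations{0};

void AssertWaitAllowed() {
  if (!tls_wait_allowed) {
    g_wait_violations.fetch_add(1);
  }
}

// Hypothetical stand-in for kudu::rpc::ServicePool: owns a worker thread and
// joins it during shutdown, which is a blocking wait.
class HypotheticalServicePool {
 public:
  HypotheticalServicePool() : worker_([] { /* a real pool would drain its queue */ }) {}
  ~HypotheticalServicePool() { Shutdown(); }

  void Shutdown() {
    if (worker_.joinable()) {
      AssertWaitAllowed();  // mirrors the check reached via ThreadJoiner::Join()
      worker_.join();
    }
  }

 private:
  std::thread worker_;
};

// Hands `pool` to a simulated reactor thread that forbids waiting and releases
// it there. Returns how many forbidden waits that release caused.
int DropLastRefOnReactor(std::shared_ptr<HypotheticalServicePool> pool) {
  const int before = g_wait_violations.load();
  std::thread reactor([p = std::move(pool)]() mutable {
    tls_wait_allowed = false;  // reactor threads must never block
    p.reset();  // if this was the last reference, the destructor runs here
  });
  reactor.join();  // the calling thread is allowed to wait
  return g_wait_violations.load() - before;
}
```

Passing the only reference to `DropLastRefOnReactor()` records one violation. If another reference outlives the call, the destructor later runs on a thread that may wait, and no violation is recorded. The log shows the tablet server shutting down at the same moment, so presumably the pool's owner had already released it and the reactor's temporary reference in QueueInboundCall() was the last one.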



--
This message was sent by Atlassian Jira
(v8.3.4#803005)