Posted to issues@impala.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2018/04/02 04:51:00 UTC

[jira] [Created] (IMPALA-6788) Query fragments can spend lots of time starting up then fail right after "starting" all backends

Mostafa Mokhtar created IMPALA-6788:
---------------------------------------

             Summary: Query fragments can spend lots of time starting up then fail right after "starting" all backends
                 Key: IMPALA-6788
                 URL: https://issues.apache.org/jira/browse/IMPALA-6788
             Project: IMPALA
          Issue Type: Sub-task
          Components: Distributed Exec
            Reporter: Mostafa Mokhtar
         Attachments: connect_thread_busy_queries_failing.txt

Logs from a large cluster show that query startup can take several minutes, and that the query is cancelled as soon as startup completes because one of the intermediate RPCs failed.

It is not clear what the right fix is, since fragments are started asynchronously. Possibly a timeout?

{code}
I0401 21:25:30.776803 1830900 coordinator.cc:99] Exec() query_id=334cc7dd9758c36c:ec38aeb400000000 stmt=with customer_total_return as
I0401 21:25:30.813993 1830900 coordinator.cc:357] starting execution on 644 backends for query_id=334cc7dd9758c36c:ec38aeb400000000
I0401 21:29:58.406466 1830900 coordinator.cc:370] started execution on 644 backends for query_id=334cc7dd9758c36c:ec38aeb400000000
I0401 21:29:58.412132 1830900 coordinator.cc:896] Cancel() query_id=334cc7dd9758c36c:ec38aeb400000000
I0401 21:29:59.188817 1830900 coordinator.cc:906] CancelBackends() query_id=334cc7dd9758c36c:ec38aeb400000000, tried to cancel 643 backends
I0401 21:29:59.189177 1830900 coordinator.cc:1092] Release admission control resources for query_id=334cc7dd9758c36c:ec38aeb400000000
{code}

{code}
I0401 21:23:48.218379 1830386 coordinator.cc:99] Exec() query_id=e44d553b04d47cfb:28f06bb800000000 stmt=with customer_total_return as
I0401 21:23:48.270226 1830386 coordinator.cc:357] starting execution on 640 backends for query_id=e44d553b04d47cfb:28f06bb800000000
I0401 21:29:58.402195 1830386 coordinator.cc:370] started execution on 640 backends for query_id=e44d553b04d47cfb:28f06bb800000000
I0401 21:29:58.403818 1830386 coordinator.cc:896] Cancel() query_id=e44d553b04d47cfb:28f06bb800000000
I0401 21:29:59.255903 1830386 coordinator.cc:906] CancelBackends() query_id=e44d553b04d47cfb:28f06bb800000000, tried to cancel 639 backends
I0401 21:29:59.256251 1830386 coordinator.cc:1092] Release admission control resources for query_id=e44d553b04d47cfb:28f06bb800000000
{code}
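To put numbers on the two excerpts above, here is a minimal sketch that computes the gap between the "starting execution" and "started execution" lines from their glog timestamps. The helper names GlogTimeToMicros and ElapsedSeconds are made up for this illustration and are not part of Impala; the sketch assumes both timestamps fall on the same day.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not an Impala function): converts the "HH:MM:SS.uuuuuu"
// field of a glog line into microseconds since midnight.
// Returns -1 if the field does not have that shape.
long long GlogTimeToMicros(const std::string& hms) {
  int h = 0, m = 0, s = 0, us = 0;
  if (std::sscanf(hms.c_str(), "%2d:%2d:%2d.%6d", &h, &m, &s, &us) != 4) return -1;
  return ((h * 60LL + m) * 60LL + s) * 1000000LL + us;
}

// Hypothetical helper: elapsed seconds between two timestamps on the same day.
double ElapsedSeconds(const std::string& start, const std::string& end) {
  return (GlogTimeToMicros(end) - GlogTimeToMicros(start)) / 1e6;
}
```

For the first query, ElapsedSeconds("21:25:30.813993", "21:29:58.406466") gives about 267.6 seconds; for the second, ElapsedSeconds("21:23:48.270226", "21:29:58.402195") gives about 370.1 seconds. In both cases Cancel() is logged within a few milliseconds of the "started execution" line.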

I checked the coordinator, and threads appear to be spending a lot of time waiting on exec_complete_barrier_:
{code}
#0  0x00007fd928c816d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000001222944 in impala::Promise<bool>::Get() ()
#2  0x0000000001220d7b in impala::Coordinator::StartBackendExec() ()
#3  0x0000000001221c87 in impala::Coordinator::Exec() ()
#4  0x0000000000c3a925 in impala::ClientRequestState::ExecQueryOrDmlRequest(impala::TQueryExecRequest const&) ()
#5  0x0000000000c41f7e in impala::ClientRequestState::Exec(impala::TExecRequest*) ()
#6  0x0000000000bff597 in impala::ImpalaServer::ExecuteInternal(impala::TQueryCtx const&, std::shared_ptr<impala::ImpalaServer::SessionState>, bool*, std::shared_ptr<impala::ClientRequestState>*) ()
#7  0x0000000000c061d9 in impala::ImpalaServer::Execute(impala::TQueryCtx*, std::shared_ptr<impala::ImpalaServer::SessionState>, std::shared_ptr<impala::ClientRequestState>*) ()
#8  0x0000000000c561c5 in impala::ImpalaServer::query(beeswax::QueryHandle&, beeswax::Query const&) ()
#11 0x0000000000d60c9a in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long>*), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long>*> > > >::run() ()
{code}
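One possible direction, sketched under loud assumptions: a barrier that releases the waiter as soon as any backend reports a failure, and that also gives up after a timeout, so the coordinator would not have to wait for every backend before cancelling. FailFastBarrier and everything in it is hypothetical and written only for this illustration; it is not Impala's exec_complete_barrier_ and does not reflect how Coordinator::StartBackendExec() is actually implemented.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a startup barrier. Not Impala code.
class FailFastBarrier {
 public:
  enum class Result { ALL_STARTED, FAILED, TIMED_OUT };

  explicit FailFastBarrier(int pending) : pending_(pending) {}

  // Called once per backend when its startup RPC completes.
  // A single failure is enough to wake the waiter.
  void Notify(bool ok) {
    std::lock_guard<std::mutex> l(mu_);
    if (!ok) failed_ = true;
    if (pending_ > 0) --pending_;
    if (failed_ || pending_ == 0) cv_.notify_all();
  }

  // Blocks until every backend has reported, one has failed,
  // or the timeout expires, whichever comes first.
  Result Wait(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(mu_);
    bool done = cv_.wait_for(l, timeout,
                             [this] { return failed_ || pending_ == 0; });
    if (failed_) return Result::FAILED;
    return done ? Result::ALL_STARTED : Result::TIMED_OUT;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int pending_;          // backends that have not reported yet
  bool failed_ = false;  // set by the first failed startup RPC
};
```

With this shape, a caller would construct the barrier with the number of backends (644 in the first log excerpt), have each RPC callback call Notify(), and start cancellation as soon as Wait() returns FAILED or TIMED_OUT. Whether fail-fast, a timeout, or both is the right behaviour is exactly the open question in this issue.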


