Posted to issues-all@impala.apache.org by "Michael Ho (Jira)" <ji...@apache.org> on 2019/08/20 05:00:00 UTC

[jira] [Assigned] (IMPALA-6788) Abort ExecFInstance() RPC loop early after query failure

     [ https://issues.apache.org/jira/browse/IMPALA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Ho reassigned IMPALA-6788:
----------------------------------

    Assignee: Thomas Tauber-Marshall

> Abort ExecFInstance() RPC loop early after query failure
> --------------------------------------------------------
>
>                 Key: IMPALA-6788
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6788
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Distributed Exec
>    Affects Versions: Impala 2.12.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Thomas Tauber-Marshall
>            Priority: Major
>              Labels: krpc, rpc
>         Attachments: connect_thread_busy_queries_failing.txt, impalad.va1007.foo.com.impala.log.INFO.20180401-200453.1800807.zip
>
>
> Logs from a large cluster show that query startup can take a long time (roughly 4.5 and 6 minutes in the two examples below), and that the query is cancelled as soon as startup completes because one of the intermediate RPCs failed.
> It is not clear what the right fix is, since fragments are started asynchronously; possibly a timeout?
> {code}
> I0401 21:25:30.776803 1830900 coordinator.cc:99] Exec() query_id=334cc7dd9758c36c:ec38aeb400000000 stmt=with customer_total_return as
> I0401 21:25:30.813993 1830900 coordinator.cc:357] starting execution on 644 backends for query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:58.406466 1830900 coordinator.cc:370] started execution on 644 backends for query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:58.412132 1830900 coordinator.cc:896] Cancel() query_id=334cc7dd9758c36c:ec38aeb400000000
> I0401 21:29:59.188817 1830900 coordinator.cc:906] CancelBackends() query_id=334cc7dd9758c36c:ec38aeb400000000, tried to cancel 643 backends
> I0401 21:29:59.189177 1830900 coordinator.cc:1092] Release admission control resources for query_id=334cc7dd9758c36c:ec38aeb400000000
> {code}
> {code}
> I0401 21:23:48.218379 1830386 coordinator.cc:99] Exec() query_id=e44d553b04d47cfb:28f06bb800000000 stmt=with customer_total_return as
> I0401 21:23:48.270226 1830386 coordinator.cc:357] starting execution on 640 backends for query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:58.402195 1830386 coordinator.cc:370] started execution on 640 backends for query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:58.403818 1830386 coordinator.cc:896] Cancel() query_id=e44d553b04d47cfb:28f06bb800000000
> I0401 21:29:59.255903 1830386 coordinator.cc:906] CancelBackends() query_id=e44d553b04d47cfb:28f06bb800000000, tried to cancel 639 backends
> I0401 21:29:59.256251 1830386 coordinator.cc:1092] Release admission control resources for query_id=e44d553b04d47cfb:28f06bb800000000
> {code}
> Checked the coordinator; its threads appear to be spending a lot of time waiting on exec_complete_barrier_:
> {code}
> #0  0x00007fd928c816d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x0000000001222944 in impala::Promise<bool>::Get() ()
> #2  0x0000000001220d7b in impala::Coordinator::StartBackendExec() ()
> #3  0x0000000001221c87 in impala::Coordinator::Exec() ()
> #4  0x0000000000c3a925 in impala::ClientRequestState::ExecQueryOrDmlRequest(impala::TQueryExecRequest const&) ()
> #5  0x0000000000c41f7e in impala::ClientRequestState::Exec(impala::TExecRequest*) ()
> #6  0x0000000000bff597 in impala::ImpalaServer::ExecuteInternal(impala::TQueryCtx const&, std::shared_ptr<impala::ImpalaServer::SessionState>, bool*, std::shared_ptr<impala::ClientRequestState>*) ()
> #7  0x0000000000c061d9 in impala::ImpalaServer::Execute(impala::TQueryCtx*, std::shared_ptr<impala::ImpalaServer::SessionState>, std::shared_ptr<impala::ClientRequestState>*) ()
> #8  0x0000000000c561c5 in impala::ImpalaServer::query(beeswax::QueryHandle&, beeswax::Query const&) ()
> #11 0x0000000000d60c9a in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string const&, std::string const&, boost::function<void ()>, impala::ThreadDebugInfo const*, impala::Promise<long>*), boost::_bi::list5<boost::_bi::value<std::string>, boost::_bi::value<std::string>, boost::_bi::value<boost::function<void ()> >, boost::_bi::value<impala::ThreadDebugInfo*>, boost::_bi::value<impala::Promise<long>*> > > >::run() ()
> {code}
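
The behaviour the issue title asks for (stop issuing exec RPCs once one has failed, so the wait on the completion barrier returns early) can be sketched as below. This is only a minimal illustration, not Impala's coordinator code: `StartBackendExecSketch`, `ExecStartResult`, and the `exec_rpc` callback are hypothetical names, and a plain thread join stands in for the wait on exec_complete_barrier_.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical summary of one startup attempt; not an Impala type.
struct ExecStartResult {
  std::size_t issued = 0;   // exec RPCs that were actually sent
  std::size_t skipped = 0;  // exec RPCs never sent because a failure was already seen
  bool ok = true;           // false if any exec RPC failed
};

// `exec_rpc(i)` stands in for the exec RPC to backend i; it returns true on success.
// Workers pull backend indices from a shared counter and stop as soon as any
// RPC has reported failure, instead of working through the remaining backends.
ExecStartResult StartBackendExecSketch(
    std::size_t num_backends, std::size_t num_workers,
    const std::function<bool(std::size_t)>& exec_rpc) {
  assert(num_workers > 0);
  std::atomic<std::size_t> next{0};
  std::atomic<std::size_t> issued{0};
  std::atomic<bool> failed{false};

  auto worker = [&]() {
    for (;;) {
      const std::size_t idx = next.fetch_add(1);
      if (idx >= num_backends) return;  // nothing left to start
      if (failed.load()) return;        // early abort: do not send more RPCs
      issued.fetch_add(1);
      if (!exec_rpc(idx)) failed.store(true);
    }
  };

  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < num_workers; ++i) threads.emplace_back(worker);
  // Joining plays the role of waiting on the completion barrier; it returns
  // as soon as every worker has either run out of backends or seen the failure.
  for (auto& t : threads) t.join();

  ExecStartResult result;
  result.issued = issued.load();
  result.skipped = num_backends - result.issued;
  result.ok = !failed.load();
  return result;
}
```

With a single worker and a callback that fails for backend 3 out of 640, the sketch sends four RPCs and skips the remaining 636. With several workers the exact count varies, but no new RPC is sent once the failure has been observed. RPCs already in flight are not interrupted here; that would need a timeout or cancellation on the RPC itself, which the sketch does not attempt.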


