You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Dan Hecht (JIRA)" <ji...@apache.org> on 2017/08/23 20:54:00 UTC

[jira] [Resolved] (IMPALA-4645) Fragments close after query is unregistered

     [ https://issues.apache.org/jira/browse/IMPALA-4645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Hecht resolved IMPALA-4645.
-------------------------------
    Resolution: Not A Problem

> Fragments close after query is unregistered
> -------------------------------------------
>
>                 Key: IMPALA-4645
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4645
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Matthew Jacobs
>              Labels: query-lifecycle
>         Attachments: impalad-fragment-not-close.log, stacks.out
>
>
> Testing the latest Kudu master exposes an Impala issue where a query appears to complete successfully even though a fragment continues to run. This can be observed in  TestKuduMemLimits::test_low_mem_limit_low_selectivity_scan when using the latest version of Kudu in the minicluster (IMPALA_TOOLCHAIN_BUILD_ID=308-96a4cc516e  and  IMPALA_KUDU_VERSION=e018a83 ). The test completes and the test table is deleted yet a fragment remains afterwards. This then breaks the next test that runs (TestFragmentLifecycle) which expects no fragments to be 'in flight'. See IMPALA-4642 for that test failure.
> *This JIRA is not about Kudu changes or the test failure (tracked by IMPALA-4642), but that fragment instances are closed after sending the 'done' report (which means the query may unregister first).*
> In the repro, after the test completed, no queries were reported running but 2 fragments are still in flight. The impalad web UI shows "impala-server.num-fragments-in-flight: 2"
> I captured the stacks on the coordinator using gdb and see 2 fragment execution threads, e.g.:
> {code}
> Thread 10 (Thread 0x7f27af541700 (LWP 14882)):
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1  0x0000000001185b9b in boost::condition_variable::wait (this=0x12fbee58, m=...) at /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/thread/pthread/condition_variable.hpp:73
> #2  0x0000000001a55a0c in boost::thread::join_noexcept() ()
> #3  0x00000000013316c1 in boost::thread::join (this=0x131214f0) at /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/thread/detail/thread.hpp:767
> #4  0x0000000001331d2e in impala::Thread::Join (this=0x11a6bd80) at /home/mj/dev/Impala/be/src/util/thread.h:104
> #5  0x00000000015eecc4 in impala::ThreadGroup::JoinAll (this=0xc2516a0) at /home/mj/dev/Impala/be/src/util/thread.cc:327
> #6  0x000000000178df9c in impala::KuduScanNode::Close (this=0xc251440, state=0xcdd8e00) at /home/mj/dev/Impala/be/src/exec/kudu-scan-node.cc:209
> #7  0x00000000019e8e3d in impala::PlanFragmentExecutor::Close (this=0xa76c2b0) at /home/mj/dev/Impala/be/src/runtime/plan-fragment-executor.cc:502
> #8  0x00000000019e08fd in impala::FragmentInstanceState::Exec (this=0xa76c000) at /home/mj/dev/Impala/be/src/runtime/fragment-instance-state.cc:69
> #9  0x00000000019ebfd3 in impala::QueryExecMgr::ExecFInstance (this=0xad09140, fis=0xa76c000) at /home/mj/dev/Impala/be/src/runtime/query-exec-mgr.cc:100
> #10 0x00000000019ef51e in boost::_mfi::mf1<void, impala::QueryExecMgr, impala::FragmentInstanceState*>::operator() (this=0xb71c5e0, p=0xad09140, a1=0xa76c000) at /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/bind/mem_fn_template.hpp:165
> #11 0x00000000019ef3c1 in boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>, boost::_bi::value<impala::FragmentInstanceState*> >::operator()<boost::_mfi::mf1<void, impala::QueryExecMgr, impala::FragmentInstanceState*>, boost::_bi::list0> (this=0xb71c5f0, f=..., a=...) at /home/mj/dev/Impala/toolchain/boost-1.57.0/include/boost/bind/bind.hpp:313
> ...
> {code}
> There are no coordinator threads as the query has completed. The log for query id e4100828b01f01c:5a7c79e200000000 shows this as well.
> {code}
> I1212 11:10:26.125466  2681 impala-server.cc:907] UnregisterQuery(): query_id=e4100828b01f01c:5a7c79e200000000
> I1212 11:10:26.125478  2681 impala-server.cc:993] Cancel(): query_id=e4100828b01f01c:5a7c79e200000000
> I1212 11:10:26.125489  2681 coordinator.cc:1362] Cancel() query_id=e4100828b01f01c:5a7c79e200000000
> I1212 11:10:26.125504  2681 coordinator.cc:1428] CancelFragmentInstances() query_id=e4100828b01f01c:5a7c79e200000000, tried to cancel 0 fragment instances
> I1212 11:10:26.126605  2681 admission-controller.cc:538] Released query id=e4100828b01f01c:5a7c79e200000000 agg_num_running=1, agg_num_queued=0, agg_mem_reserved=32.63 MB,  local_host(local_mem_admitted=3.75 GB, num_admitted_running=1, num_queued=0, backend_mem_reserved=10.21 MB)
> I1212 11:10:26.126647  2681 query-exec-mgr.cc:138] ReleaseQueryState(): query_id=e4100828b01f01c:5a7c79e200000000 refcnt=2
> I1212 11:10:26.131104  2681 impala-hs2-server.cc:373] CloseSession(): request=TCloseSessionReq {...}
> I1212 11:10:26.185884  2681 status.cc:114] Session closed
> {code}
> It's not until almost a minute later that a fragment instance from this query completes:
> {code}
> I1212 11:11:03.275807 14882 query-exec-mgr.cc:103] Instance completed. instance_id=e4100828b01f01c:5a7c79e200000003
> I1212 11:11:03.275813 14882 query-exec-mgr.cc:138] ReleaseQueryState(): query_id=e4100828b01f01c:5a7c79e200000000 refcnt=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)