Posted to issues@impala.apache.org by "Pranay Singh (JIRA)" <ji...@apache.org> on 2018/03/28 22:23:00 UTC
[jira] [Created] (IMPALA-6762) DataStreamRecvr::SenderQueue::GetBatch encounters an exception doing a data_arrival_cv_.Wait(l)
Pranay Singh created IMPALA-6762:
------------------------------------
Summary: DataStreamRecvr::SenderQueue::GetBatch encounters an exception doing a data_arrival_cv_.Wait(l)
Key: IMPALA-6762
URL: https://issues.apache.org/jira/browse/IMPALA-6762
Project: IMPALA
Issue Type: Bug
Reporter: Pranay Singh
Fix For: Impala 2.12.0, Impala 2.13.0, Impala 2.11.0, Impala 2.10.0, Impala 2.9.0, Impala 2.8.0, Impala 2.7.0, Impala 2.6.0
Problem: In impala::DataStreamRecvr::SenderQueue::GetBatch(), the call to data_arrival_cv_.Wait() throws an exception inside the Boost library, which results in a SIGABRT. The probable cause is that the lock has already been freed.
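The wait in GetBatch() follows the usual condition-variable pattern, and that pattern shows why a freed lock is fatal. The sketch below is a hypothetical, simplified model, not Impala's code: it uses std::mutex and std::condition_variable instead of Boost, an int in place of RowBatch*, and invented members lock_, batch_queue_ and is_cancelled_; only GetBatch, Cancel and data_arrival_cv_ are names taken from this issue. wait() releases the mutex while the thread is blocked and reacquires it before returning, so the mutex, which lives inside the queue object, must outlive every waiter. If reacquiring a destroyed mutex makes the library throw and nothing catches the exception, std::terminate() is called, which by default calls std::abort() and raises SIGABRT.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical, simplified stand-in for DataStreamRecvr::SenderQueue.
// Only GetBatch, Cancel and data_arrival_cv_ come from the issue; the other
// members and the int "batch" type are illustrative.
class SenderQueue {
 public:
  // Blocks until a batch is available or the queue is cancelled.
  // Returns false if the queue was cancelled.
  bool GetBatch(int* batch) {
    std::unique_lock<std::mutex> l(lock_);
    while (batch_queue_.empty() && !is_cancelled_) {
      // wait() releases lock_ while blocked and reacquires it before
      // returning, so lock_ (a member of this object) must still exist
      // when the thread wakes up.
      data_arrival_cv_.wait(l);
    }
    // Loop postcondition: either data arrived or the queue was cancelled.
    assert(!batch_queue_.empty() || is_cancelled_);
    if (is_cancelled_) return false;
    *batch = batch_queue_.front();
    batch_queue_.pop_front();
    return true;
  }

  void AddBatch(int batch) {
    {
      std::lock_guard<std::mutex> l(lock_);
      batch_queue_.push_back(batch);
    }
    data_arrival_cv_.notify_one();
  }

  // Wakes all waiters but, like the Cancel() described in this issue,
  // returns without waiting for them to leave GetBatch().
  void Cancel() {
    {
      std::lock_guard<std::mutex> l(lock_);
      is_cancelled_ = true;
    }
    data_arrival_cv_.notify_all();
  }

 private:
  std::mutex lock_;
  std::condition_variable data_arrival_cv_;
  std::deque<int> batch_queue_;
  bool is_cancelled_ = false;
};

// Usage: the queue outlives the consumer thread because it is joined first.
int DemoGetBatch() {
  SenderQueue q;
  int batch = -1;
  std::thread consumer([&] { q.GetBatch(&batch); });
  q.AddBatch(42);
  consumer.join();
  return batch;
}

// Usage: cancellation wakes the consumer, which reports no batch.
bool DemoCancel() {
  SenderQueue q;
  bool got = true;
  int batch = -1;
  std::thread consumer([&] { got = q.GetBatch(&batch); });
  q.Cancel();
  consumer.join();
  return got;
}
```

Both demo functions join the consumer before the queue goes out of scope, so they are safe. The situation suspected in this issue is the opposite ordering: after Cancel() wakes a waiter, the receiver is destroyed while that thread is still inside wait() and needs lock_ again.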
Evidence: A minidump was captured for this issue; the two threads suspected to be involved are listed below.
Thread that encountered SIGABRT
Crash reason: SIGABRT
Crash address: 0x3d300008b2f
Process uptime: not available
Thread 959 (crashed)
0 libc-2.17.so + 0x351f7
rax = 0x0000000000000000 rdx = 0x0000000000000006
rcx = 0xffffffffffffffff rbx = 0x00007f1291116f18
rsi = 0x000000000001a041 rdi = 0x0000000000008b2f
rbp = 0x0000000002ad97c0 rsp = 0x00007f102ac0cd48
r8 = 0x000000000000000a r9 = 0x00007f102ac0e700
r10 = 0x0000000000000008 r11 = 0x0000000000000202
r12 = 0x00007f1291116f00 r13 = 0x00007f102ac0cfb0
r14 = 0x0000000000000000 r15 = 0x0000000000000000
rip = 0x00007f13ec6601f7
Found by: given as instruction pointer in context
1 libc-2.17.so + 0x368e8
rsp = 0x00007f102ac0cd50 rip = 0x00007f13ec6618e8
Found by: stack scanning
.
.
.
9 impalad!<name omitted>
rax = 0x0000000000000001 rdx = 0x0000000000000001
rbx = 0x00007f102ac0d390 rbp = 0x00007f12c68c13a0
rsp = 0x00007f102ac0d390 r12 = 0x00007f12cc820cc0
r13 = 0x00007f1244ab5600 r14 = 0x00007f102ac0d4e0
r15 = 0x0000000000000001 rip = 0x000000000080fe65
Found by: call frame info
10 impalad!<name omitted>
rbx = 0x00007f102ac0d4e0 rbp = 0x00007f1244ab5630
rsp = 0x00007f102ac0d3e0 r12 = 0x00007f12cc820cc0
r13 = 0x00007f1244ab5600 r14 = 0x00007f102ac0d4e0
r15 = 0x0000000000000001 rip = 0x000000000080fe8c
Found by: call frame info
11 impalad!<name omitted>
rbx = 0x0000000000000000 rbp = 0x00007f1244ab5630
rsp = 0x00007f102ac0d430 r12 = 0x00007f12cc820cc0
r13 = 0x00007f1244ab5600 r14 = 0x00007f102ac0d4e0
r15 = 0x0000000000000001 rip = 0x0000000000810294
Found by: call frame info
12 impalad!impala::DataStreamRecvr::(impala::RowBatch**)
rbx = 0x00007f12cc820c60 rbp = 0x00007f102ac0d500
rsp = 0x00007f102ac0d4c0 r12 = 0x00007f102ac0d530
r13 = 0x00007f12cc820c90 r14 = 0x00007f127242f338
r15 = 0x00007f12cc820d48 rip = 0x0000000000a280f3
Found by: call frame info
13 impalad!impala::DataStreamRecvr::GetBatch(impala::RowBatch**)
rbx = 0x00007f102ac0d5c0 rbp = 0x00007f102ac0d5c0
rsp = 0x00007f102ac0d5a0 r12 = 0x00007f121f464100
r13 = 0x00007f127242f180 r14 = 0x00007f121f464100
r15 = 0x00007f102ac0d760 rip = 0x0000000000a284c3
Found by: call frame info
14 impalad!impala::ExchangeNode::FillInputRowBatch(impala::RuntimeState*)
rbx = 0x00007f102ac0d690 rbp = 0x00007f102ac0d5c0
rsp = 0x00007f102ac0d5b0 r12 = 0x00007f121f464100
r13 = 0x00007f127242f180 r14 = 0x00007f121f464100
r15 = 0x00007f102ac0d760 rip = 0x0000000000beffa5
Found by: call frame info
15 impalad!impala::ExchangeNode::Open(impala::RuntimeState*)
rbx = 0x00007f121f464100 rbp = 0x00007f102ac0d8d0
rsp = 0x00007f102ac0d640 r12 = 0x00007f127242f180
r13 = 0x00007f102ac0d690 r14 = 0x00007f121f464100
r15 = 0x00007f102ac0d760 rip = 0x0000000000bf0d9e
Found by: call frame info
Thread 336
----------------
13 impalad!<name omitted> [TBufferTransports.h : 69 + 0xe]
rbx = 0x0000000000000000 rbp = 0x0000000000000004
rsp = 0x00007f13077b9840 r12 = 0x0000000000000004
r13 = 0x00007f13077b98b0 r14 = 0x00007f12c3f6f270
r15 = 0x00007f12d5a7c034 rip = 0x000000000080be6e
Found by: call frame info
14 impalad!apache::thrift::protocol::TBinaryProtocolT<apache::thrift::transport::TTransport>::readMessageBegin(std::string&, apache::thrift::protocol::TMessageType&, int&)
rbx = 0x00007f13077b98b0 rbp = 0x00007f13077b98f8
rsp = 0x00007f13077b98a0 r12 = 0x00007f13077b98fc
r13 = 0x00007f13077b9900 r14 = 0x00007f12406cd0e0
r15 = 0x00007f13077b9b80 rip = 0x00000000009ca5bf
Found by: call frame info
15 impalad!impala::ImpalaInternalServiceClient::recv_CancelPlanFragment(impala::TCancelPlanFragmentResult&)
rbx = 0x000000001f9241c0 rbp = 0x00007f13ed2106a0
rsp = 0x00007f13077b98f0 r12 = 0x00007f13077b9900
r13 = 0x00007f13077b9b80 r14 = 0x00007f13077b9b50
r15 = 0x00007f13077b9b80 rip = 0x0000000000cba069
Found by: call frame info
16 impalad!impala::Status impala::ClientConnection<impala::ImpalaBackendClient>::DoRpc<void (impala::ImpalaInternalServiceClient::*)(impala::TCancelPlanFragmentResult&, impala::TCancelPlanFragmentParams const&), impala::TCancelPlanFragmentParams, impala::TCancelPlanFragmentResult>(void (impala::ImpalaInternalServiceClient::* const&)(impala::TCancelPlanFragmentResult&, impala::TCancelPlanFragmentParams const&), impala::TCancelPlanFragmentParams const&, impala::TCancelPlanFragmentResult*, bool*)
rbx = 0x00007f13077b9b20 rbp = 0x00007f13077b9ae0
rsp = 0x00007f13077b9970 r12 = 0x00007f13077b9bc0
r13 = 0x00007f13077b9acf r14 = 0x00007f13077b9b50
r15 = 0x00007f13077b9b80 rip = 0x0000000000d79031
Found by: call frame info
17 impalad!impala::Coordinator::CancelRemoteFragments()
rbx = 0x0000000000000000 rbp = 0x00007f12d8533f40
rsp = 0x00007f13077b9a60 r12 = 0x00007f12d8533fa0
r13 = 0x00007f13077b9bc0 r14 = 0x000000003dc58000
r15 = 0x00007f13077b9b20 rip = 0x0000000000d6818f
Found by: call frame info
18 impalad!impala::Coordinator::CancelInternal()
rbx = 0x000000003dc58000 rbp = 0x00007f13077b9d70
rsp = 0x00007f13077b9d70 r12 = 0x00007f127209f600
r13 = 0x00007f13077b9ff0 r14 = 0x000000003dc58000
r15 = 0x00007f13077b9de0 rip = 0x0000000000d6f7f2
Found by: call frame info
19 impalad!impala::Coordinator::Cancel(impala::Status const*)
rbx = 0x000000003dc58000 rbp = 0x000000003dc58390
rsp = 0x00007f13077b9da0 r12 = 0x00007f13077b9ff0
r13 = 0x00007f13077b9ff0 r14 = 0x000000003dc58000
r15 = 0x00007f13077b9de0 rip = 0x0000000000d71b83
Found by: call frame info
20 impalad!impala::ImpalaServer::QueryExecState::Cancel(bool, impala::Status const*)
rbx = 0x00007f12b928e000 rbp = 0x00007f12b928e2b8
rsp = 0x00007f13077b9dc0 r12 = 0x00007f13077b9e60
r13 = 0x00007f13077b9ff0 r14 = 0x000000003dc58000
r15 = 0x00007f13077b9de0 rip = 0x0000000000adba06
Found by: call frame info
21 impalad!impala::ImpalaServer::CancelInternal(impala::TUniqueId const&, bool, impala::Status const*)
rbx = 0x00007f13077b9e70 rbp = 0x00007f13077b9f50
rsp = 0x00007f13077b9e30 r12 = 0x00007f13077b9e60
r13 = 0x00007f13ed2106a0 r14 = 0x000000000f8b1100
r15 = 0x00007f13077b9ff0 rip = 0x0000000000a8597a
Found by: call frame info
Cause of the issue
------------------------
Neither DataStreamRecvr::SenderQueue::Cancel() nor DataStreamRecvr::CancelStream() waits for threads inside impala::DataStreamRecvr::SenderQueue::GetBatch() to return. As a result, ~DataStreamRecvr() can be called while a thread is still in impala::DataStreamRecvr::SenderQueue::GetBatch(), which can sometimes result in this crash.
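The cause above implies an invariant: the receiver's mutex and condition variables must outlive every thread that can still be inside GetBatch(). A minimal sketch of one way to enforce it, assuming a plain standard-library model (std::mutex and std::condition_variable instead of Boost, an int in place of RowBatch*), is to count the threads inside GetBatch() and make shutdown block until that count drops to zero. The names SafeSenderQueue, num_waiters_, waiters_done_cv_, Close() and num_waiters() are hypothetical; this is not presented as the change made for IMPALA-6762.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// Hypothetical sketch: a sender queue that is not torn down while any
// thread is still inside GetBatch(). Names other than GetBatch, Cancel and
// data_arrival_cv_ are illustrative, not Impala APIs.
class SafeSenderQueue {
 public:
  ~SafeSenderQueue() { Close(); }

  bool GetBatch(int* batch) {
    std::unique_lock<std::mutex> l(lock_);
    ++num_waiters_;
    while (batch_queue_.empty() && !is_cancelled_) {
      data_arrival_cv_.wait(l);
    }
    const bool ok = !is_cancelled_;
    if (ok) {
      *batch = batch_queue_.front();
      batch_queue_.pop_front();
    }
    // The last thread to leave wakes a Close() that may be waiting.
    if (--num_waiters_ == 0) waiters_done_cv_.notify_all();
    return ok;
  }

  void AddBatch(int batch) {
    {
      std::lock_guard<std::mutex> l(lock_);
      batch_queue_.push_back(batch);
    }
    data_arrival_cv_.notify_one();
  }

  void Cancel() {
    {
      std::lock_guard<std::mutex> l(lock_);
      is_cancelled_ = true;
    }
    data_arrival_cv_.notify_all();
  }

  // Cancels, then blocks until no thread remains inside GetBatch().
  // This sketch does not stop a *new* GetBatch() call after Close().
  void Close() {
    Cancel();
    std::unique_lock<std::mutex> l(lock_);
    waiters_done_cv_.wait(l, [this] { return num_waiters_ == 0; });
    assert(num_waiters_ == 0);  // Postcondition the destructor relies on.
  }

  int num_waiters() {
    std::lock_guard<std::mutex> l(lock_);
    return num_waiters_;
  }

 private:
  std::mutex lock_;
  std::condition_variable data_arrival_cv_;
  std::condition_variable waiters_done_cv_;
  std::deque<int> batch_queue_;
  int num_waiters_ = 0;
  bool is_cancelled_ = false;
};

// Usage: a consumer is blocked in GetBatch() when the queue is closed; the
// queue is destroyed only after the consumer has left GetBatch().
bool DemoCloseWhileWaiting() {
  bool got = true;
  std::thread consumer;
  {
    SafeSenderQueue q;
    consumer = std::thread([&] {
      int batch = -1;
      got = q.GetBatch(&batch);
    });
    // Wait until the consumer is inside GetBatch() before closing.
    while (q.num_waiters() == 0) std::this_thread::yield();
    q.Close();
  }  // ~SafeSenderQueue() calls Close() again, which returns immediately.
  consumer.join();
  return got;
}
```

The design choice is to make teardown wait rather than to make waiters tolerate a destroyed object: once Close() has returned, no thread that entered GetBatch() earlier still needs lock_ to return. Preventing a thread from calling GetBatch() for the first time after Close() is a separate requirement that this sketch leaves to the caller.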
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)