Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/01/05 23:20:00 UTC
[jira] [Resolved] (IMPALA-6362) Queries don't make progress due to
what seems like a memory reservation deadlock while running the stress
tests
[ https://issues.apache.org/jira/browse/IMPALA-6362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-6362.
-----------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.12.0
IMPALA-6362: avoid Reservation/MemTracker deadlock
Avoid the circular dependency between ReservationTracker::lock_ and
MemTracker::child_trackers_lock_ by not acquiring
ReservationTracker::lock_ in GetReservation(), where an atomic
operation is sufficient.
Testing:
Added a unit test that reproduced the deadlock.
Change-Id: Id7adbe961a925075422c685690dd3d1609779ced
Reviewed-on: http://gerrit.cloudera.org:8080/8933
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
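
The idea behind the fix can be illustrated with a minimal C++ sketch. It is not Impala's code: ReservationSketch and MemTrackerSketch are hypothetical stand-ins, std::mutex replaces SpinLock, and the assumption that the MemTracker side calls GetReservation() while holding child_trackers_lock_ is inferred from the commit message rather than taken from the source. The point is only that a getter backed by an atomic load takes no lock, so it cannot be one edge of a lock-order cycle.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for ReservationTracker (not Impala's class).
class ReservationSketch {
 public:
  // Writer path: still serialized by lock_, but the value itself is atomic.
  void IncreaseReservation(int64_t bytes) {
    std::lock_guard<std::mutex> l(lock_);
    reservation_.fetch_add(bytes);
  }

  void DecreaseReservation(int64_t bytes) {
    std::lock_guard<std::mutex> l(lock_);
    reservation_.fetch_sub(bytes);
  }

  // Reader path: a single atomic load, so the caller never waits on lock_.
  int64_t GetReservation() const { return reservation_.load(); }

  // Exposed only so the demo below can simulate a thread holding lock_.
  std::mutex& lock_for_demo() { return lock_; }

 private:
  std::mutex lock_;
  std::atomic<int64_t> reservation_{0};
};

// Hypothetical stand-in for MemTracker: it holds its own lock while asking
// the reservation tracker for its current size (an assumed call pattern).
class MemTrackerSketch {
 public:
  explicit MemTrackerSketch(const ReservationSketch* r) : reservation_(r) {}

  int64_t Consumption() {
    std::lock_guard<std::mutex> l(child_trackers_lock_);
    return reservation_->GetReservation();  // acquires no second lock
  }

 private:
  std::mutex child_trackers_lock_;
  const ReservationSketch* reservation_;
};

// Reads the reservation while the tracker's lock is held elsewhere.
// With a lock-taking getter this call would never return.
int64_t ReadWhileLockHeld() {
  ReservationSketch r;
  r.IncreaseReservation(4096);
  MemTrackerSketch m(&r);
  std::lock_guard<std::mutex> held(r.lock_for_demo());
  return m.Consumption();
}
```

Calling ReadWhileLockHeld() returns the reserved byte count even though the tracker's lock is held for the whole call; a getter that acquired the same lock would block at that point.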
---
> Queries don't make progress due to what seems like a memory reservation deadlock while running the stress tests
> ----------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-6362
> URL: https://issues.apache.org/jira/browse/IMPALA-6362
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.12.0
> Reporter: Mostafa Mokhtar
> Assignee: Tim Armstrong
> Priority: Critical
> Labels: hang
> Fix For: Impala 2.12.0
>
> Attachments: stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt.zip, stress_debug_without_krpc_vd1304.halxg.cloudera.com_2.txt.zip
>
>
> Queries stopped making progress. Many of the fragment threads are trying to increase or decrease their memory reservation, and none of those threads is making progress.
> I did some quick analysis on the threads and couldn't find any thread making progress, so this might be a deadlock.
> cat stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt | grep 0x0000000001b01006 -B 4 | awk '{print $4}' | sort -nr | uniq -c | sort -nr
> 1312 impala::SpinLock::lock()
> 1312 impala::ReservationTracker::IncreaseReservationInternalLocked(long,
> 1312 boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&)
> 1312 base::SpinLock::SlowLock()
> 1312 base::SpinLock::Lock()
> 1311
> cat stress_debug_without_krpc_vd1304.halxg.cloudera.com_1.txt | grep 0x0000000001b017c6 -B 4 | awk '{print $4}' | sort -nr | uniq -c | sort -nr
> 688 impala::ReservationTracker::DecreaseReservation(long,
> 688 impala::ReservationTracker::DecreaseReservationLocked(long,
> 400 impala::SpinLock::lock()
> 400 boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&)
> 400 base::SpinLock::Lock()
> 399
> {code}
> #0 0x0000000003bd6944 in sys_futex ()
> #1 0x0000000003bd6a85 in base::internal::SpinLockDelay(int volatile*, int, int) ()
> #2 0x0000000003bd6835 in base::SpinLock::SlowLock() ()
> #3 0x00000000015f75fd in base::SpinLock::Lock() ()
> #4 0x00000000015f7672 in impala::SpinLock::lock() ()
> #5 0x00000000015f8d4c in boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&) ()
> #6 0x0000000001b015bf in impala::ReservationTracker::DecreaseReservation(long, bool) ()
> #7 0x0000000001b017c6 in impala::ReservationTracker::DecreaseReservationLocked(long, bool) ()
> #8 0x0000000001b015d6 in impala::ReservationTracker::DecreaseReservation(long, bool) ()
> #9 0x0000000001b017c6 in impala::ReservationTracker::DecreaseReservationLocked(long, bool) ()
> #10 0x0000000001b015d6 in impala::ReservationTracker::DecreaseReservation(long, bool) ()
> #11 0x00000000018aabf0 in impala::ReservationTracker::DecreaseReservation(long) ()
> #12 0x00000000018aaaee in impala::InitialReservations::Return(impala::BufferPool::ClientHandle*, long) ()
> #13 0x0000000001b5e8e9 in impala::ExecNode::Close(impala::RuntimeState*) ()
> #14 0x000000000293ef2c in impala::BlockingJoinNode::Close(impala::RuntimeState*) ()
> #15 0x00000000028d639f in impala::PartitionedHashJoinNode::Close(impala::RuntimeState*) ()
> #16 0x00000000018a51aa in impala::FragmentInstanceState::Close() ()
> #17 0x00000000018a24b8 in impala::FragmentInstanceState::Exec() ()
> #18 0x000000000188afe6 in impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) ()
> #19 0x0000000001889886 in impala::QueryState::StartFInstances()::{lambda()#1}::operator()() const ()
> #20 0x000000000188bc25 in boost::detail::function::void_function_obj_invoker0<impala::QueryState::StartFInstances()::{lambda()#1}, void>::invoke(boost::detail::function::function_buffer&) ()
> {code}
> {code}
> #0 0x0000000003bd6944 in sys_futex ()
> #1 0x0000000003bd6a85 in base::internal::SpinLockDelay(int volatile*, int, int) ()
> #2 0x0000000003bd6835 in base::SpinLock::SlowLock() ()
> #3 0x00000000015f75fd in base::SpinLock::Lock() ()
> #4 0x00000000015f7672 in impala::SpinLock::lock() ()
> #5 0x00000000015f8d4c in boost::lock_guard<impala::SpinLock>::lock_guard(impala::SpinLock&) ()
> #6 0x0000000001b01006 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
> #7 0x0000000001b01031 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
> #8 0x0000000001b01031 in impala::ReservationTracker::IncreaseReservationInternalLocked(long, bool, bool, impala::Status*) ()
> #9 0x0000000001b006f5 in impala::ReservationTracker::IncreaseReservationToFit(long, impala::Status*) ()
> #10 0x0000000001af738e in impala::BufferPool::ClientHandle::IncreaseReservationToFit(long) ()
> #11 0x0000000002c66574 in impala::BufferedTupleStream::AdvanceWritePage(long, bool*) ()
> #12 0x0000000002c692d9 in impala::BufferedTupleStream::AddRowCustomBeginSlow(long, impala::Status*) ()
> #13 0x0000000002c69111 in impala::BufferedTupleStream::AddRowSlow(impala::TupleRow*, impala::Status*) ()
> #14 0x0000000002c69b5e in impala::BufferedTupleStream::AddRow(impala::TupleRow*, impala::Status*) ()
> #15 0x00007f059f628148 in impala::PhjBuilder::ProcessBuildBatch ()
> #16 0x000000000295e10c in impala::PhjBuilder::Send(impala::RuntimeState*, impala::RowBatch*) ()
> #17 0x0000000002941fda in impala::Status impala::BlockingJoinNode::SendBuildInputToSink<false>(impala::RuntimeState*, impala::DataSink*) ()
> #18 0x000000000293fe59 in impala::BlockingJoinNode::ProcessBuildInputAndOpenProbe(impala::RuntimeState*, impala::DataSink*) ()
> #19 0x00000000028d58cf in impala::PartitionedHashJoinNode::Open(impala::RuntimeState*) ()
> {code}
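
The repeated frames in both traces (DecreaseReservation calling DecreaseReservationLocked calling DecreaseReservation again, and IncreaseReservationInternalLocked calling itself) suggest that each tracker holds its own lock while it calls into its parent. The following C++ sketch shows that pattern under simplifying assumptions: TrackerNode is a hypothetical type, std::mutex stands in for SpinLock, and every change is propagated all the way to the root, which the real code may not do. With this shape, a single lock near the root that is never released is enough to stall every thread working in the subtree below it.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical tracker node; names and fields are illustrative only.
struct TrackerNode {
  TrackerNode(TrackerNode* p, int64_t lim) : parent(p), limit(lim) {}

  // Takes this node's lock, then recurses into the parent while still
  // holding it, so locks are always acquired in child-to-root order.
  bool Increase(int64_t bytes) {
    std::lock_guard<std::mutex> l(lock);
    if (reservation + bytes > limit) return false;
    if (parent != nullptr && !parent->Increase(bytes)) return false;
    reservation += bytes;
    return true;
  }

  // Same locking order as Increase(): this node first, then its ancestors.
  void Decrease(int64_t bytes) {
    std::lock_guard<std::mutex> l(lock);
    reservation -= bytes;
    if (parent != nullptr) parent->Decrease(bytes);
  }

  TrackerNode* parent;
  int64_t limit;
  int64_t reservation = 0;
  std::mutex lock;
};

// Builds a three-level hierarchy and returns the root's reservation after
// one successful increase, one rejected increase, and one decrease.
int64_t RootAfterDemo() {
  TrackerNode root(nullptr, 100);
  TrackerNode query(&root, 80);
  TrackerNode fragment(&query, 80);
  fragment.Increase(60);  // fits at every level
  fragment.Increase(30);  // rejected: 90 would exceed the fragment limit
  fragment.Decrease(20);  // released at every level
  return root.reservation;
}
```

Because every path acquires locks in the same child-to-root order, this hierarchy cannot deadlock on its own; a cycle needs a second lock taken in the opposite order somewhere else, which matches the ReservationTracker/MemTracker dependency named in the resolution.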