You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Joe McDonnell (JIRA)" <ji...@apache.org> on 2017/09/07 16:46:00 UTC
[jira] [Resolved] (IMPALA-5750) Handle uncaught exceptions in
thread creation
[ https://issues.apache.org/jira/browse/IMPALA-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joe McDonnell resolved IMPALA-5750.
-----------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.11.0
commit e993b9712c81dfb66fdf65bb5269cdc38a8eef18
Author: Joe McDonnell <jo...@cloudera.com>
Date: Wed Aug 16 17:16:45 2017 -0700
IMPALA-5750: Catch exceptions from boost thread creation
The boost thread constructor will throw boost::thread_resource_error
if it is unable to spawn a thread on the system
(e.g. due to a ulimit). This uncaught exception crashes
Impala. Systems with a large number of nodes and threads
are hitting this limit.
This change catches the exception from the thread
constructor and converts it to a Status. This requires
several changes:
1. util/thread.h's Thread constructor is now private
and all Threads are constructed via a new Create()
static factory method.
2. util/thread-pool.h's ThreadPool requires that Init()
be called after the ThreadPool is constructed.
3. To propagate the Status, Threads cannot be created in
constructors, so this is moved to initialization methods
that can return Status.
4. Threads now use unique_ptr's for management in all
cases. Threads cannot be used as stack-allocated local
variables or direct declarations in classes.
Query execution code paths will now handle the error:
1. If the scan node fails to spawn any scanner thread,
it will abort the query.
2. Failing to spawn a fragment instance from the query
state in StartFInstances() will correctly report the error
to the coordinator and tear down the query.
Testing:
This introduces the parameter thread_creation_fault_injection,
which will cause Thread::Create() calls in eligible
locations to fail randomly roughly 1% of the time.
Quite a few locations of Thread::Create() and
ThreadPool::Init() are necessary for startup and cannot
be eligible. However, all the locations used for query
execution are marked as eligible and governed by this
parameter. The code was tested by setting this parameter
to true and running queries to verify that queries either
run to completion with the correct result or fail with
appropriate status.
Change-Id: I15a2f278dc71892b7fec09593f81b1a57ab725c0
Reviewed-on: http://gerrit.cloudera.org:8080/7730
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
> Handle uncaught exceptions in thread creation
> ---------------------------------------------
>
> Key: IMPALA-5750
> URL: https://issues.apache.org/jira/browse/IMPALA-5750
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.9.0
> Reporter: Silvius Rus
> Assignee: Joe McDonnell
> Priority: Critical
> Labels: crash
> Fix For: Impala 2.11.0
>
>
> Impalad uses noexcept API of boost library whenever possible. However, there are certain API which don't implement the noexcept variant. One example of this is the thread creation interface:
> {noformat}
> void Thread::StartThread(const ThreadFunctor& functor) {
> DCHECK(thread_manager.get() != nullptr)
> << "Thread created before InitThreading called";
> DCHECK(tid_ == UNINITIALISED_THREAD_ID) << "StartThread called twice";
> Promise<int64_t> thread_started;
> thread_.reset(
> new thread(&Thread::SuperviseThread, name_, category_, functor, &thread_started));
> // TODO: This slows down thread creation although not enormously. To make this faster,
> // consider delaying thread_started.Get() until the first call to tid(), but bear in
> // mind that some coordination is required between SuperviseThread() and this to make
> // sure that the thread is still available to have its tid set.
> tid_ = thread_started.Get();
> VLOG(2) << "Started thread " << tid_ << " - " << category_ << ":" << name_;
> }
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)