Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/04/28 05:29:00 UTC

[jira] [Commented] (IMPALA-2831) Impala can spin up too many scanner threads

    [ https://issues.apache.org/jira/browse/IMPALA-2831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457369#comment-16457369 ] 

ASF subversion and git services commented on IMPALA-2831:
---------------------------------------------------------

Commit 789c5aac23480acc6e18c057b767b65fdd791c97 in impala's branch refs/heads/master from [~tarmstrong@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=789c5aa ]

IMPALA-6920: fix inconsistencies with scanner thread tokens

The first scanner thread to start now takes a "required" token,
which always succeeds. Only additional threads try to get
"optional" tokens, which can fail. Previously threads always
requested optional tokens, which could fail and leave the scan
node without any running threads until its callback is invoked.
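
The required/optional split described above can be sketched roughly as
follows. This is a simplified illustration, not Impala's ThreadResourceMgr
code: TokenPool, StartScannerThreads, the quota value and the max_threads
parameter are all hypothetical names chosen for the example, and the
callback that fires when tokens become available is not modelled.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, simplified token pool; not Impala's ThreadResourceMgr API.
class TokenPool {
 public:
  explicit TokenPool(int quota) : quota_(quota) {}

  // A required token is always granted, even if the pool is over quota.
  void AcquireRequired() {
    std::lock_guard<std::mutex> l(lock_);
    ++in_use_;
  }

  // An optional token is granted only while the pool is under quota.
  bool TryAcquireOptional() {
    std::lock_guard<std::mutex> l(lock_);
    if (in_use_ >= quota_) return false;
    ++in_use_;
    return true;
  }

  // Returns one token (required or optional) to the pool.
  void Release() {
    std::lock_guard<std::mutex> l(lock_);
    assert(in_use_ > 0);
    --in_use_;
  }

  int in_use() {
    std::lock_guard<std::mutex> l(lock_);
    return in_use_;
  }

 private:
  std::mutex lock_;
  const int quota_;
  int in_use_ = 0;
};

// Returns how many scanner threads a scan node would start. The first
// thread takes a required token, so the node never ends up with zero
// threads; each further thread needs an optional token, which can fail.
int StartScannerThreads(TokenPool* pool, int max_threads) {
  if (max_threads <= 0) return 0;
  pool->AcquireRequired();
  int started = 1;
  while (started < max_threads && pool->TryAcquireOptional()) ++started;
  return started;
}
```

In this sketch, a pool that is already at its quota still lets a new scan
node start exactly one thread, whereas an optional-only scheme would have
started none.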

This allows us to remove the "reserved optional token" and
set_max_quota() interfaces from ThreadResourceMgr. The behaviour of
ThreadResourceMgr should be unchanged in cases where those features
were not used.

Also switch Kudu to the same logic for implementing
NUM_SCANNER_THREADS (it had not been switched over when the improved
HDFS scanner logic was added in IMPALA-2831).

Do some cleanup in ThreadResourceMgr code while we're here:
* Fix some benign data races in ThreadResourceMgr by switching to
  AtomicInt* classes.
* Remove pointless object caching (TCMalloc will do better).
* Reduce dependencies on the thread-resource-mgr.h header.
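
As a generic illustration of the first cleanup item, the sketch below uses
std::atomic for a counter that several threads update without a lock.
Impala's AtomicInt* classes are its own wrappers and are not shown here;
ThreadCounter and CountFromThreads are hypothetical names, and the specific
races fixed by the commit are not described in this message.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical counter shared by several threads. With a plain int, the
// unsynchronized accesses below would be a data race; std::atomic makes
// each increment and each read a single indivisible operation.
class ThreadCounter {
 public:
  void Increment() { count_.fetch_add(1, std::memory_order_relaxed); }
  int Get() const { return count_.load(std::memory_order_relaxed); }

 private:
  std::atomic<int> count_{0};
};

// Increments one shared counter from num_threads threads and returns the
// final value after all threads have been joined.
int CountFromThreads(int num_threads, int increments_per_thread) {
  assert(num_threads >= 0 && increments_per_thread >= 0);
  ThreadCounter counter;
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&counter, increments_per_thread] {
      for (int j = 0; j < increments_per_thread; ++j) counter.Increment();
    });
  }
  for (auto& t : threads) t.join();
  return counter.Get();
}
```

Relaxed ordering is enough here because the counter only tracks a total and
the join() calls order the final read after every increment.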

Testing:
Ran core tests.

Ran a few queries under TSAN and confirmed that, with those data
races fixed, it no longer reports any races in this code.

I couldn't construct a regression test because the change has no
easily testable consequences. The main difference is that some scanner
threads start earlier when there is pressure on scanner thread tokens,
and it is hard to build a robust test around that.

Change-Id: I16d31d72441aff7293759281d0248e641df43704
Reviewed-on: http://gerrit.cloudera.org:8080/10186
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Impala can spin up too many scanner threads
> -------------------------------------------
>
>                 Key: IMPALA-2831
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2831
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.2, Impala 2.3.0, Impala 2.5.0
>            Reporter: Tim Armstrong
>            Assignee: Michael Ho
>            Priority: Major
>              Labels: resource-management
>             Fix For: Impala 2.7.0
>
>
> We have observed a number of problems with the way Impala dynamically creates scanner threads, where more scanner threads are created than is ideal.
> * The scanner memory heuristic can lead to excessive memory consumption, especially for very selective scans with wide rows. The current heuristic for limiting memory consumption does not do well in these cases. There are likely several interlinked causes here, which will need further investigation.
> * The non-deterministic scanner thread heuristic can lead to a great deal of performance variability. At a minimum, the number of scanner threads should always converge to the same number for the same plan and data if the query is the only one running on the cluster.
> * Beyond a certain point, adding more scanner threads does not improve performance (and can degrade it), but the heuristic keeps spinning up scanner threads as long as tokens and memory are available.



