Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/04/28 05:29:00 UTC

[jira] [Commented] (IMPALA-6920) Multithreaded scans are not guaranteed to get a thread token immediately

    [ https://issues.apache.org/jira/browse/IMPALA-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16457368#comment-16457368 ] 

ASF subversion and git services commented on IMPALA-6920:
---------------------------------------------------------

Commit 789c5aac23480acc6e18c057b767b65fdd791c97 in impala's branch refs/heads/master from [~tarmstrong@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=789c5aa ]

IMPALA-6920: fix inconsistencies with scanner thread tokens

The first scanner thread to start now takes a "required" token,
which always succeeds. Only additional threads try to get
"optional" tokens, which can fail. Previously threads always
requested optional tokens, which could fail and leave the scan
node without any running threads until its callback is invoked.
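
The difference between the two behaviours can be sketched roughly as
follows. This is a simplified illustration, not the actual
ThreadResourceMgr code: TokenPool, AcquireRequired(),
TryAcquireOptional() and StartScannerThreads() are hypothetical names,
and scanner threads are represented only by a count.

```cpp
#include <atomic>

// Hypothetical, simplified token pool (NOT Impala's ThreadResourceMgr API).
class TokenPool {
 public:
  explicit TokenPool(int quota) : quota_(quota) {}

  // A required token is always granted, even if usage exceeds the quota.
  void AcquireRequired() { in_use_.fetch_add(1); }

  // An optional token is granted only while usage is below the quota.
  bool TryAcquireOptional() {
    int cur = in_use_.load();
    while (cur < quota_) {
      // On failure, 'cur' is refreshed and the quota check is repeated.
      if (in_use_.compare_exchange_weak(cur, cur + 1)) return true;
    }
    return false;
  }

  void Release() { in_use_.fetch_sub(1); }
  int in_use() const { return in_use_.load(); }

 private:
  const int quota_;
  std::atomic<int> in_use_{0};
};

// Returns how many scanner threads could be started (up to max_threads).
// first_required == true models the new behaviour: the first thread takes
// a required token. first_required == false models the old behaviour:
// every thread, including the first, asks for an optional token.
int StartScannerThreads(TokenPool* pool, int max_threads, bool first_required) {
  int started = 0;
  for (int i = 0; i < max_threads; ++i) {
    if (i == 0 && first_required) {
      pool->AcquireRequired();
    } else if (!pool->TryAcquireOptional()) {
      break;  // No optional token available; stop starting threads.
    }
    ++started;
  }
  return started;
}
```

In this sketch, if another operator already holds the only optional
token, StartScannerThreads() with first_required == false starts no
threads, while first_required == true still starts one.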

This allows us to remove the "reserved optional token" and
set_max_quota() interfaces from ThreadResourceMgr. There should
be no behavioural changes in ThreadResourceMgr when those
features are not used.

Also switch Kudu to using the same logic for implementing
NUM_SCANNER_THREADS (it was not switched over to the improved
HDFS scanner logic added in IMPALA-2831).

Do some cleanup in ThreadResourceMgr code while we're here:
* Fix some benign data races in ThreadResourceMgr by switching to
  AtomicInt* classes.
* Remove pointless object caching (TCMalloc will do better).
* Reduce dependencies on the thread-resource-mgr.h header.
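
As a rough illustration of the data race fix in the first bullet, a
shared counter updated from several threads can be made race-free by
using an atomic integer. The sketch below uses std::atomic as a
stand-in for Impala's AtomicInt* classes; Counters and
CountWithThreads() are hypothetical names, not part of the patch.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical counter struct. With a plain 'int' member, the concurrent
// increments below would be a data race that TSAN reports.
struct Counters {
  std::atomic<int> num_threads{0};
};

// Spawns n_threads threads that each increment the shared counter
// increments_each times, then returns the final value.
int CountWithThreads(int n_threads, int increments_each) {
  Counters c;
  std::vector<std::thread> threads;
  for (int t = 0; t < n_threads; ++t) {
    threads.emplace_back([&c, increments_each] {
      for (int i = 0; i < increments_each; ++i) {
        // Atomic read-modify-write; no ordering with other data is needed
        // for a simple counter, so relaxed ordering suffices here.
        c.num_threads.fetch_add(1, std::memory_order_relaxed);
      }
    });
  }
  for (auto& th : threads) th.join();
  return c.num_threads.load();
}
```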

Testing:
Ran core tests.

Ran a few queries under TSAN, checked that it didn't report any more
races in this code after fixing those data races.

I couldn't construct a regression test because there are no easily
testable consequences of the change. The main difference is that
some scanner threads start earlier when there is pressure on scanner
thread tokens, but it is hard to construct a robust test around that.

Change-Id: I16d31d72441aff7293759281d0248e641df43704
Reviewed-on: http://gerrit.cloudera.org:8080/10186
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Multithreaded scans are not guaranteed to get a thread token immediately
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-6920
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6920
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.12.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: resource-management
>
> This bug applies to multithreaded HDFS and Kudu scans.
> We reserve an optional token for the first scanner thread, but that token can be taken by any other operator in the same fragment. In one fragment of TPC-DS q18a, the sequence is:
> 1. The hash join grabs an extra token for the join build. I guess it does this early so it gets an optional token before other fragments can grab them.
> 2. The scan node reserves an optional token in Open(). This optional token is already in use by the hash join.
> 3. The scan node tries to start the first scanner thread, but there are no optional tokens available, so it can't start any.
> 4. Eventually the optional token is given up and the scanner thread can start.
> If #4 always happens without the scan making progress, then no deadlock is possible, but if there's any kind of circular dependency, this can deadlock.
> Kudu scans also do not implement the num_scanner_threads query option in the same way as HDFS scans - the IMPALA-2831 changes were not applied to Kudu.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
