Posted to issues@impala.apache.org by "Michael Ho (JIRA)" <ji...@apache.org> on 2019/06/19 22:13:00 UTC

[jira] [Created] (IMPALA-8685) Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES

Michael Ho created IMPALA-8685:
----------------------------------

             Summary: Evaluate default configuration of NUM_REMOTE_EXECUTOR_CANDIDATES
                 Key: IMPALA-8685
                 URL: https://issues.apache.org/jira/browse/IMPALA-8685
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Michael Ho


The query option {{NUM_REMOTE_EXECUTOR_CANDIDATES}} defaults to 3. This means that up to 3 different executors can process a given remote scan range, so over time the data of that scan range will be spread across these 3 executors. My understanding is that the default is not 1 in order to avoid hot spots in pathological cases. On the other hand, spreading a scan range across several executors may keep us from making full use of the file handle cache and the data cache. Also, on small clusters (e.g. a 3-node cluster), the default value may render deterministic remote scan range scheduling ineffective, since every executor is then a candidate for every remote scan range. We may want to re-evaluate the default value of {{NUM_REMOTE_EXECUTOR_CANDIDATES}}. One idea is to set it to min(3, half of the cluster size) so that it works well with small clusters, which may be rather common for demo purposes. There may also be other criteria for evaluating the default value.
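To make the small-cluster concern concrete, here is a minimal sketch, assuming remote scan ranges are mapped to executors by consistent hashing of a scan-range key onto a ring of executors. The function names, the number of virtual points per executor, the scan-range key format, and the rounding in proposed_default (integer halving with a floor of 1) are all hypothetical illustrations, not Impala's scheduler code. It shows that with 3 executors and 3 candidates, every executor becomes a candidate for every scan range, while the proposed min(3, half of cluster size) default shrinks the candidate set on small clusters.

```python
import hashlib
from bisect import bisect_right


def _hash(key: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() is salted per process,
    # so it would not give deterministic placement across runs.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def build_ring(executors, points_per_executor=16):
    # Hypothetical ring: several virtual points per executor to even out load.
    return sorted(
        (_hash(f"{e}#{i}"), e) for e in executors for i in range(points_per_executor)
    )


def remote_candidates(ring, scan_range_key, num_candidates):
    # Walk clockwise from the scan range's hash, collecting distinct executors
    # until we have num_candidates of them (or run out of executors).
    want = min(num_candidates, len({e for _, e in ring}))
    hashes = [h for h, _ in ring]
    start = bisect_right(hashes, _hash(scan_range_key))
    result = []
    for step in range(len(ring)):
        _, executor = ring[(start + step) % len(ring)]
        if executor not in result:
            result.append(executor)
            if len(result) == want:
                break
    return result


def proposed_default(cluster_size, cap=3):
    # Sketch of the "min(3, half of cluster size)" idea; integer halving and
    # the floor of 1 are assumptions, not part of the proposal.
    return max(1, min(cap, cluster_size // 2))


if __name__ == "__main__":
    small = build_ring(["exec-1", "exec-2", "exec-3"])
    key = "s3a://bucket/table/part-0.parq@0"  # hypothetical scan-range key
    # With the current default of 3, all three executors are candidates.
    print(remote_candidates(small, key, 3))
    # With the proposed default, a single executor is chosen deterministically.
    print(remote_candidates(small, key, proposed_default(3)))
```

Under these assumptions, the candidate set on a 3-node cluster with the default of 3 is the whole cluster, so repeated scans of the same range can land on any node and populate every node's caches; the proposed formula returns 1 for a 3-node cluster and keeps 3 for clusters of 6 or more nodes.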

cc'ing [~joemcdonnell], [~tlipcon] and [~drorke]




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)