Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/08/14 23:11:00 UTC

[jira] [Commented] (IMPALA-8677) Removing an unused node does not leave consistent remote scheduling unchanged

    [ https://issues.apache.org/jira/browse/IMPALA-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907686#comment-16907686 ] 

ASF subversion and git services commented on IMPALA-8677:
---------------------------------------------------------

Commit 33475a3e2c482681f1bab307055b75b0643ecdee in impala's branch refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=33475a3 ]

IMPALA-8685,IMPALA-8677: Use consistent scheduling for small clusters

In the original change for consistent scheduling, if a cluster has
fewer nodes than the number of remote executor candidates, then
the scheduler falls back to using the old SelectRemoteExecutor().
SelectRemoteExecutor() considers all backends and picks the backend
with the least assigned bytes; to break ties, it uses randomness.
This means that clusters with fewer backends than
num_remote_executor_candidates do not have consistent placement.
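
For illustration, here is a minimal sketch of that least-loaded selection
with a random tie-break; the names and types are hypothetical, not
Impala's actual code:

{code:cpp}
#include <cstdint>
#include <random>
#include <string>
#include <vector>

struct Backend {
  std::string ip;
  int64_t assigned_bytes = 0;
};

// Pick the backend with the fewest assigned bytes; break ties
// uniformly at random (reservoir style).
const Backend* SelectLeastLoaded(const std::vector<Backend>& backends,
                                 std::mt19937* rng) {
  const Backend* best = nullptr;
  int num_ties = 0;
  for (const Backend& b : backends) {
    if (best == nullptr || b.assigned_bytes < best->assigned_bytes) {
      best = &b;
      num_ties = 1;
    } else if (b.assigned_bytes == best->assigned_bytes) {
      // Each of the k tied backends ends up selected with probability 1/k.
      ++num_ties;
      if (std::uniform_int_distribution<int>(1, num_ties)(*rng) == 1) best = &b;
    }
  }
  // The random tie-break is what makes placement non-deterministic.
  return best;
}
{code}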

For the file handle cache (the original user of consistent
placement), this is not a major problem. However, for data caching,
it can result in slower warm up of the data cache and greater
duplication of the same data across different nodes.

This changes the algorithm to use consistent placement even for
small clusters (num nodes <= num_remote_executor_candidates).
To make this more predictable, it increases the maximum number
of iterations.
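
As a hedged sketch of that approach (the HashRing type and GetNode()
lookup below are assumptions for illustration, not Impala's real API),
the scheduler can keep probing the ring with varied hashes until it has
enough distinct nodes or hits the iteration cap:

{code:cpp}
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Toy stand-in for a hash ring; the modular lookup is only a
// placeholder for a real ring search.
struct HashRing {
  std::vector<std::string> nodes;
  std::string GetNode(uint64_t hash) const { return nodes[hash % nodes.size()]; }
};

std::vector<std::string> GetCandidates(const HashRing& ring, const std::string& path,
                                       size_t num_candidates, size_t max_iterations) {
  std::vector<std::string> candidates;
  std::unordered_set<std::string> seen;
  for (size_t i = 0; i < max_iterations && candidates.size() < num_candidates; ++i) {
    // Vary the hash deterministically per iteration: the same path always
    // probes the same sequence, so placement stays consistent even when
    // the cluster has fewer nodes than num_candidates.
    uint64_t h = std::hash<std::string>{}(path + "#" + std::to_string(i));
    std::string node = ring.GetNode(h);
    if (seen.insert(node).second) candidates.push_back(node);
  }
  return candidates;
}
{code}

A larger iteration cap matters here because, with few nodes, many probes
land on already-seen backends before enough distinct candidates accumulate.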

This also changes GetRemoteExecutorCandidates() to return the
candidates in the order that they were selected. While still
using a set for detecting duplicate backends, the vector of
distinct backends is constructed directly rather than by
iterating over the set.
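
Schematically, that is the standard insert-then-append dedup pattern;
this toy program (illustrative addresses, not the literal diff) shows
why it preserves selection order where iterating the set would not:

{code:cpp}
#include <iostream>
#include <set>
#include <string>
#include <vector>

int main() {
  // Hypothetical selection order off the hash ring, with one duplicate.
  std::vector<std::string> selected = {"10.0.0.9", "10.0.0.2", "10.0.0.9", "10.0.0.5"};

  std::set<std::string> dedup;
  std::vector<std::string> candidates;
  for (const std::string& addr : selected) {
    // insert().second is true only on first occurrence, so the set acts
    // purely as a duplicate filter while the vector keeps selection order.
    if (dedup.insert(addr).second) candidates.push_back(addr);
  }
  for (const std::string& c : candidates) std::cout << c << "\n";
  // Prints 10.0.0.9, 10.0.0.2, 10.0.0.5 -- not the sorted order that a
  // range-for over the std::set would have produced.
  return 0;
}
{code}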

Testing:
 - Modify the scheduler-test backend test to verify that small
   clusters use consistent scheduling.

Change-Id: Icfdb2cc53d7206e316ea8a1cc28ad443f246f741
Reviewed-on: http://gerrit.cloudera.org:8080/14026
Reviewed-by: Lars Volker <lv...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Removing an unused node does not leave consistent remote scheduling unchanged
> -----------------------------------------------------------------------------
>
>                 Key: IMPALA-8677
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8677
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Joe McDonnell
>            Priority: Major
>
> When working on IMPALA-8630, I discovered that SchedulerTest::RemoteExecutorCandidateConsistency works mostly by happenstance.
> The root of the issue is that in Scheduler::GetRemoteExecutorCandidates() we want to avoid returning duplicates, so all the IpAddrs are put in a set:
> {code:cpp}
> set<IpAddr> distinct_backends;
> ...
> distinct_backends.insert(*executor_addr);
> ...
> // Iterating the std::set yields the IpAddrs in sorted order, discarding
> // the order in which the hash ring selected them.
> for (const IpAddr& addr : distinct_backends) {
>   remote_executor_candidates->push_back(addr);
> }{code}
> This sorts the IpAddrs, so remote_executor_candidates does not contain the elements in the order in which they were encountered on the hash ring.
> Suppose that we are running with num_remote_executor_candidates=2 and random replicas is false. There is exactly one file. GetRemoteExecutorCandidates() returns these executor candidates (IpAddrs):
> {192.168.1.2, 192.168.1.3}
> The first entry is chosen because it is first. Nothing was scheduled on 192.168.1.3, yet removing it can change the scheduling outcome because of the sort. Suppose 192.168.1.3 is gone and the next closest executor on the hash ring is 192.168.1.1 (or any node that sorts before 192.168.1.2). Even though it is farther away on the hash ring, GetRemoteExecutorCandidates() would return:
> {192.168.1.1, 192.168.1.2}
> and the first entry would be chosen.
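> A toy program makes the effect concrete (using the addresses above; this is an illustration, not Impala code):
> {code:cpp}
> #include <iostream>
> #include <set>
> #include <string>
>
> int main() {
>   // Candidates while 192.168.1.3 is present, and after it leaves and
>   // 192.168.1.1 becomes the next match on the hash ring.
>   std::set<std::string> before = {"192.168.1.2", "192.168.1.3"};
>   std::set<std::string> after = {"192.168.1.2", "192.168.1.1"};
>   // std::set iterates in sorted order, so the first candidate flips from
>   // 192.168.1.2 to 192.168.1.1 even though 192.168.1.3 was never chosen.
>   std::cout << *before.begin() << " -> " << *after.begin() << std::endl;
>   return 0;
> }
> {code}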
> To eliminate this inconsistency, it might be useful to retain the order in which elements are matched on the hash ring.
> In terms of impact, this bug increases the number of files whose scheduling can change when a node leaves, and some of those changes are unnecessary. If random replica is set to true, it doesn't matter; otherwise it is unclear how large the impact is.


