Posted to reviews@impala.apache.org by "Joe McDonnell (Code Review)" <ge...@cloudera.org> on 2019/01/11 04:28:13 UTC

[Impala-ASF-CR] IMPALA-7928: Consistent remote read scheduling

Hello Michael Ho, Lars Volker, Philip Zeyliger, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12037

to look at the new patch set (#2).

Change subject: IMPALA-7928: Consistent remote read scheduling
......................................................................

IMPALA-7928: Consistent remote read scheduling

Currently, remote reads for a particular file are not
scheduled to a consistent set of nodes. This reduces the
efficiency of the HDFS file handle cache (and any other
cache that is at the file level).

This schedules remote reads consistently by generating a
set of simulated remote replicas for each file. The simulated
remote replicas are generated by hashing the filename multiple
times and finding the closest nodes in a hash ring. This is a
consistent hash that is designed to limit the number of files
remapped when cluster nodes come and go. The number of simulated
remote replicas is controlled by a query option
'num_simulated_remote_replicas', which defaults to 3.
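
As a rough illustration of the hash-ring idea described above, here
is a minimal sketch. It is not the code in this patch: the class name,
the method names, the points_per_node parameter, and the choice of
FNV-1a as the hash are all assumptions made for the example, and
"closest" is taken to mean the next node clockwise on the ring.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical consistent-hash ring; all names are illustrative only.
class HashRing {
 public:
  // Place each node at several points on the ring to even out the load.
  void AddNode(const std::string& node, int points_per_node = 16) {
    for (int i = 0; i < points_per_node; ++i) {
      ring_[Hash(node + "#" + std::to_string(i))] = node;
    }
  }

  // Remove every ring point that belongs to 'node'.
  void RemoveNode(const std::string& node) {
    for (auto it = ring_.begin(); it != ring_.end();) {
      if (it->second == node) {
        it = ring_.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Hash the filename once per requested replica (with a different salt
  // each time) and walk clockwise from that position to the first node
  // that has not been chosen yet. Returns fewer nodes than requested if
  // the ring holds fewer distinct nodes.
  std::vector<std::string> SimulatedReplicas(const std::string& filename,
                                             int num_replicas) const {
    std::vector<std::string> replicas;
    if (ring_.empty()) return replicas;
    for (int i = 0; i < num_replicas; ++i) {
      auto it = ring_.lower_bound(Hash(filename + "#" + std::to_string(i)));
      for (std::size_t step = 0; step < ring_.size(); ++step, ++it) {
        if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
        if (std::find(replicas.begin(), replicas.end(), it->second) ==
            replicas.end()) {
          replicas.push_back(it->second);
          break;
        }
      }
    }
    return replicas;
  }

 private:
  // 64-bit FNV-1a, used here only as a stand-in hash (an assumption).
  static std::uint64_t Hash(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
      h ^= c;
      h *= 1099511628211ULL;
    }
    return h;
  }

  std::map<std::uint64_t, std::string> ring_;  // ring position -> node
};
```

In this sketch, removing a node that is not among a file's simulated
replicas leaves that file's replica list unchanged, which is the
property that limits remapping when cluster nodes come and go.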

Once the simulated remote replicas are chosen, a specific
replica is picked with the same algorithm used for local
replicas: the node with the minimum number of assigned bytes
is chosen, and 'schedule_random_replica' determines how ties
are broken.
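
The selection step described above can be sketched roughly as
follows. PickReplica and its parameters are hypothetical and do not
mirror the functions in scheduler.cc. The sketch also assumes that
with 'schedule_random_replica' off a tie goes to the first candidate
in list order; the actual tie-break in the scheduler may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: returns the index of the chosen candidate, or -1
// if there are no candidates. Nodes missing from 'assigned_bytes' are
// treated as having zero bytes assigned.
int PickReplica(
    const std::vector<std::string>& candidates,
    const std::unordered_map<std::string, std::int64_t>& assigned_bytes,
    bool schedule_random_replica, std::mt19937* rng) {
  std::vector<int> best;  // indexes of candidates tied at the minimum
  std::int64_t min_bytes = std::numeric_limits<std::int64_t>::max();
  for (int i = 0; i < static_cast<int>(candidates.size()); ++i) {
    auto it = assigned_bytes.find(candidates[i]);
    std::int64_t bytes = (it == assigned_bytes.end()) ? 0 : it->second;
    if (bytes < min_bytes) {
      min_bytes = bytes;
      best.clear();
    }
    if (bytes == min_bytes) best.push_back(i);
  }
  if (best.empty()) return -1;
  // Assumed tie-break: first candidate unless random selection is on.
  if (!schedule_random_replica || best.size() == 1) return best.front();
  std::uniform_int_distribution<std::size_t> dist(0, best.size() - 1);
  return best[dist(*rng)];
}
```

Because the candidate list for a file is stable, a selection like
this keeps reads of that file on the same small set of nodes while
still balancing assigned bytes among them.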

This change leaves the existing scheduling algorithms in place
for local files, Kudu, and HBase. If
'num_simulated_remote_replicas' is set to 0, simulated remote
replicas are disabled and the previous remote scheduling
algorithm is used.

Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
---
M be/src/experiments/CMakeLists.txt
A be/src/experiments/hash-ring-test.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
10 files changed, 332 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/12037/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
Gerrit-Change-Number: 12037
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <ph...@cloudera.com>