You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/06/01 23:53:00 UTC

[jira] [Commented] (IMPALA-9655) Dynamic intra-node load balancing for HDFS scans

    [ https://issues.apache.org/jira/browse/IMPALA-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121409#comment-17121409 ] 

ASF subversion and git services commented on IMPALA-9655:
---------------------------------------------------------

Commit 052129c16a29891a72427b351bcc4e087d772fbe in impala's branch refs/heads/master from Bikramjeet Vig
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=052129c ]

IMPALA-9655: Dynamic intra-node load balancing for HDFS scans

This patch ameliorates intra node execution skew for multithreaded
HDFS scans by implementing a shared queue of scan ranges for all
instances. Some points to highlight:
- The scan node's PlanNode will go through all the TScanRanges
  assigned to all instances and aggregate them into file descriptors.
- These files will be statically and evenly distributed among the
  instances.
- The instances would then pick up their set of files and issue
  initial ranges by adding them to a shared queue.
- Instances would then fetch their next range to read from this
  shared queue.
- Other relevant data structures that will also be shared are:
 * remaining_scan_range_submissions_
 * partition_template_tuple_map_
 * per_file_metadata_
- Removed the longest-processing time (LPT) algorithm in the scheduler
  which tries to distribute scan load among instances on a host. This
  will have no effect after this patch since now the ranges are shared.
- Added missing lifecycle events from MT scan nodes.

This approach guarantees the following:
- All shared data structures are allocated at the FragmentState
  level ensuring they'll stick around till the query is closed.
- All instance local buffers for reading a scan range will be
  allocated after it has been taken off the shared queue.
- Since the scheduler can assign ranges from the same file to
  different instances, aggregating them into file descriptors early
  will ensure the initial footer or header range (depending on file
  format) for it will only be issued once.

Limitation:
- For HDFS MT scans we lose out on the optimization within
  ReaderContext which ensures cached ranges are read first.
- As a compromise, ranges that are marked to use the hdfs
  cache are added to the front of the shared queue.

Testing:
- Added a regression test in test_mt_dop which would almost always
  fail without this patch
- Added some test coverage for mt scans where parquet files are split
  across blocks and scanning tables with mixed file formats.
- Ran core tests on ASAN build
- Ran exhaustive tests

Performance:

No significant perf change.
Ran TPCH (scale 30) with mt_dop set to 4 on a 3 node minicluster on my
desktop.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 5.85    | +0.35%     | 4.35       | +0.60%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval  |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+
| TPCH(30) | TPCH-Q15 | parquet / none / none | 4.46   | 4.30        |   +3.63%   |   1.64%   |   2.23%        | 30    |   +3.56%       | 5.28    | 7.09  |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.51   | 9.23        |   +3.12%   |   1.19%   |   0.79%        | 30    |   +2.84%       | 6.48    | 11.69 |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 2.37   | 2.32        |   +1.89%   |   3.76%   |   3.26%        | 30    |   +2.16%       | 1.97    | 2.06  |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 16.84  | 16.55       |   +1.77%   |   1.24%   |   0.57%        | 30    |   +1.62%       | 5.75    | 6.99  |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 3.65   | 3.60        |   +1.43%   |   2.19%   |   1.72%        | 30    |   +1.39%       | 2.53    | 2.79  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 12.64  | 12.48       |   +1.33%   |   1.51%   |   1.53%        | 30    |   +1.28%       | 2.67    | 3.36  |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 2.94   | 2.91        |   +1.03%   |   1.91%   |   1.77%        | 30    |   +1.48%       | 1.98    | 2.16  |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.31   | 1.29        |   +1.44%   |   5.48%   |   3.66%        | 30    |   +0.43%       | 1.10    | 1.18  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 5.49   | 5.44        |   +0.78%   |   0.59%   |   1.16%        | 30    |   +0.95%       | 3.49    | 3.27  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 5.40   | 5.35        |   +0.83%   |   0.95%   |   0.95%        | 30    |   +0.83%       | 2.81    | 3.37  |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.82   | 2.81        |   +0.50%   |   1.58%   |   1.72%        | 30    |   +0.30%       | 1.22    | 1.16  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.43   | 1.42        |   +0.64%   |   2.42%   |   1.96%        | 30    |   +0.09%       | 0.68    | 1.13  |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.59   | 2.58        |   +0.28%   |   1.54%   |   1.99%        | 30    |   +0.23%       | 1.17    | 0.60  |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.17   | 2.17        |   +0.15%   |   3.04%   |   3.68%        | 30    |   +0.15%       | 0.30    | 0.18  |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.02   | 2.02        |   +0.21%   |   2.73%   |   2.85%        | 30    |   +0.07%       | 0.26    | 0.30  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 4.16   | 4.15        |   +0.04%   |   4.49%   |   4.38%        | 30    |   -0.03%       | -0.13   | 0.03  |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 4.75   | 4.76        |   -0.28%   |   1.34%   |   1.47%        | 30    |   -0.10%       | -0.57   | -0.77 |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 4.92   | 4.95        |   -0.52%   |   1.45%   |   2.71%        | 30    |   +0.03%       | 0.12    | -0.94 |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.82   | 2.85        |   -0.78%   |   1.50%   |   1.68%        | 30    |   -0.65%       | -1.66   | -1.91 |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 4.49   | 4.53        |   -0.97%   |   2.26%   |   2.69%        | 30    |   -0.68%       | -0.99   | -1.52 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 10.11  | 10.19       |   -0.85%   |   3.27%   |   3.71%        | 30    |   -0.84%       | -0.85   | -0.94 |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 21.75  | 22.28       |   -2.37%   |   3.51%   |   7.00%        | 30    |   -0.88%       | -1.35   | -1.66 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+

Change-Id: I9a101d0d98dff6e3779f85bc466e4c0bdb38094b
Reviewed-on: http://gerrit.cloudera.org:8080/15926
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Dynamic intra-node load balancing for HDFS scans
> ------------------------------------------------
>
>                 Key: IMPALA-9655
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9655
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Bikramjeet Vig
>            Priority: Major
>              Labels: multithreading, performance
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org