Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/06/01 23:53:00 UTC
[jira] [Commented] (IMPALA-9655) Dynamic intra-node load balancing for HDFS scans
[ https://issues.apache.org/jira/browse/IMPALA-9655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121409#comment-17121409 ]
ASF subversion and git services commented on IMPALA-9655:
---------------------------------------------------------
Commit 052129c16a29891a72427b351bcc4e087d772fbe in impala's branch refs/heads/master from Bikramjeet Vig
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=052129c ]
IMPALA-9655: Dynamic intra-node load balancing for HDFS scans
This patch mitigates intra-node execution skew for multithreaded
HDFS scans by implementing a shared queue of scan ranges for all
instances. Some points to highlight:
- The scan node's PlanNode goes through all the TScanRanges
assigned to all instances and aggregates them into file descriptors.
- These files are statically and evenly distributed among the
instances.
- Each instance then picks up its set of files and issues
initial ranges by adding them to a shared queue.
- Instances then fetch the next range to read from this
shared queue.
- Other relevant data structures that will also be shared are:
* remaining_scan_range_submissions_
* partition_template_tuple_map_
* per_file_metadata_
- Removed the longest-processing-time (LPT) algorithm in the scheduler,
which tried to distribute scan load among the instances on a host. It
would have no effect after this patch since the ranges are now shared.
- Added missing lifecycle events from MT scan nodes.
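The shared-queue mechanism described above can be sketched roughly as
follows. This is a simplified illustration, not Impala's actual code; the
class and struct names here are assumptions. The key idea is that instances
pull ranges from one mutex-protected queue, so a slow instance naturally
processes fewer ranges and a fast one processes more:

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <string>

// Hypothetical simplified scan range: a file path plus byte offset/length.
struct ScanRange {
  std::string file;
  long offset;
  long len;
};

// A shared, mutex-protected queue of scan ranges. All fragment instances
// on a host push their initial ranges here and then pull work from it,
// which yields dynamic intra-node load balancing.
class SharedScanRangeQueue {
 public:
  void Push(ScanRange r) {
    std::lock_guard<std::mutex> l(mu_);
    q_.push_back(std::move(r));
  }

  // Returns the next range to read, or std::nullopt when the queue is empty.
  std::optional<ScanRange> Pop() {
    std::lock_guard<std::mutex> l(mu_);
    if (q_.empty()) return std::nullopt;
    ScanRange r = std::move(q_.front());
    q_.pop_front();
    return r;
  }

 private:
  std::mutex mu_;
  std::deque<ScanRange> q_;
};
```

In the real patch the queue lives at a level shared by all instances of the
scan node; the sketch above only shows the pull-based work distribution.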
This approach guarantees the following:
- All shared data structures are allocated at the FragmentState
level, ensuring they stick around until the query is closed.
- All instance-local buffers for reading a scan range are
allocated only after the range has been taken off the shared queue.
- Since the scheduler can assign ranges from the same file to
different instances, aggregating them into file descriptors early
ensures the initial footer or header range (depending on file
format) for each file is only issued once.
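The early aggregation can be sketched as grouping every instance's assigned
ranges by file path, so that per-file work (such as issuing a Parquet footer
range) happens exactly once even when the scheduler split one file across
instances. Names here are hypothetical, not Impala's actual HdfsFileDesc:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened scan-range assignment as produced by the scheduler.
struct AssignedRange {
  std::string file;
  long offset;
  long len;
};

// Hypothetical per-file descriptor: all ranges of one file, regardless of
// which instance the scheduler originally assigned them to.
struct FileDesc {
  std::vector<AssignedRange> ranges;
};

// Aggregate every instance's assignments into one descriptor per file.
// Because there is a single descriptor per file, the initial footer or
// header range for that file can be issued exactly once.
std::map<std::string, FileDesc> AggregateByFile(
    const std::vector<std::vector<AssignedRange>>& per_instance) {
  std::map<std::string, FileDesc> files;
  for (const auto& instance_ranges : per_instance) {
    for (const auto& r : instance_ranges) {
      files[r.file].ranges.push_back(r);
    }
  }
  return files;
}
```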
Limitation:
- For HDFS MT scans we lose the optimization within
ReaderContext that ensures cached ranges are read first.
- As a compromise, ranges that are marked to use the HDFS
cache are added to the front of the shared queue.
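The compromise for HDFS-cached ranges can be illustrated with a deque:
cached ranges go to the front so they are popped first, approximating the
read-cached-first ordering that ReaderContext provided. Again a sketch with
assumed names, not the patch's actual enqueue path:

```cpp
#include <deque>
#include <string>

// Hypothetical range carrying a flag for whether it is in the HDFS cache.
struct Range {
  std::string file;
  bool hdfs_cached;
};

// Cached ranges are added to the front of the shared queue so they are
// read first; everything else is appended to the back.
void Enqueue(std::deque<Range>& q, Range r) {
  if (r.hdfs_cached) {
    q.push_front(std::move(r));
  } else {
    q.push_back(std::move(r));
  }
}
```

Note this only biases ordering; unlike the original ReaderContext
optimization, it cannot guarantee cached ranges are always read first.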
Testing:
- Added a regression test in test_mt_dop that would almost always
fail without this patch.
- Added test coverage for MT scans where Parquet files are split
across blocks and for scanning tables with mixed file formats.
- Ran core tests on ASAN build
- Ran exhaustive tests
Performance:
No significant perf change.
Ran TPCH (scale 30) with mt_dop set to 4 on a 3-node minicluster on my
desktop.
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 5.85 | +0.35% | 4.35 | +0.60% |
+----------+-----------------------+---------+------------+------------+----------------+
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+
| Workload | Query | File Format | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+
| TPCH(30) | TPCH-Q15 | parquet / none / none | 4.46 | 4.30 | +3.63% | 1.64% | 2.23% | 30 | +3.56% | 5.28 | 7.09 |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.51 | 9.23 | +3.12% | 1.19% | 0.79% | 30 | +2.84% | 6.48 | 11.69 |
| TPCH(30) | TPCH-Q2 | parquet / none / none | 2.37 | 2.32 | +1.89% | 3.76% | 3.26% | 30 | +2.16% | 1.97 | 2.06 |
| TPCH(30) | TPCH-Q9 | parquet / none / none | 16.84 | 16.55 | +1.77% | 1.24% | 0.57% | 30 | +1.62% | 5.75 | 6.99 |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 3.65 | 3.60 | +1.43% | 2.19% | 1.72% | 30 | +1.39% | 2.53 | 2.79 |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 12.64 | 12.48 | +1.33% | 1.51% | 1.53% | 30 | +1.28% | 2.67 | 3.36 |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 2.94 | 2.91 | +1.03% | 1.91% | 1.77% | 30 | +1.48% | 1.98 | 2.16 |
| TPCH(30) | TPCH-Q6 | parquet / none / none | 1.31 | 1.29 | +1.44% | 5.48% | 3.66% | 30 | +0.43% | 1.10 | 1.18 |
| TPCH(30) | TPCH-Q7 | parquet / none / none | 5.49 | 5.44 | +0.78% | 0.59% | 1.16% | 30 | +0.95% | 3.49 | 3.27 |
| TPCH(30) | TPCH-Q8 | parquet / none / none | 5.40 | 5.35 | +0.83% | 0.95% | 0.95% | 30 | +0.83% | 2.81 | 3.37 |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.82 | 2.81 | +0.50% | 1.58% | 1.72% | 30 | +0.30% | 1.22 | 1.16 |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.43 | 1.42 | +0.64% | 2.42% | 1.96% | 30 | +0.09% | 0.68 | 1.13 |
| TPCH(30) | TPCH-Q4 | parquet / none / none | 2.59 | 2.58 | +0.28% | 1.54% | 1.99% | 30 | +0.23% | 1.17 | 0.60 |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.17 | 2.17 | +0.15% | 3.04% | 3.68% | 30 | +0.15% | 0.30 | 0.18 |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.02 | 2.02 | +0.21% | 2.73% | 2.85% | 30 | +0.07% | 0.26 | 0.30 |
| TPCH(30) | TPCH-Q5 | parquet / none / none | 4.16 | 4.15 | +0.04% | 4.49% | 4.38% | 30 | -0.03% | -0.13 | 0.03 |
| TPCH(30) | TPCH-Q3 | parquet / none / none | 4.75 | 4.76 | -0.28% | 1.34% | 1.47% | 30 | -0.10% | -0.57 | -0.77 |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 4.92 | 4.95 | -0.52% | 1.45% | 2.71% | 30 | +0.03% | 0.12 | -0.94 |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.82 | 2.85 | -0.78% | 1.50% | 1.68% | 30 | -0.65% | -1.66 | -1.91 |
| TPCH(30) | TPCH-Q1 | parquet / none / none | 4.49 | 4.53 | -0.97% | 2.26% | 2.69% | 30 | -0.68% | -0.99 | -1.52 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 10.11 | 10.19 | -0.85% | 3.27% | 3.71% | 30 | -0.84% | -0.85 | -0.94 |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 21.75 | 22.28 | -2.37% | 3.51% | 7.00% | 30 | -0.88% | -1.35 | -1.66 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+-------+
Change-Id: I9a101d0d98dff6e3779f85bc466e4c0bdb38094b
Reviewed-on: http://gerrit.cloudera.org:8080/15926
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
> Dynamic intra-node load balancing for HDFS scans
> ------------------------------------------------
>
> Key: IMPALA-9655
> URL: https://issues.apache.org/jira/browse/IMPALA-9655
> Project: IMPALA
> Issue Type: Sub-task
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Bikramjeet Vig
> Priority: Major
> Labels: multithreading, performance
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)