You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/03/18 06:19:00 UTC

[jira] [Commented] (IMPALA-9422) Improve join builder profiles

    [ https://issues.apache.org/jira/browse/IMPALA-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17061436#comment-17061436 ] 

ASF subversion and git services commented on IMPALA-9422:
---------------------------------------------------------

Commit 08acccf9ebb13445ab7e3d5d2cd6ceb122bde1f3 in impala's branch refs/heads/master from Tim Armstrong
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=08acccf ]

IMPALA-9156: share broadcast join builds

The scheduler will only create one join build finstance per
backend in cases where this is supported.

The builder is aware of the number of finstances executing the
probe and hands off the build data structures to the builders.

Nested loop join requires minimal modifications because the
build data structures are read-only after initial construction.
The only significant change is that memory can't be transferred
to the multiple consumers, so MarkNeedsDeepCopy() needs to be
used instead.

Hash join requires additional synchronisation because the
spilling algorithm mutates build-side data structures. This
patch adds synchronisation so that rebuilding spilled
partitions is done in a thread-safe manner, using a single
thread. This uses the CyclicBarrier added in an earlier patch.

Threads blocked on CyclicBarrier need to be cancellable,
which is handled by cancelling the barrier when cancelling
fragments on the backend.

BufferPool now correctly handles multiple threads calling
CleanPages() concurrently, which makes other methods thread-safe.

Update planner to cost broadcast join and estimate memory
consumption based on a single instance per node.

Planner estimates of number of instances are improved. Instead of
assuming mt_dop instances per node, use the total number of input
splits (also called scan ranges in places) as an upper bound on
the number of instances generated by scans. These instance
estimates from the scan nodes are then propagated up the
plan tree in the same way as the numNodes estimates. The instance
estimate for the join build fragment is fixed to be based on
the destination fragment.

The profile now correctly accounts for time waiting for the
builder, counting it in inactive time and showing it in the
node timeline. Additional improvements/cleanup to the time
accounting are deferring until IMPALA-9422.

Testing:
* Updated planner tests
* Ran a single node stress test with TPC-H and TPC-DS
* Add a targeted test for spilling broadcast joins, both repartitioning
  and not repartitioning.
* Add a targeted test for a spilling broadcast join with empty probe
* Add a targeted test for spilling broadcast join with empty build
  partitions.
* Add a broadcast join to test_cancellation and test_failpoints.

Perf:

I did a single node run on my desktop:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.26    | -15.70%    | 4.63       | -16.16%        |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%) | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval    |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+
| TPCH(30) | TPCH-Q21 | parquet / none / none | 24.97  | 23.25       | R +7.38%   |   0.51%   |   0.22%        | 5     | R +6.95%       | 2.31    | 27.93   |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.83   | 2.79        |   +1.31%   |   1.86%   |   0.36%        | 5     |   +1.88%       | 1.15    | 1.53    |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.28   | 1.28        |   -0.01%   |   1.64%   |   1.63%        | 5     |   -0.11%       | -0.58   | -0.01   |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 2.65   | 2.68        |   -0.94%   |   0.84%   |   1.46%        | 5     |   -0.21%       | -0.87   | -1.25   |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 4.69   | 4.72        |   -0.56%   |   1.29%   |   0.52%        | 5     |   -1.04%       | -1.15   | -0.89   |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 10.64  | 10.80       |   -1.48%   |   0.61%   |   0.60%        | 5     |   -1.39%       | -1.73   | -3.91   |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 4.11   | 4.32        |   -4.92%   |   0.05%   |   0.40%        | 5     |   -4.93%       | -2.31   | -27.46  |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 3.47   | 3.67        | I -5.41%   |   0.81%   |   0.03%        | 5     | I -5.70%       | -2.31   | -15.75  |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 7.58   | 8.14        | I -6.93%   |   3.13%   |   2.62%        | 5     | I -9.31%       | -2.02   | -3.96   |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 15.59  | 17.02       | I -8.38%   |   0.95%   |   0.43%        | 5     | I -8.92%       | -2.31   | -19.37  |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 2.90   | 3.25        | I -10.93%  |   1.42%   |   4.41%        | 5     | I -10.28%      | -2.31   | -5.33   |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 2.69   | 3.13        | I -14.31%  |   4.50%   |   1.40%        | 5     | I -17.79%      | -2.31   | -7.80   |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 2.50   | 3.03        | I -17.54%  |   0.10%   |   0.79%        | 5     | I -20.55%      | -2.31   | -49.31  |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 4.76   | 5.92        | I -19.52%  |   0.78%   |   0.33%        | 5     | I -24.31%      | -2.31   | -61.63  |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 2.56   | 3.33        | I -23.18%  |   2.13%   |   0.85%        | 5     | I -30.39%      | -2.31   | -28.14  |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 12.59  | 16.41       | I -23.26%  |   1.73%   |   0.90%        | 5     | I -30.43%      | -2.31   | -32.36  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.83   | 2.41        | I -24.04%  |   1.83%   |   2.22%        | 5     | I -30.48%      | -2.31   | -20.54  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 4.43   | 5.94        | I -25.33%  |   0.96%   |   0.54%        | 5     | I -34.54%      | -2.31   | -63.01  |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 3.81   | 5.37        | I -29.08%  |   1.43%   |   0.69%        | 5     | I -41.47%      | -2.31   | -53.11  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 13.34  | 21.49       | I -37.92%  |   0.46%   |   0.30%        | 5     | I -60.69%      | -2.31   | -203.08 |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 4.73   | 7.73        | I -38.81%  |   4.90%   |   1.35%        | 5     | I -61.68%      | -2.31   | -26.40  |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 3.71   | 6.61        | I -43.83%  |   1.63%   |   0.09%        | 5     | I -77.12%      | -2.31   | -106.61 |
+----------+----------+-----------------------+--------+-------------+------------+-----------+----------------+-------+----------------+---------+---------+

Change-Id: I4c67e4b2c87ed0fba648f1e1710addb885d66dc7
Reviewed-on: http://gerrit.cloudera.org:8080/15096
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Improve join builder profiles
> -----------------------------
>
>                 Key: IMPALA-9422
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9422
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: multithreading
>
> We should clean up/improve the join builder profiles for the separate build.
> First, for the separate build, we should ensure that all time spent in the builder is counted against the builder. E.g. calls into public methods like BeginSpilledProbe(). These should be counted as idle time for the actual join implementation, so that we can see that the time is spent in the (serial) builder instead of the (parallel) probe.
> We might need to fix things like Send() being called by RepartitionBuildInput, resulting in double counting.
> Second, we should revisit the assortment of timers - BuildRowsPartitionTime, HashTablesBuildTime, RepartitionTime. Maybe it makes sense to make them child counters of total time to make the relationship clearer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org