Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2020/05/05 05:09:47 UTC

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Tim Armstrong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15863 )


Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................

IMPALA-9712: fix mem consumption of operators above selective scan

This change is motivated by excessive memory consumption of
TPC-H Q19 which has a hash join and non-grouping aggregate
above a selective scan.

There are two related parts to this change.

First, we fix RowBatch::AtCapacity() to account for the actual
memory consumed by the RowBatch. It used total_allocated_bytes(),
which does *not* account for unused space in the MemPool chunks.
Instead it now uses total_reserved_bytes(), which includes the
whole chunks. This reduced memory consumption of the agg from
60+MB to ~16MB.
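
A toy model of the accounting difference described above, written in Python rather than Impala's C++. ToyMemPool, its chunk size, the acquire_data() helper and the 8 KiB limit are all hypothetical stand-ins; only the names total_allocated_bytes() and total_reserved_bytes() come from the commit message. It sketches how a batch that picks up many mostly empty chunks can look small by allocated bytes while holding far more memory in whole chunks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    size: int      # bytes reserved for the whole chunk
    used: int = 0  # bytes actually handed out from the chunk


class ToyMemPool:
    """Hypothetical chunked pool; a stand-in, not Impala's MemPool."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.chunks: List[Chunk] = []

    def allocate(self, nbytes: int) -> None:
        # Use the last chunk if it has room, otherwise reserve a new one.
        if not self.chunks or self.chunks[-1].size - self.chunks[-1].used < nbytes:
            self.chunks.append(Chunk(size=max(self.chunk_size, nbytes)))
        self.chunks[-1].used += nbytes

    def acquire_data(self, other: "ToyMemPool") -> None:
        # Take ownership of every chunk in `other`, including unused space.
        self.chunks.extend(other.chunks)
        other.chunks = []

    def total_allocated_bytes(self) -> int:
        return sum(c.used for c in self.chunks)

    def total_reserved_bytes(self) -> int:
        return sum(c.size for c in self.chunks)


def at_capacity(pool, num_rows, capacity, mem_limit, count_whole_chunks):
    # count_whole_chunks=False models the old check, True the fixed one.
    if count_whole_chunks:
        mem_usage = pool.total_reserved_bytes()
    else:
        mem_usage = pool.total_allocated_bytes()
    return num_rows == capacity or mem_usage >= mem_limit


# Ten selective inputs each leave 16 used bytes in a 1 KiB chunk.
out_pool = ToyMemPool(chunk_size=1024)
for _ in range(10):
    src = ToyMemPool(chunk_size=1024)
    src.allocate(16)
    out_pool.acquire_data(src)

old_check = at_capacity(out_pool, 0, 1024, 8 * 1024, count_whole_chunks=False)
new_check = at_capacity(out_pool, 0, 1024, 8 * 1024, count_whole_chunks=True)
print(out_pool.total_allocated_bytes(), out_pool.total_reserved_bytes())
print(old_check, new_check)
```

In this sketch the check based on allocated bytes never trips on memory, because only the used bytes count, while the check based on reserved bytes sees every whole chunk and reports the batch as full.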

Second, we make PartitionedHashJoinNode flush memory a bit more
aggressively by exiting loops when a small amount of memory
is accumulated in an empty batch. This reduced memory consumption
of the agg further from ~16MB to ~8MB.
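
A rough sketch of the early-exit idea in the paragraph above, again as a Python toy and not the PartitionedHashJoinNode code. The function name emit_batches, the (rows, bytes) input tuples and all byte values are assumptions made for illustration. It models a loop that normally keeps filling an output batch until it is full, with an extra exit when the batch is still empty but has already accumulated a small amount of memory.

```python
from typing import Iterable, Iterator, Optional, Tuple


def emit_batches(
    inputs: Iterable[Tuple[int, int]],
    row_capacity: int,
    at_capacity_bytes: int,
    early_flush_bytes: Optional[int],
) -> Iterator[Tuple[int, int]]:
    """Yield (rows, bytes) per output batch.

    Each input item is (rows_produced, bytes_attached) for one processed
    input batch. early_flush_bytes=None disables the early exit.
    """
    rows = mem = 0
    for in_rows, in_bytes in inputs:
        rows += in_rows
        mem += in_bytes
        full = rows >= row_capacity or mem >= at_capacity_bytes
        # Early exit: no rows yet, but memory is already piling up.
        early = (
            early_flush_bytes is not None
            and rows == 0
            and mem >= early_flush_bytes
        )
        if full or early:
            yield rows, mem
            rows = mem = 0
    if rows or mem:
        yield rows, mem


# A very selective input: eight batches that produce no rows but each
# attach 1 KiB of memory to the output batch.
selective = [(0, 1024)] * 8

before = list(emit_batches(selective, 1024, 8 * 1024, None))
after = list(emit_batches(selective, 1024, 8 * 1024, 2 * 1024))
peak_before = max(m for _, m in before)
peak_after = max(m for _, m in after)
print(before, peak_before)
print(after, peak_after)
```

With the early exit enabled, the toy loop hands memory downstream in smaller pieces, so the largest amount held by any one output batch is lower. The trade-off, which this sketch does not measure, is more batches passed between operators.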

Testing:
Ran TPC-H Q19 on parquet with mt_dop=8.  Aggregation mem usage was
reduced from 60+MB to ~8MB.

Performance:
No significant change on TPC-H single node run.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.15    | -0.39%     | 4.52       | -0.45%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval  |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| TPCH(30) | TPCH-Q2  | parquet / none / none | 2.82   | 2.80        |   +0.79%   |   2.36%    |   2.50%        | 40    |   +1.59%       | 1.33    | 1.45  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 5.29   | 5.26        |   +0.49%   |   1.72%    |   1.73%        | 40    |   +0.78%       | 1.50    | 1.26  |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 13.78  | 13.76       |   +0.18%   |   1.51%    |   1.64%        | 40    |   +0.32%       | 0.60    | 0.51  |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 1.80   | 1.80        |   +0.31%   |   2.95%    |   2.24%        | 40    |   +0.09%       | 1.27    | 0.53  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 22.26  | 22.24       |   +0.07%   |   1.86%    |   1.83%        | 40    |   +0.17%       | 0.56    | 0.16  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.11   | 1.11        |   +0.13%   |   5.75%    |   3.68%        | 40    |   -0.13%       | -0.71   | 0.12  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 4.47   | 4.48        |   -0.15%   |   1.37%    |   1.86%        | 40    |   +0.01%       | 0.10    | -0.40 |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.04   | 4.05        |   -0.22%   |   1.99%    |   2.13%        | 40    |   -0.03%       | -0.55   | -0.48 |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 1.98   | 1.98        |   -0.25%   |   2.58%    |   3.10%        | 40    |   -0.04%       | -0.52   | -0.39 |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.17   | 3.19        |   -0.42%   |   2.71%    |   1.73%        | 40    |   -0.11%       | -0.84   | -0.82 |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 3.96   | 3.98        |   -0.47%   |   1.85%    |   1.52%        | 40    |   -0.17%       | -1.21   | -1.25 |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 5.25   | 5.29        |   -0.81%   |   2.11%    |   6.02%        | 40    |   +0.08%       | 0.54    | -0.80 |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.63   | 1.64        |   -0.69%   |   2.81%    |   2.72%        | 40    |   -0.07%       | -0.75   | -1.13 |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.79   | 9.87        |   -0.79%   |   1.17%    |   0.94%        | 40    |   -0.61%       | -2.92   | -3.33 |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 7.89   | 7.91        |   -0.24%   | * 13.08% * | * 11.07% *     | 40    |   -1.16%       | -1.34   | -0.09 |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 14.07  | 13.79       |   +2.04%   | * 29.12% * | * 19.15% *     | 40    |   -3.46%       | -3.14   | 0.36  |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.77   | 3.79        |   -0.66%   |   1.56%    |   1.48%        | 40    |   -0.82%       | -2.19   | -1.96 |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 3.62   | 3.63        |   -0.27%   |   4.40%    |   2.64%        | 40    |   -1.23%       | -1.01   | -0.34 |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 4.53   | 4.56        |   -0.81%   |   1.88%    |   1.33%        | 40    |   -1.06%       | -2.03   | -2.24 |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.94   | 2.96        |   -0.87%   |   2.15%    |   2.04%        | 40    |   -1.52%       | -1.85   | -1.87 |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.66   | 2.70        |   -1.63%   |   1.95%    |   2.37%        | 40    |   -1.79%       | -2.79   | -3.37 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 14.58  | 15.14       |   -3.72%   |   3.08%    |   2.98%        | 40    |   -3.44%       | -4.35   | -5.60 |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+

Change-Id: I6debae562826621411bbcbb757978e227b395441
---
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/row-batch.h
2 files changed, 14 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15863/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5957/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 May 2020 05:54:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 3: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 May 2020 19:23:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has restored this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Restored

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5757/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 May 2020 19:24:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15863

to look at the new patch set (#6).

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................

IMPALA-9712: fix mem consumption of operators above selective scan

This change is motivated by excessive memory consumption of
TPC-H Q19 which has a hash join and non-grouping aggregate
above a selective scan.

This change fixes RowBatch::AtCapacity() to account for the actual
memory consumed by the RowBatch. It used total_allocated_bytes(),
which does *not* account for unused space in the MemPool chunks.
Instead it now uses total_reserved_bytes(), which includes the
whole chunks. This reduced memory consumption of the agg from
60+MB to ~16MB.

Testing:
Ran TPC-H Q19 on parquet with mt_dop=8.  Aggregation mem usage was
reduced from 60+MB to ~16MB.

Added a targeted regression test that ran out of memory before this
fix.

Ran exhaustive tests.

Performance:
No significant change on TPC-H single node run with scale factor 30.

I also ran TPC-H nested at scale factor 1 and there was no measurable
change, but generation of the report failed for some reason.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.18    | -1.31%     | 4.54       | -1.03%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval  |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| TPCH(30) | TPCH-Q10 | parquet / none / none | 8.08   | 7.98        |   +1.19%   | * 14.00% * | * 10.55% *     | 20    |   +0.63%       | 0.60    | 0.30  |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 10.03  | 9.98        |   +0.44%   |   1.22%    |   0.92%        | 20    |   +0.48%       | 1.07    | 1.28  |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.20   | 3.19        |   +0.34%   |   2.02%    |   2.68%        | 20    |   +0.08%       | 0.48    | 0.45  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 22.60  | 22.54       |   +0.24%   |   2.85%    |   2.80%        | 20    |   +0.17%       | 0.22    | 0.27  |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 13.80  | 13.77       |   +0.17%   |   1.99%    |   1.70%        | 20    |   +0.06%       | 0.10    | 0.30  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 4.52   | 4.52        |   -0.01%   |   1.68%    |   1.71%        | 20    |   +0.03%       | 0.07    | -0.01 |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 5.40   | 5.43        |   -0.52%   |   1.60%    |   1.92%        | 20    |   -0.23%       | -0.98   | -0.94 |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 1.82   | 1.83        |   -0.63%   |   2.98%    |   2.62%        | 20    |   -0.17%       | -1.07   | -0.71 |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.80   | 3.82        |   -0.46%   |   1.31%    |   1.21%        | 20    |   -0.41%       | -1.30   | -1.17 |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.72   | 2.74        |   -0.68%   |   3.06%    |   2.36%        | 20    |   -0.23%       | -0.92   | -0.79 |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.63   | 1.64        |   -0.98%   |   3.28%    |   2.66%        | 20    |   -0.23%       | -1.47   | -1.04 |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 1.99   | 2.01        |   -0.98%   |   2.59%    |   2.99%        | 20    |   -0.68%       | -1.12   | -1.12 |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 5.22   | 5.27        |   -0.96%   |   1.93%    |   2.25%        | 20    |   -0.93%       | -1.39   | -1.45 |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 3.61   | 3.64        |   -0.73%   |   2.51%    |   2.40%        | 20    |   -1.26%       | -1.18   | -0.95 |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 4.52   | 4.58        |   -1.23%   |   1.41%    |   1.39%        | 20    |   -1.15%       | -2.99   | -2.79 |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.05   | 4.10        |   -1.19%   |   2.58%    |   2.48%        | 20    |   -1.27%       | -1.77   | -1.49 |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 3.96   | 4.01        |   -1.32%   |   1.82%    |   1.82%        | 20    |   -1.31%       | -2.09   | -2.30 |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.93   | 2.98        |   -1.47%   |   2.16%    |   2.60%        | 20    |   -1.63%       | -1.88   | -1.95 |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 2.78   | 2.83        |   -1.70%   |   2.29%    |   2.19%        | 20    |   -1.77%       | -1.97   | -2.41 |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.09   | 1.12        |   -2.64%   |   2.99%    |   3.81%        | 20    |   -3.71%       | -1.77   | -2.46 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 15.07  | 15.73       |   -4.20%   |   3.24%    |   2.26%        | 20    |   -5.00%       | -3.84   | -4.89 |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 13.23  | 14.13       |   -6.38%   | * 11.87% * | * 26.04% *     | 20    |   -3.19%       | -2.41   | -1.01 |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+

Change-Id: I6debae562826621411bbcbb757978e227b395441
---
M be/src/runtime/row-batch.h
M tests/query_test/test_mem_usage_scaling.py
2 files changed, 28 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15863/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15863

to look at the new patch set (#2).

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................

IMPALA-9712: fix mem consumption of operators above selective scan

This change is motivated by excessive memory consumption of
TPC-H Q19 which has a hash join and non-grouping aggregate
above a selective scan.

There are two related parts to this change.

First, we fix RowBatch::AtCapacity() to account for the actual
memory consumed by the RowBatch. It used total_allocated_bytes(),
which does *not* account for unused space in the MemPool chunks.
Instead it now uses total_reserved_bytes(), which includes the
whole chunks. This reduced memory consumption of the agg from
60+MB to ~16MB.

Second, we make PartitionedHashJoinNode flush memory a bit more
aggressively by exiting loops when a small amount of memory
is accumulated in an empty batch. This reduced memory consumption
of the agg further from ~16MB to ~8MB.

Testing:
Ran TPC-H Q19 on parquet with mt_dop=8.  Aggregation mem usage was
reduced from 60+MB to ~8MB.

Added a targeted regression test that ran out of memory before this
fix.

Performance:
No significant change on TPC-H single node run.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.15    | -0.39%     | 4.52       | -0.45%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval  |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| TPCH(30) | TPCH-Q2  | parquet / none / none | 2.82   | 2.80        |   +0.79%   |   2.36%    |   2.50%        | 40    |   +1.59%       | 1.33    | 1.45  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 5.29   | 5.26        |   +0.49%   |   1.72%    |   1.73%        | 40    |   +0.78%       | 1.50    | 1.26  |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 13.78  | 13.76       |   +0.18%   |   1.51%    |   1.64%        | 40    |   +0.32%       | 0.60    | 0.51  |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 1.80   | 1.80        |   +0.31%   |   2.95%    |   2.24%        | 40    |   +0.09%       | 1.27    | 0.53  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 22.26  | 22.24       |   +0.07%   |   1.86%    |   1.83%        | 40    |   +0.17%       | 0.56    | 0.16  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.11   | 1.11        |   +0.13%   |   5.75%    |   3.68%        | 40    |   -0.13%       | -0.71   | 0.12  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 4.47   | 4.48        |   -0.15%   |   1.37%    |   1.86%        | 40    |   +0.01%       | 0.10    | -0.40 |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.04   | 4.05        |   -0.22%   |   1.99%    |   2.13%        | 40    |   -0.03%       | -0.55   | -0.48 |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 1.98   | 1.98        |   -0.25%   |   2.58%    |   3.10%        | 40    |   -0.04%       | -0.52   | -0.39 |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.17   | 3.19        |   -0.42%   |   2.71%    |   1.73%        | 40    |   -0.11%       | -0.84   | -0.82 |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 3.96   | 3.98        |   -0.47%   |   1.85%    |   1.52%        | 40    |   -0.17%       | -1.21   | -1.25 |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 5.25   | 5.29        |   -0.81%   |   2.11%    |   6.02%        | 40    |   +0.08%       | 0.54    | -0.80 |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.63   | 1.64        |   -0.69%   |   2.81%    |   2.72%        | 40    |   -0.07%       | -0.75   | -1.13 |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.79   | 9.87        |   -0.79%   |   1.17%    |   0.94%        | 40    |   -0.61%       | -2.92   | -3.33 |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 7.89   | 7.91        |   -0.24%   | * 13.08% * | * 11.07% *     | 40    |   -1.16%       | -1.34   | -0.09 |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 14.07  | 13.79       |   +2.04%   | * 29.12% * | * 19.15% *     | 40    |   -3.46%       | -3.14   | 0.36  |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.77   | 3.79        |   -0.66%   |   1.56%    |   1.48%        | 40    |   -0.82%       | -2.19   | -1.96 |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 3.62   | 3.63        |   -0.27%   |   4.40%    |   2.64%        | 40    |   -1.23%       | -1.01   | -0.34 |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 4.53   | 4.56        |   -0.81%   |   1.88%    |   1.33%        | 40    |   -1.06%       | -2.03   | -2.24 |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.94   | 2.96        |   -0.87%   |   2.15%    |   2.04%        | 40    |   -1.52%       | -1.85   | -1.87 |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.66   | 2.70        |   -1.63%   |   1.95%    |   2.37%        | 40    |   -1.79%       | -2.79   | -3.37 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 14.58  | 15.14       |   -3.72%   |   3.08%    |   2.98%        | 40    |   -3.44%       | -4.35   | -5.60 |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+

Change-Id: I6debae562826621411bbcbb757978e227b395441
---
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/row-batch.h
M tests/query_test/test_mem_usage_scaling.py
3 files changed, 41 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15863/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15863/2/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/15863/2/be/src/runtime/row-batch.h@209
PS2, Line 209: HasAttachedData
> I find this a little misleading - just from the name it sounds like it woul
I think you're right to call it out; this was a pretty bad name. Unfortunately it's kind of a vague concept. I changed it to ShouldConsiderFlush().

I think the conceptual integrity here isn't great overall - there isn't necessarily a good reason to have two different thresholds... I added a TODO next to the constant definitions to apologise for this. I do think it's best not to bite off too much with this fix though.
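
To make the "two different thresholds" point concrete, here is a small Python toy. The thread does not give the real constants or the exact semantics of ShouldConsiderFlush(), so ToyRowBatch, both byte values and the method bodies are assumptions. It only illustrates how a batch can sit between a smaller "worth flushing" threshold and a larger "full" threshold.

```python
from dataclasses import dataclass


@dataclass
class ToyRowBatch:
    """Hypothetical batch with two memory thresholds; values are made up."""

    capacity: int
    num_rows: int = 0
    mem_bytes: int = 0

    AT_CAPACITY_BYTES = 8 * 1024   # assumed "batch is full" threshold
    CONSIDER_FLUSH_BYTES = 1024    # assumed smaller "worth flushing" threshold

    def at_capacity(self) -> bool:
        # Full by row count or by memory held.
        return (
            self.num_rows == self.capacity
            or self.mem_bytes >= self.AT_CAPACITY_BYTES
        )

    def should_consider_flush(self) -> bool:
        # Memory-only check against the smaller threshold.
        return self.mem_bytes >= self.CONSIDER_FLUSH_BYTES


empty_small = ToyRowBatch(capacity=1024, mem_bytes=512)
empty_medium = ToyRowBatch(capacity=1024, mem_bytes=2048)
empty_large = ToyRowBatch(capacity=1024, mem_bytes=8192)
states = [
    (b.should_consider_flush(), b.at_capacity())
    for b in (empty_small, empty_medium, empty_large)
]
print(states)
```

The middle batch is the interesting case in this toy: it is not at capacity, yet it already holds enough memory that a caller might prefer to return it.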




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 May 2020 18:46:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5833/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 May 2020 16:31:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 6: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 May 2020 15:55:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Abandoned

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has restored this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Restored

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15863

to look at the new patch set (#3).

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................

IMPALA-9712: fix mem consumption of operators above selective scan

This change is motivated by excessive memory consumption of
TPC-H Q19 which has a hash join and non-grouping aggregate
above a selective scan.

There are two related parts to this change.

First, we fix RowBatch::AtCapacity() to account for the actual
memory consumed by the RowBatch. It used total_allocated_bytes(),
which does *not* account for unused space in the MemPool chunks.
Instead it now uses total_reserved_bytes(), which includes the
whole chunks. This reduced memory consumption of the agg from
60+MB to ~16MB.

Second, we make PartitionedHashJoinNode flush memory a bit more
aggressively by exiting loops when a small amount of memory
is accumulated in an empty batch. This reduced memory consumption
of the agg further from ~16MB to ~8MB.

Testing:
Ran TPC-H Q19 on parquet with mt_dop=8.  Aggregation mem usage was
reduced from 60+MB to ~8MB.

Added a targeted regression test that ran out of memory before this
fix.

Performance:
No significant change on TPC-H single node run.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.15    | -0.39%     | 4.52       | -0.45%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval  |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| TPCH(30) | TPCH-Q2  | parquet / none / none | 2.82   | 2.80        |   +0.79%   |   2.36%    |   2.50%        | 40    |   +1.59%       | 1.33    | 1.45  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 5.29   | 5.26        |   +0.49%   |   1.72%    |   1.73%        | 40    |   +0.78%       | 1.50    | 1.26  |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 13.78  | 13.76       |   +0.18%   |   1.51%    |   1.64%        | 40    |   +0.32%       | 0.60    | 0.51  |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 1.80   | 1.80        |   +0.31%   |   2.95%    |   2.24%        | 40    |   +0.09%       | 1.27    | 0.53  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 22.26  | 22.24       |   +0.07%   |   1.86%    |   1.83%        | 40    |   +0.17%       | 0.56    | 0.16  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.11   | 1.11        |   +0.13%   |   5.75%    |   3.68%        | 40    |   -0.13%       | -0.71   | 0.12  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 4.47   | 4.48        |   -0.15%   |   1.37%    |   1.86%        | 40    |   +0.01%       | 0.10    | -0.40 |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.04   | 4.05        |   -0.22%   |   1.99%    |   2.13%        | 40    |   -0.03%       | -0.55   | -0.48 |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 1.98   | 1.98        |   -0.25%   |   2.58%    |   3.10%        | 40    |   -0.04%       | -0.52   | -0.39 |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.17   | 3.19        |   -0.42%   |   2.71%    |   1.73%        | 40    |   -0.11%       | -0.84   | -0.82 |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 3.96   | 3.98        |   -0.47%   |   1.85%    |   1.52%        | 40    |   -0.17%       | -1.21   | -1.25 |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 5.25   | 5.29        |   -0.81%   |   2.11%    |   6.02%        | 40    |   +0.08%       | 0.54    | -0.80 |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.63   | 1.64        |   -0.69%   |   2.81%    |   2.72%        | 40    |   -0.07%       | -0.75   | -1.13 |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.79   | 9.87        |   -0.79%   |   1.17%    |   0.94%        | 40    |   -0.61%       | -2.92   | -3.33 |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 7.89   | 7.91        |   -0.24%   | * 13.08% * | * 11.07% *     | 40    |   -1.16%       | -1.34   | -0.09 |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 14.07  | 13.79       |   +2.04%   | * 29.12% * | * 19.15% *     | 40    |   -3.46%       | -3.14   | 0.36  |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.77   | 3.79        |   -0.66%   |   1.56%    |   1.48%        | 40    |   -0.82%       | -2.19   | -1.96 |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 3.62   | 3.63        |   -0.27%   |   4.40%    |   2.64%        | 40    |   -1.23%       | -1.01   | -0.34 |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 4.53   | 4.56        |   -0.81%   |   1.88%    |   1.33%        | 40    |   -1.06%       | -2.03   | -2.24 |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.94   | 2.96        |   -0.87%   |   2.15%    |   2.04%        | 40    |   -1.52%       | -1.85   | -1.87 |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.66   | 2.70        |   -1.63%   |   1.95%    |   2.37%        | 40    |   -1.79%       | -2.79   | -3.37 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 14.58  | 15.14       |   -3.72%   |   3.08%    |   2.98%        | 40    |   -3.44%       | -4.35   | -5.60 |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+

Change-Id: I6debae562826621411bbcbb757978e227b395441
---
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/row-batch.h
M tests/query_test/test_mem_usage_scaling.py
3 files changed, 45 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15863/3
-- 
To view, visit http://gerrit.cloudera.org:8080/15863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 7: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 May 2020 21:54:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 6:

It turned out that the second part of the change was somewhat problematic, so I cut this back down to the minimal fix. Memory consumption isn't reduced quite as much, but this still fixes the worst of the problem.

The issue was that at least one operator - SubplanNode - can get stuck in a loop if its child repeatedly returns empty batches that are not AtCapacity() without the child making progress. And, after my change, the PartitionedHashJoinNode will not make progress if a batch is passed into GetNext() where ConsiderFlush() returns true.

https://github.com/apache/impala/blob/master/be/src/exec/subplan-node.cc#L136

I'm not sure how many other operators might implicitly assume something like this, so I decided not to deal with that in this change.
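
The hazard described above can be illustrated with a toy model. This is only a sketch in Python, not Impala's C++ code: ToyBatch, StallingChild, drain() and the flush_requested flag are hypothetical names invented for the illustration, and max_calls exists only so that the demo terminates.

```python
class ToyBatch:
    """Toy row batch; flush_requested stands in for a 'please flush' condition."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = []
        self.flush_requested = False

    def at_capacity(self):
        return len(self.rows) >= self.capacity


class StallingChild:
    """Toy child operator that exits early, adding no rows, when a flush is requested."""

    def __init__(self, rows):
        self.pending = list(rows)

    def get_next(self, batch):
        """Fills the batch and returns True at end of stream."""
        if batch.flush_requested:
            return False  # early exit: no rows added, not eos, no progress
        while self.pending and not batch.at_capacity():
            batch.rows.append(self.pending.pop(0))
        return not self.pending


def drain(child, batch, max_calls=1000):
    """Toy parent loop: call the child until the batch fills or the stream ends.

    Returns the number of calls made, or None if max_calls was exhausted,
    which models a loop that would otherwise spin forever.
    """
    for calls in range(1, max_calls + 1):
        eos = child.get_next(batch)
        if eos or batch.at_capacity():
            return calls
    return None


# Normal case: the child fills the batch on the first call.
healthy = drain(StallingChild([1, 2, 3]), ToyBatch(capacity=2))

# Problem case: the batch already satisfies the flush condition, so the child
# keeps returning an empty, not-at-capacity batch and the parent never exits.
stuck_batch = ToyBatch(capacity=2)
stuck_batch.flush_requested = True
stuck = drain(StallingChild([1, 2, 3]), stuck_batch)
```

In this model the parent only terminates on end of stream or a full batch, so any child that can return early without either condition changing breaks that implicit assumption.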



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 May 2020 22:33:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 7: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 7
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 May 2020 16:31:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5958/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 May 2020 06:12:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15863/2/tests/query_test/test_mem_usage_scaling.py
File tests/query_test/test_mem_usage_scaling.py:

http://gerrit.cloudera.org:8080/#/c/15863/2/tests/query_test/test_mem_usage_scaling.py@371
PS2, Line 371: @SkipIfNotHdfsMinicluster.tuned_for_minicluster
flake8: F811 redefinition of unused 'TestScanMemLimit' from line 308
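
For context, a minimal sketch of why F811 matters for test classes: when two classes in one module share a name, the second definition replaces the first, so test methods on the first are no longer reachable through that name. The method names below are hypothetical; only the class name comes from the warning above.

```python
class TestScanMemLimit(object):
    def test_first(self):  # hypothetical test method
        return "first"


class TestScanMemLimit(object):  # flake8 would report F811 for this redefinition
    def test_second(self):  # hypothetical test method
        return "second"


# Only the second class is still bound to the name, so test_first is lost.
methods = [name for name in dir(TestScanMemLimit) if name.startswith("test_")]
```

Giving the new test class a distinct name avoids silently dropping the earlier tests.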




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 May 2020 05:54:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Abandoned

Looks like there's something else weird going on with these nested types tests. Need to investigate further.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 4: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 May 2020 19:24:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................

IMPALA-9712: fix mem consumption of operators above selective scan

This change is motivated by excessive memory consumption of
TPC-H Q19 which has a hash join and non-grouping aggregate
above a selective scan.

This change fixes RowBatch::AtCapacity() to account for the actual
memory consumed by the RowBatch. It used total_allocated_bytes(),
which does *not* account for unused space in the MemPool chunks.
Instead it now uses total_reserved_bytes(), which includes the
whole chunks. This reduced memory consumption of the agg from
60+MB to ~16MB.
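
The difference between the two counters can be illustrated with a toy model. This is only a sketch in Python, not Impala's MemPool or RowBatch code: ToyMemPool, the 4096-byte chunk size and the LIMIT threshold are assumptions chosen for the illustration.

```python
class ToyMemPool:
    """Toy chunked allocator; each chunk is a [capacity, used] pair."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = []

    def allocate(self, nbytes):
        # Reuse the last chunk if the request fits, otherwise reserve a new chunk.
        if not self.chunks or self.chunks[-1][0] - self.chunks[-1][1] < nbytes:
            self.chunks.append([max(self.chunk_size, nbytes), 0])
        self.chunks[-1][1] += nbytes

    def total_allocated_bytes(self):
        # Bytes handed out to callers; ignores unused space inside chunks.
        return sum(used for _, used in self.chunks)

    def total_reserved_bytes(self):
        # Bytes actually held by the pool; counts whole chunks.
        return sum(capacity for capacity, _ in self.chunks)


LIMIT = 8 * 1024  # hypothetical capacity threshold in bytes


def at_capacity_old(pool):
    return pool.total_allocated_bytes() >= LIMIT


def at_capacity_new(pool):
    return pool.total_reserved_bytes() >= LIMIT


pool = ToyMemPool(chunk_size=4096)
pool.allocate(100)   # first chunk: 100 of 4096 bytes used
pool.allocate(4000)  # does not fit in the remaining 3996 bytes, so a second chunk is reserved
```

Here the pool holds 8192 bytes while only 4100 are allocated, so the allocated-bytes check does not yet report the batch as full even though the memory is already consumed.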

Testing:
Ran TPC-H Q19 on parquet with mt_dop=8.  Aggregation mem usage was
reduced from 60+MB to ~16MB.

Added a targeted regression test that ran out of memory before this
fix.

Ran exhaustive tests.

Performance:
No significant change on TPC-H single node run with scale factor 30.

I also ran TPC-H nested scale factor 1 and there was no measurable
change, but generation of the report failed for some reason.

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.18    | -1.31%     | 4.54       | -1.03%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval  |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| TPCH(30) | TPCH-Q10 | parquet / none / none | 8.08   | 7.98        |   +1.19%   | * 14.00% * | * 10.55% *     | 20    |   +0.63%       | 0.60    | 0.30  |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 10.03  | 9.98        |   +0.44%   |   1.22%    |   0.92%        | 20    |   +0.48%       | 1.07    | 1.28  |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.20   | 3.19        |   +0.34%   |   2.02%    |   2.68%        | 20    |   +0.08%       | 0.48    | 0.45  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 22.60  | 22.54       |   +0.24%   |   2.85%    |   2.80%        | 20    |   +0.17%       | 0.22    | 0.27  |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 13.80  | 13.77       |   +0.17%   |   1.99%    |   1.70%        | 20    |   +0.06%       | 0.10    | 0.30  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 4.52   | 4.52        |   -0.01%   |   1.68%    |   1.71%        | 20    |   +0.03%       | 0.07    | -0.01 |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 5.40   | 5.43        |   -0.52%   |   1.60%    |   1.92%        | 20    |   -0.23%       | -0.98   | -0.94 |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 1.82   | 1.83        |   -0.63%   |   2.98%    |   2.62%        | 20    |   -0.17%       | -1.07   | -0.71 |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.80   | 3.82        |   -0.46%   |   1.31%    |   1.21%        | 20    |   -0.41%       | -1.30   | -1.17 |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.72   | 2.74        |   -0.68%   |   3.06%    |   2.36%        | 20    |   -0.23%       | -0.92   | -0.79 |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.63   | 1.64        |   -0.98%   |   3.28%    |   2.66%        | 20    |   -0.23%       | -1.47   | -1.04 |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 1.99   | 2.01        |   -0.98%   |   2.59%    |   2.99%        | 20    |   -0.68%       | -1.12   | -1.12 |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 5.22   | 5.27        |   -0.96%   |   1.93%    |   2.25%        | 20    |   -0.93%       | -1.39   | -1.45 |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 3.61   | 3.64        |   -0.73%   |   2.51%    |   2.40%        | 20    |   -1.26%       | -1.18   | -0.95 |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 4.52   | 4.58        |   -1.23%   |   1.41%    |   1.39%        | 20    |   -1.15%       | -2.99   | -2.79 |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.05   | 4.10        |   -1.19%   |   2.58%    |   2.48%        | 20    |   -1.27%       | -1.77   | -1.49 |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 3.96   | 4.01        |   -1.32%   |   1.82%    |   1.82%        | 20    |   -1.31%       | -2.09   | -2.30 |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.93   | 2.98        |   -1.47%   |   2.16%    |   2.60%        | 20    |   -1.63%       | -1.88   | -1.95 |
| TPCH(30) | TPCH-Q2  | parquet / none / none | 2.78   | 2.83        |   -1.70%   |   2.29%    |   2.19%        | 20    |   -1.77%       | -1.97   | -2.41 |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.09   | 1.12        |   -2.64%   |   2.99%    |   3.81%        | 20    |   -3.71%       | -1.77   | -2.46 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 15.07  | 15.73       |   -4.20%   |   3.24%    |   2.26%        | 20    |   -5.00%       | -3.84   | -4.89 |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 13.23  | 14.13       |   -6.38%   | * 11.87% * | * 26.04% *     | 20    |   -3.19%       | -2.41   | -1.01 |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+

Change-Id: I6debae562826621411bbcbb757978e227b395441
Reviewed-on: http://gerrit.cloudera.org:8080/15863
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/row-batch.h
M tests/query_test/test_mem_usage_scaling.py
2 files changed, 28 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6056/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 6
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 May 2020 23:18:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5964/ : Initial code review checks failed. See linked job for details on the failure.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 May 2020 19:33:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has abandoned this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Abandoned

Actually, I need to add a test for this.

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15863/3/tests/query_test/test_mem_usage_scaling.py
File tests/query_test/test_mem_usage_scaling.py:

http://gerrit.cloudera.org:8080/#/c/15863/3/tests/query_test/test_mem_usage_scaling.py@371
PS3, Line 371: @SkipIfNotHdfsMinicluster.tuned_for_minicluster
flake8: F811 redefinition of unused 'TestScanMemLimit' from line 308




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 May 2020 18:46:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5757/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 May 2020 02:27:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15863/2/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/15863/2/be/src/runtime/row-batch.h@209
PS2, Line 209: HasAttachedData
I find this a little misleading - just from the name it sounds like it would only be based on 'attached_buffer_bytes_', and it's surprising there's a threshold involved, though maybe I'm misunderstanding how the term "AttachedData" is being used here.

I'm not sure what would be better, maybe ShouldFlush()? ShouldConsiderFlush()? HasEnoughAttachedData()? AtFlushThreshold()?

Not a big deal if you prefer to leave as is.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Tue, 05 May 2020 18:11:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has restored this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Restored

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: restore
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15863/5/tests/query_test/test_mem_usage_scaling.py
File tests/query_test/test_mem_usage_scaling.py:

http://gerrit.cloudera.org:8080/#/c/15863/5/tests/query_test/test_mem_usage_scaling.py@371
PS5, Line 371: @SkipIfNotHdfsMinicluster.tuned_for_minicluster
flake8: F811 redefinition of unused 'TestScanMemLimit' from line 308




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 May 2020 16:43:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15863 )

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6053/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 May 2020 17:36:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9712: fix mem consumption of operators above selective scan

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15863

to look at the new patch set (#5).

Change subject: IMPALA-9712: fix mem consumption of operators above selective scan
......................................................................

IMPALA-9712: fix mem consumption of operators above selective scan

This change is motivated by excessive memory consumption of
TPC-H Q19 which has a hash join and non-grouping aggregate
above a selective scan.

This change fixes RowBatch::AtCapacity() to account for the actual
memory consumed by the RowBatch. It previously used
total_allocated_bytes(), which does *not* account for unused space
in the MemPool chunks. It now uses total_reserved_bytes(), which
counts whole chunks, including their unused space. This reduced
memory consumption of the agg from 60+MB to ~16MB.
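
To illustrate the difference between the two counters, here is a minimal
sketch using a toy chunked pool. ToyPool, its chunk-sizing policy, the
two AtCapacityBy* helpers and the limit value are all hypothetical and
are not Impala's MemPool or RowBatch code; only the names
total_allocated_bytes() and total_reserved_bytes() are taken from the
description above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for a chunked memory pool (hypothetical, not Impala's MemPool).
class ToyPool {
 public:
  explicit ToyPool(int64_t chunk_size) : chunk_size_(chunk_size) {}

  // Hands out 'size' bytes, starting a new chunk when the last one lacks room.
  void Allocate(int64_t size) {
    assert(size >= 0);
    if (chunks_.empty() || chunks_.back().size - chunks_.back().used < size) {
      chunks_.push_back({std::max(size, chunk_size_), 0});
    }
    chunks_.back().used += size;
  }

  // Bytes actually handed out to callers; ignores unused space in chunks.
  int64_t total_allocated_bytes() const {
    int64_t total = 0;
    for (const Chunk& c : chunks_) total += c.used;
    return total;
  }

  // Bytes held by the pool: whole chunks, including their unused space.
  int64_t total_reserved_bytes() const {
    int64_t total = 0;
    for (const Chunk& c : chunks_) total += c.size;
    return total;
  }

 private:
  struct Chunk {
    int64_t size;  // Capacity of the chunk in bytes.
    int64_t used;  // Bytes of the chunk handed out so far.
  };
  int64_t chunk_size_;
  std::vector<Chunk> chunks_;
};

// Capacity check based on handed-out bytes (the old style of accounting).
bool AtCapacityByAllocated(const ToyPool& pool, int64_t limit) {
  return pool.total_allocated_bytes() >= limit;
}

// Capacity check based on whole chunks (the new style of accounting).
bool AtCapacityByReserved(const ToyPool& pool, int64_t limit) {
  return pool.total_reserved_bytes() >= limit;
}
```

With a 1024-byte chunk size and a single 10-byte allocation, the toy pool
reports 10 allocated bytes but 1024 reserved bytes. Against a
hypothetical 512-byte limit, only the reserved-bytes check reports the
batch as at capacity, so the batch is handed off before more mostly
empty chunks pile up.
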

Testing:
Ran TPC-H Q19 on parquet with mt_dop=8.  Aggregation mem usage was
reduced from 60+MB to ~16MB.

Added a targeted regression test that ran out of memory before this
fix.

Performance:
No significant change on TPC-H single node run.

TODO: refresh, add nested

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(30) | parquet / none / none | 6.15    | -0.39%     | 4.52       | -0.45%         |
+----------+-----------------------+---------+------------+------------+----------------+

+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| Workload | Query    | File Format           | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval  |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+
| TPCH(30) | TPCH-Q2  | parquet / none / none | 2.82   | 2.80        |   +0.79%   |   2.36%    |   2.50%        | 40    |   +1.59%       | 1.33    | 1.45  |
| TPCH(30) | TPCH-Q8  | parquet / none / none | 5.29   | 5.26        |   +0.49%   |   1.72%    |   1.73%        | 40    |   +0.78%       | 1.50    | 1.26  |
| TPCH(30) | TPCH-Q9  | parquet / none / none | 13.78  | 13.76       |   +0.18%   |   1.51%    |   1.64%        | 40    |   +0.32%       | 0.60    | 0.51  |
| TPCH(30) | TPCH-Q16 | parquet / none / none | 1.80   | 1.80        |   +0.31%   |   2.95%    |   2.24%        | 40    |   +0.09%       | 1.27    | 0.53  |
| TPCH(30) | TPCH-Q21 | parquet / none / none | 22.26  | 22.24       |   +0.07%   |   1.86%    |   1.83%        | 40    |   +0.17%       | 0.56    | 0.16  |
| TPCH(30) | TPCH-Q11 | parquet / none / none | 1.11   | 1.11        |   +0.13%   |   5.75%    |   3.68%        | 40    |   -0.13%       | -0.71   | 0.12  |
| TPCH(30) | TPCH-Q7  | parquet / none / none | 4.47   | 4.48        |   -0.15%   |   1.37%    |   1.86%        | 40    |   +0.01%       | 0.10    | -0.40 |
| TPCH(30) | TPCH-Q19 | parquet / none / none | 4.04   | 4.05        |   -0.22%   |   1.99%    |   2.13%        | 40    |   -0.03%       | -0.55   | -0.48 |
| TPCH(30) | TPCH-Q22 | parquet / none / none | 1.98   | 1.98        |   -0.25%   |   2.58%    |   3.10%        | 40    |   -0.04%       | -0.52   | -0.39 |
| TPCH(30) | TPCH-Q12 | parquet / none / none | 3.17   | 3.19        |   -0.42%   |   2.71%    |   1.73%        | 40    |   -0.11%       | -0.84   | -0.82 |
| TPCH(30) | TPCH-Q3  | parquet / none / none | 3.96   | 3.98        |   -0.47%   |   1.85%    |   1.52%        | 40    |   -0.17%       | -1.21   | -1.25 |
| TPCH(30) | TPCH-Q1  | parquet / none / none | 5.25   | 5.29        |   -0.81%   |   2.11%    |   6.02%        | 40    |   +0.08%       | 0.54    | -0.80 |
| TPCH(30) | TPCH-Q6  | parquet / none / none | 1.63   | 1.64        |   -0.69%   |   2.81%    |   2.72%        | 40    |   -0.07%       | -0.75   | -1.13 |
| TPCH(30) | TPCH-Q13 | parquet / none / none | 9.79   | 9.87        |   -0.79%   |   1.17%    |   0.94%        | 40    |   -0.61%       | -2.92   | -3.33 |
| TPCH(30) | TPCH-Q10 | parquet / none / none | 7.89   | 7.91        |   -0.24%   | * 13.08% * | * 11.07% *     | 40    |   -1.16%       | -1.34   | -0.09 |
| TPCH(30) | TPCH-Q18 | parquet / none / none | 14.07  | 13.79       |   +2.04%   | * 29.12% * | * 19.15% *     | 40    |   -3.46%       | -3.14   | 0.36  |
| TPCH(30) | TPCH-Q15 | parquet / none / none | 3.77   | 3.79        |   -0.66%   |   1.56%    |   1.48%        | 40    |   -0.82%       | -2.19   | -1.96 |
| TPCH(30) | TPCH-Q14 | parquet / none / none | 3.62   | 3.63        |   -0.27%   |   4.40%    |   2.64%        | 40    |   -1.23%       | -1.01   | -0.34 |
| TPCH(30) | TPCH-Q5  | parquet / none / none | 4.53   | 4.56        |   -0.81%   |   1.88%    |   1.33%        | 40    |   -1.06%       | -2.03   | -2.24 |
| TPCH(30) | TPCH-Q20 | parquet / none / none | 2.94   | 2.96        |   -0.87%   |   2.15%    |   2.04%        | 40    |   -1.52%       | -1.85   | -1.87 |
| TPCH(30) | TPCH-Q4  | parquet / none / none | 2.66   | 2.70        |   -1.63%   |   1.95%    |   2.37%        | 40    |   -1.79%       | -2.79   | -3.37 |
| TPCH(30) | TPCH-Q17 | parquet / none / none | 14.58  | 15.14       |   -3.72%   |   3.08%    |   2.98%        | 40    |   -3.44%       | -4.35   | -5.60 |
+----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+-------+

Change-Id: I6debae562826621411bbcbb757978e227b395441
---
M be/src/runtime/row-batch.h
M tests/query_test/test_mem_usage_scaling.py
2 files changed, 28 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15863/5
-- 
To view, visit http://gerrit.cloudera.org:8080/15863
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6debae562826621411bbcbb757978e227b395441
Gerrit-Change-Number: 15863
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>