Posted to reviews@impala.apache.org by "Amogh Margoor (Code Review)" <ge...@cloudera.org> on 2021/11/01 17:51:22 UTC

[Impala-ASF-CR] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.

Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
......................................................................


Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17860/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17860/12//COMMIT_MSG@24
PS12, Line 24: TPCH scale 42
> I think it would be good to execute the whole benchmark with bin/single_nod
Hi Zoltan,
Sorry for the delay with the benchmark. I ran the entire TPCH benchmark at scale 42. Here is the summary of the report (Delta is the change relative to the baseline).

Report Generated on 2021-10-28
Run Description: "78ce235db6d5b720f3e3319ff571a2da054a2602 vs c46d765dccd5739c848d8c1c82043e72394b8397"

Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version:          impalad version 4.1.0-SNAPSHOT RELEASE (2021-10-28)
Baseline Impala Version: impalad version 4.1.0-SNAPSHOT RELEASE (2021-10-27)

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 12.83   | -1.54%     | 8.26       | -1.48%         |
+----------+-----------------------+---------+------------+------------+----------------+

The overall improvement is very slight (about 1.5%), with more significant improvements in these 2 queries:

(I) Improvement: TPCH(42) TPCH-Q6 [parquet / none / none] (1.85s -> 1.72s [-7.30%])
+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+-------+-----------+
| Operator     | % of Query | Avg   | Base Avg | Delta(Avg) | StdDev(%) | Max   | Base Max | Delta(Max) | #Hosts | #Inst | #Rows | Est #Rows |
+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+-------+-----------+
| 00:SCAN HDFS | 94.83%     | 1.50s | 1.62s    | -7.75%     |   2.07%   | 1.56s | 1.73s    | -9.58%     | 1      | 1     | 4.79M | 29.96M    |
+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+-------+-----------+

(I) Improvement: TPCH(42) TPCH-Q19 [parquet / none / none] (4.73s -> 4.18s [-11.72%])
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
| Operator     | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Inst | #Rows  | Est #Rows |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
| 01:SCAN HDFS | 22.68%     | 729.91ms | 736.69ms | -0.92%     |   1.61%   | 751.55ms | 747.34ms | +0.56%     | 1      | 1     | 20.33K | 1.50M     |
| 00:SCAN HDFS | 74.84%     | 2.41s    | 2.97s    | -18.98%    |   0.67%   | 2.44s    | 3.00s    | -18.70%    | 1      | 1     | 13.07K | 29.96M    |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+

No regressions were reported as such, just these 2 improvements and a couple of queries with high variability in runtime (not related to our change).
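
As a rough aid for reading the Delta columns, here is a minimal sketch that assumes Delta is the relative change of the new run against the baseline. The helper name `relative_delta` is hypothetical and is not taken from the Impala report tooling. The inputs are the rounded query times shown above, so the results (roughly -7.0% and -11.6%) only approximate the reported -7.30% and -11.72%, which were presumably computed from unrounded timings.

```python
def relative_delta(new: float, base: float) -> float:
    """Relative change of `new` versus `base`, in percent (negative = faster).

    Assumption: this is how the Delta columns in the report are defined.
    """
    return (new - base) / base * 100.0


# Rounded query times in seconds, copied from the report: (baseline, new).
queries = {
    "TPCH-Q6": (1.85, 1.72),
    "TPCH-Q19": (4.73, 4.18),
}

for name, (base, new) in queries.items():
    print(f"{name}: {relative_delta(new, base):+.2f}%")
```

A negative value means the new build was faster than the baseline for that query.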



-- 
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 19
Gerrit-Owner: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kd...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Nov 2021 17:51:22 +0000
Gerrit-HasComments: Yes