You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Anonymous Coward (Code Review)" <ge...@cloudera.org> on 2022/06/01 14:47:55 UTC

[Impala-ASF-CR] IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

lipenglin@sensorsdata.cn has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18574 )

Change subject: IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
......................................................................

IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

This commit optimizes the plain count(*) queries for the Iceberg tables.
When the `org.apache.iceberg.SnapshotSummary#TOTAL_RECORDS_PROP` can be
retrieved from the current `org.apache.iceberg.BaseSnapshot#summary` of
the Iceberg table, this kind of query can be very fast. If this property
is not retrieved, the query will aggregate the `num_rows` of parquet
`file_metadata_` as usual.

Queries that can be optimized need to meet the following requirements:
 - SelectStmt does not have WHERE clause
 - SelectStmt does not have GROUP BY clause
 - SelectStmt does not have HAVING clause
 - The TableRefs of FROM clause contains only one BaseTableRef
 - Only for the Iceberg table
 - SelectList contains only 'count(*)'

Testing:
 - Added end-to-end test
 - Existing tests

Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
---
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg-compound-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-in-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-is-null-predicate-push-down.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-partitioned-insert.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-plain-count-star-optimization.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-upper-lower-bound-metrics.test
M tests/query_test/test_iceberg.py
14 files changed, 330 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/74/18574/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Gerrit-Change-Number: 18574
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jian Zhang <zj...@gmail.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Xianqing He <he...@126.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>