Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/07/02 03:28:00 UTC
[jira] [Commented] (IMPALA-11279) Optimize count(*) queries for Iceberg tables
[ https://issues.apache.org/jira/browse/IMPALA-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561690#comment-17561690 ]
ASF subversion and git services commented on IMPALA-11279:
----------------------------------------------------------
Commit f38c53235f1797f91ff9a65bb734d3f38f1aadc9 in impala's branch refs/heads/dependabot/pip/infra/python/deps/urllib3-1.26.5 from LPL
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f38c53235 ]
IMPALA-11279: Optimize plain count(*) queries for Iceberg tables
This commit optimizes plain count(*) queries on Iceberg tables.
When `org.apache.iceberg.SnapshotSummary#TOTAL_RECORDS_PROP` can be
retrieved from the `org.apache.iceberg.BaseSnapshot#summary` of the
table's current snapshot, this kind of query can be very fast. If the
property cannot be retrieved, the query aggregates the `num_rows` of the
Parquet `file_metadata_` as usual.
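The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `count_star`, `snapshot_summary`, and `file_num_rows` are hypothetical names, the summary is modeled as a plain string map, and treating a missing, non-numeric, or negative value as "not retrieved" is an assumption.

```python
from typing import Mapping, Optional, Sequence

# "total-records" is the string value of Iceberg's
# SnapshotSummary.TOTAL_RECORDS_PROP constant.
TOTAL_RECORDS_PROP = "total-records"


def count_star(snapshot_summary: Optional[Mapping[str, str]],
               file_num_rows: Sequence[int]) -> int:
    """Return the row count, preferring the snapshot summary.

    Hypothetical helper: `snapshot_summary` stands in for the summary map
    of the current snapshot, and `file_num_rows` for the per-file
    `num_rows` values taken from the Parquet file metadata.
    """
    if snapshot_summary is not None:
        raw = snapshot_summary.get(TOTAL_RECORDS_PROP)
        if raw is not None:
            try:
                total = int(raw)
            except ValueError:
                total = -1  # assumption: unparsable values trigger the fallback
            if total >= 0:
                return total  # fast path: answered from metadata only
    # Slow path: aggregate num_rows over all data files.
    return sum(file_num_rows)
```

With a summary of `{"total-records": "42"}` the helper returns the stored total without looking at the per-file counts; with an empty or absent summary it sums the per-file counts instead.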
A query can be optimized only if it meets all of the following requirements:
- The SelectStmt does not have a WHERE clause
- The SelectStmt does not have a GROUP BY clause
- The SelectStmt does not have a HAVING clause
- The TableRefs of the FROM clause contain only one BaseTableRef
- The table is an Iceberg table
- The SelectList must contain 'count(*)' or 'count(constant)'
- The SelectList can contain other aggregate functions, e.g. min, sum, etc.
- The SelectList can contain constants
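The requirements above can be read as a single predicate over the statement. The sketch below is illustrative only: `TableRef`, `SelectStmt`, and `can_use_metadata_count` are hypothetical stand-ins for the planner's classes, select-list items are reduced to simple kind labels, and rejecting any item kind not named in the list is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableRef:
    is_base_table: bool  # True for a BaseTableRef (not a view or subquery)
    is_iceberg: bool


@dataclass
class SelectStmt:
    # Each item is one of: "count_star", "count_const", "agg", "const".
    select_items: List[str]
    table_refs: List[TableRef]
    has_where: bool = False
    has_group_by: bool = False
    has_having: bool = False


def can_use_metadata_count(stmt: SelectStmt) -> bool:
    """Return True if the statement meets every listed requirement."""
    if stmt.has_where or stmt.has_group_by or stmt.has_having:
        return False
    if len(stmt.table_refs) != 1:
        return False
    ref = stmt.table_refs[0]
    if not (ref.is_base_table and ref.is_iceberg):
        return False
    # Assumption: anything other than the listed item kinds disqualifies.
    allowed = {"count_star", "count_const", "agg", "const"}
    if any(kind not in allowed for kind in stmt.select_items):
        return False
    # At least one count(*) or count(constant) must be present.
    return any(kind in ("count_star", "count_const")
               for kind in stmt.select_items)
```

For example, a statement shaped like `SELECT count(*), min(x), 1 FROM iceberg_tbl` would pass this check, while the same statement with a WHERE clause, or one against a non-Iceberg table, would not.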
Testing:
- Added an end-to-end test
- Ran existing tests
- Tested on a real cluster
Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Reviewed-on: http://gerrit.cloudera.org:8080/18574
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
> Optimize count(*) queries for Iceberg tables
> --------------------------------------------
>
> Key: IMPALA-11279
> URL: https://issues.apache.org/jira/browse/IMPALA-11279
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Zoltán Borók-Nagy
> Assignee: LiPenglin
> Priority: Major
> Labels: impala-iceberg
>
> Plain SELECT count(*) FROM tbl; queries could be made very fast for Iceberg tables, because they store the precise number of rows in the table property 'numRows'.
> Such queries could therefore be answered directly from table metadata.