Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/07/02 03:28:00 UTC

[jira] [Commented] (IMPALA-11279) Optimize count(*) queries for Iceberg tables

    [ https://issues.apache.org/jira/browse/IMPALA-11279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561690#comment-17561690 ] 

ASF subversion and git services commented on IMPALA-11279:
----------------------------------------------------------

Commit f38c53235f1797f91ff9a65bb734d3f38f1aadc9 in impala's branch refs/heads/dependabot/pip/infra/python/deps/urllib3-1.26.5 from LPL
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=f38c53235 ]

IMPALA-11279: Optimize plain count(*) queries for Iceberg tables

This commit optimizes plain count(*) queries on Iceberg tables.
When `org.apache.iceberg.SnapshotSummary#TOTAL_RECORDS_PROP` can be
retrieved from the current `org.apache.iceberg.BaseSnapshot#summary` of
the Iceberg table, the row count is taken from that property, which makes
this kind of query very fast. If the property cannot be retrieved, the
query aggregates the `num_rows` of the Parquet `file_metadata_` as usual.
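
The lookup-with-fallback described above can be sketched roughly as follows. This is an illustration only, not Impala's implementation: the snapshot summary is modeled as a plain string-to-string mapping, the per-file `num_rows` values as a list of integers, and the function names are hypothetical. The key "total-records" is the string value of `SnapshotSummary#TOTAL_RECORDS_PROP` in Iceberg.

```python
from typing import Mapping, Optional, Sequence

# String value of org.apache.iceberg.SnapshotSummary#TOTAL_RECORDS_PROP.
TOTAL_RECORDS_PROP = "total-records"


def total_records_from_summary(
    summary: Optional[Mapping[str, str]],
) -> Optional[int]:
    """Return the row count stored in the snapshot summary, or None.

    Hypothetical helper: a missing summary, a missing property, or an
    unparsable or negative value is treated as "not retrievable".
    """
    if not summary:
        return None
    raw = summary.get(TOTAL_RECORDS_PROP)
    if raw is None:
        return None
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if value >= 0 else None


def plain_count_star(
    summary: Optional[Mapping[str, str]],
    file_num_rows: Sequence[int],
) -> int:
    """Answer count(*) from metadata when possible, else sum per-file rows."""
    from_summary = total_records_from_summary(summary)
    if from_summary is not None:
        # Fast path: no data files need to be opened.
        return from_summary
    # Fallback: aggregate num_rows from each Parquet file's metadata.
    return sum(file_num_rows)
```

For example, `plain_count_star({"total-records": "42"}, [])` takes the fast path, while `plain_count_star({}, [10, 20])` falls back to summing the per-file counts.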

A query must meet all of the following requirements to be optimized:
 - The SelectStmt has no WHERE clause
 - The SelectStmt has no GROUP BY clause
 - The SelectStmt has no HAVING clause
 - The FROM clause contains exactly one TableRef, and it is a BaseTableRef
 - The table is an Iceberg table
 - The SelectList contains 'count(*)' or 'count(constant)'
 - The SelectList may also contain other aggregate functions, e.g. min, sum
 - The SelectList may also contain constants
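
The requirements above amount to a simple eligibility check. The sketch below is a hypothetical model, not Impala's frontend code: `SelectStmtInfo`, its fields, and the item-kind labels are invented for illustration, and it assumes that any select-list item other than a count, another aggregate, or a constant makes the query ineligible.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for select-list items, invented for this sketch.
COUNT_KINDS = {"count_star", "count_const"}
ALLOWED_ITEM_KINDS = COUNT_KINDS | {"other_agg", "constant"}


@dataclass
class SelectStmtInfo:
    """Simplified, hypothetical summary of an analyzed SELECT statement."""
    select_items: List[str]      # one kind label per select-list item
    table_ref_kinds: List[str]   # e.g. ["base"], ["base", "base"], ["inline_view"]
    is_iceberg_table: bool = True
    has_where: bool = False
    has_group_by: bool = False
    has_having: bool = False


def can_use_count_star_optimization(stmt: SelectStmtInfo) -> bool:
    """Return True if the statement meets every listed requirement."""
    if stmt.has_where or stmt.has_group_by or stmt.has_having:
        return False
    # Exactly one table reference, and it must be a base table.
    if stmt.table_ref_kinds != ["base"]:
        return False
    if not stmt.is_iceberg_table:
        return False
    # At least one count(*) or count(constant) must be present.
    if not any(kind in COUNT_KINDS for kind in stmt.select_items):
        return False
    # Assumption: every other item must be an aggregate or a constant.
    return all(kind in ALLOWED_ITEM_KINDS for kind in stmt.select_items)
```

Under this model, `SELECT count(*), min(x), 1 FROM iceberg_tbl` qualifies, while the same query with a WHERE clause or a join does not.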

Testing:
 - Added an end-to-end test
 - Ran existing tests
 - Tested on a real cluster

Change-Id: I8e9c48bbba7ab2320fa80915e7001ce54f1ef6d9
Reviewed-on: http://gerrit.cloudera.org:8080/18574
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Optimize count(*) queries for Iceberg tables
> --------------------------------------------
>
>                 Key: IMPALA-11279
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11279
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: LiPenglin
>            Priority: Major
>              Labels: impala-iceberg
>
> Plain SELECT count(*) FROM tbl; queries could be made superfast for Iceberg tables as they store the precise number of rows in table property 'numRows'.
> So we could just answer such queries from table metadata.
