Posted to issues-all@impala.apache.org by "Qifan Chen (Jira)" <ji...@apache.org> on 2022/10/10 19:42:00 UTC

[jira] [Commented] (IMPALA-11647) Row size for source tables in a cross join query is set to 0 in query plan

    [ https://issues.apache.org/jira/browse/IMPALA-11647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615283#comment-17615283 ] 

Qifan Chen commented on IMPALA-11647:
-------------------------------------

The scan's output row size is 0B instead of 8B because of this line of code: https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/ScanNode.java#L160.
Once that restriction is relaxed, we can get a better plan in which each scan has a row size of 8B and a cardinality equal to the number of files in the table.
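A rough sketch of why one 8B row per file can still produce the correct count for a cross join. This is a hypothetical Java illustration, not Impala's planner or backend code. It assumes that each per-file row carries that file's row count as a single BIGINT (taken from file metadata such as Parquet footers). Under that assumption, it checks that summing `a.cnt * b.cnt` over the cross join of the per-file rows equals `count(*)` over the full cross join. The class name, method names, and per-file counts are all made up for illustration.

```java
import java.util.List;

// Hypothetical illustration only; not Impala frontend or backend code.
public class CountStarCrossJoinSketch {

    // count(*) over the full cross join: every row of A pairs with every row of B.
    static long fullCrossJoinCount(long rowsA, long rowsB) {
        return rowsA * rowsB;
    }

    // Assumed optimized form: each scan emits one 8-byte row per file holding
    // that file's row count. The join pairs every file of A with every file of B,
    // and the aggregate sums the product of the two per-file counts.
    static long perFileCrossJoinCount(List<Long> fileCountsA, List<Long> fileCountsB) {
        long total = 0;
        for (long a : fileCountsA) {
            for (long b : fileCountsB) {
                total += a * b;
            }
        }
        return total;
    }

    // Total row count of a table, given its per-file row counts.
    static long sum(List<Long> counts) {
        long s = 0;
        for (long c : counts) {
            s += c;
        }
        return s;
    }

    public static void main(String[] args) {
        // Made-up per-file row counts for two small tables.
        List<Long> a = List.of(3L, 5L, 2L);
        List<Long> b = List.of(4L, 1L);
        long full = fullCrossJoinCount(sum(a), sum(b));
        long perFile = perFileCrossJoinCount(a, b);
        System.out.println(full + " " + perFile);
    }
}
```

Running `main` prints `50 50`. The key design point is that the aggregate above the join must multiply the per-file counts rather than simply count joined rows. Counting joined rows would return files(A) × files(B), while the sum of products equals the product of the table row counts.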



> Row size for source tables in a cross join query is set to 0 in query plan
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-11647
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11647
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Qifan Chen
>            Priority: Major
>
> The row-size of both source tables in the following explain output is 0B. In principle, the count(*) optimization could be applied to such queries, which would set the row-size correctly.
> {code:java}
> explain select count(*) from store_sales a, store_sales b limit 500
> +--------------------------------------------------------------+
> | Explain String                                               |
> +--------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=256.00KB Threads=5 |
> | Per-Host Resource Estimates: Memory=10MB                     |
> |                                                              |
> | PLAN-ROOT SINK                                               |
> | |                                                            |
> | 06:AGGREGATE [FINALIZE]                                      |
> | |  output: count:merge(*)                                    |
> | |  limit: 500                                                |
> | |  row-size=8B cardinality=1                                 |
> | |                                                            |
> | 05:EXCHANGE [UNPARTITIONED]                                  |
> | |                                                            |
> | 03:AGGREGATE                                                 |
> | |  output: count(*)                                          |
> | |  row-size=8B cardinality=1                                 |
> | |                                                            |
> | 02:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]                  |
> | |  row-size=0B cardinality=8.30T                             |
> | |                                                            |
> | |--04:EXCHANGE [BROADCAST]                                   |
> | |  |                                                         |
> | |  01:SCAN HDFS [tpcds_parquet.store_sales b]                |
> | |     HDFS partitions=1824/1824 files=1824 size=199.83MB     |
> | |     row-size=0B cardinality=2.88M                          |
> | |                                                            |
> | 00:SCAN HDFS [tpcds_parquet.store_sales a]                   |
> |    HDFS partitions=1824/1824 files=1824 size=199.83MB        |
> |    row-size=0B cardinality=2.88M                             |
> +--------------------------------------------------------------+
> {code}
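The join cardinality in this plan is consistent with the product of the two scan cardinalities (2.88M × 2.88M ≈ 8.30T). The sketch below is a hypothetical Java helper, not Impala's cardinality code. It compares that estimate with the one that would result if, as described in the comment, each scan reported one row per file (files=1824 per table). Multiplying the rounded 2.88M gives about 8.29T. The plan's 8.30T presumably reflects the unrounded row count.

```java
// Hypothetical helper; Impala's planner has its own cardinality logic.
public class CrossJoinCardinalitySketch {

    // Cross-join cardinality estimate: every left row pairs with every right row.
    static long crossJoinCardinality(long left, long right) {
        return Math.multiplyExact(left, right); // throws on overflow instead of wrapping
    }

    public static void main(String[] args) {
        long filesPerTable = 1824;      // files=1824 for each scan in the plan
        long rowsPerTable = 2_880_000L; // 2.88M as displayed (rounded)
        long optimized = crossJoinCardinality(filesPerTable, filesPerTable);
        long current = crossJoinCardinality(rowsPerTable, rowsPerTable);
        System.out.println("join rows with one row per file: " + optimized);
        System.out.println("join rows with one row per table row (approx.): " + current);
    }
}
```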



