You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2020/08/13 17:21:00 UTC

[jira] [Created] (IMPALA-10084) Display the number of estimated rows for a table

Sahil Takiar created IMPALA-10084:
-------------------------------------

             Summary: Display the number of estimated rows for a table
                 Key: IMPALA-10084
                 URL: https://issues.apache.org/jira/browse/IMPALA-10084
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Sahil Takiar


AFAICT, there is no way to determine the number of rows estimated for a table when row counts have been estimated via file size:
{code:java}
[localhost:21000] default> create table test (col int);
[localhost:21000] default> insert into table test values (1), (2), (3), (4), (5);
[localhost:21000] default> show table stats test;
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                   |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| -1    | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/test |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
[localhost:21000] default> explain select * from test order by col limit 10;
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                         |
| Per-Host Resource Estimates: Memory=32MB                                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.test                                                                       |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 02:MERGING-EXCHANGE [UNPARTITIONED]                                                |
| |  order by: col ASC                                                               |
| |  limit: 10                                                                       |
| |                                                                                  |
| 01:TOP-N [LIMIT=10]                                                                |
| |  order by: col ASC                                                               |
| |  row-size=4B cardinality=3                                                       |
| |                                                                                  |
| 00:SCAN HDFS [default.test]                                                        |
|    HDFS partitions=1/1 files=1 size=10B                                            |
|    row-size=4B cardinality=3                                                       |
+------------------------------------------------------------------------------------+
[localhost:21000] default> set explain_level=3;
localhost:21000] default> explain select * from test order by col limit 10;
+--------------------------------------------------------------------------------------+
| Explain String                                                                       |
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                           |
| Per-Host Resource Estimates: Memory=32MB                                             |
| WARNING: The following tables are missing relevant table and/or column statistics.   |
| default.test                                                                         |
| Analyzed query: SELECT * FROM `default`.test ORDER BY col ASC LIMIT CAST(10 AS       |
| TINYINT)                                                                             |
|                                                                                      |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                |
| Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1     |
|   PLAN-ROOT SINK                                                                     |
|   |  output exprs: col                                                               |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                         |
|   |                                                                                  |
|   02:MERGING-EXCHANGE [UNPARTITIONED]                                                |
|      order by: col ASC                                                               |
|      limit: 10                                                                       |
|      mem-estimate=16.00KB mem-reservation=0B thread-reservation=0                    |
|      tuple-ids=1 row-size=4B cardinality=3                                           |
|      in pipelines: 01(GETNEXT)                                                       |
|                                                                                      |
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                                       |
| Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=2 |
|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]                         |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                         |
|   01:TOP-N [LIMIT=10]                                                                |
|   |  order by: col ASC                                                               |
|   |  mem-estimate=12B mem-reservation=0B thread-reservation=0                        |
|   |  tuple-ids=1 row-size=4B cardinality=3                                           |
|   |  in pipelines: 01(GETNEXT), 00(OPEN)                                             |
|   |                                                                                  |
|   00:SCAN HDFS [default.test, RANDOM]                                                |
|      HDFS partitions=1/1 files=1 size=10B                                            |
|      stored statistics:                                                              |
|        table: rows=unavailable size=unavailable                                      |
|        columns: unavailable                                                          |
|      extrapolated-rows=disabled max-scan-range-rows=unavailable                      |
|      mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1                |
|      tuple-ids=0 row-size=4B cardinality=3                                           |
|      in pipelines: 00(GETNEXT)                                                       |
+--------------------------------------------------------------------------------------+
[localhost:21000] default> create table part_test (col int) partitioned by (partcol int);
[localhost:21000] default> alter table part_test add partition (partcol=1);
[localhost:21000] default> insert into table part_test partition (partcol=1) values (1), (2), (3), (4), (5);
[localhost:21000] default> show table stats part_test;
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1    | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| Total   | -1    | 1      | 10B  | 0B           |                   |        |                   |                                                           |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
[localhost:21000] default> show partitions part_test;
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location                                                  |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1       | -1    | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false             | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| Total   | -1    | 1      | 10B  | 0B           |                   |        |                   |                                                           |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
[localhost:21000] default> explain select * from part_test order by col limit 10;
+--------------------------------------------------------------------------------------+
| Explain String                                                                       |
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                           |
| Per-Host Resource Estimates: Memory=32MB                                             |
| WARNING: The following tables are missing relevant table and/or column statistics.   |
| default.part_test                                                                    |
| Analyzed query: SELECT * FROM `default`.part_test ORDER BY col ASC LIMIT CAST(10     |
| AS TINYINT)                                                                          |
|                                                                                      |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                |
| Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1     |
|   PLAN-ROOT SINK                                                                     |
|   |  output exprs: col, partcol                                                      |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                         |
|   |                                                                                  |
|   02:MERGING-EXCHANGE [UNPARTITIONED]                                                |
|      order by: col ASC                                                               |
|      limit: 10                                                                       |
|      mem-estimate=16.00KB mem-reservation=0B thread-reservation=0                    |
|      tuple-ids=1 row-size=8B cardinality=1                                           |
|      in pipelines: 01(GETNEXT)                                                       |
|                                                                                      |
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                                       |
| Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=2 |
|   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]                         |
|   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                         |
|   01:TOP-N [LIMIT=10]                                                                |
|   |  order by: col ASC                                                               |
|   |  mem-estimate=8B mem-reservation=0B thread-reservation=0                         |
|   |  tuple-ids=1 row-size=8B cardinality=1                                           |
|   |  in pipelines: 01(GETNEXT), 00(OPEN)                                             |
|   |                                                                                  |
|   00:SCAN HDFS [default.part_test, RANDOM]                                           |
|      HDFS partitions=1/1 files=1 size=10B                                            |
|      stored statistics:                                                              |
|        table: rows=unavailable size=unavailable                                      |
|        partitions: 0/1 rows=1                                                        |
|        columns missing stats: col                                                    |
|      extrapolated-rows=disabled max-scan-range-rows=unavailable                      |
|      mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1                |
|      tuple-ids=0 row-size=8B cardinality=1                                           |
|      in pipelines: 00(GETNEXT)                                                       |
+--------------------------------------------------------------------------------------+
{code}

It would be nice if there was a field in the explain plan similar to "extrapolated-rows" that lists the estimated row count for the table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org