You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2020/08/13 17:21:00 UTC
[jira] [Created] (IMPALA-10084) Display the number of estimated
rows for a table
Sahil Takiar created IMPALA-10084:
-------------------------------------
Summary: Display the number of estimated rows for a table
Key: IMPALA-10084
URL: https://issues.apache.org/jira/browse/IMPALA-10084
Project: IMPALA
Issue Type: Sub-task
Reporter: Sahil Takiar
AFAICT, there is no way to determine the number of rows estimated for a table when row counts have been estimated via file size:
{code:java}
[localhost:21000] default> create table test (col int);
[localhost:21000] default> insert into table test values (1), (2), (3), (4), (5);
[localhost:21000] default> show table stats test;
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
| -1 | 1 | 10B | NOT CACHED | NOT CACHED | TEXT | false | hdfs://localhost:20500/test-warehouse/test |
+-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
[localhost:21000] default> explain select * from test order by col limit 10;
+------------------------------------------------------------------------------------+
| Explain String |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3 |
| Per-Host Resource Estimates: Memory=32MB |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.test |
| |
| PLAN-ROOT SINK |
| | |
| 02:MERGING-EXCHANGE [UNPARTITIONED] |
| | order by: col ASC |
| | limit: 10 |
| | |
| 01:TOP-N [LIMIT=10] |
| | order by: col ASC |
| | row-size=4B cardinality=3 |
| | |
| 00:SCAN HDFS [default.test] |
| HDFS partitions=1/1 files=1 size=10B |
| row-size=4B cardinality=3 |
+------------------------------------------------------------------------------------+
[localhost:21000] default> set explain_level=3;
localhost:21000] default> explain select * from test order by col limit 10;
+--------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3 |
| Per-Host Resource Estimates: Memory=32MB |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.test |
| Analyzed query: SELECT * FROM `default`.test ORDER BY col ASC LIMIT CAST(10 AS |
| TINYINT) |
| |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
| Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1 |
| PLAN-ROOT SINK |
| | output exprs: col |
| | mem-estimate=0B mem-reservation=0B thread-reservation=0 |
| | |
| 02:MERGING-EXCHANGE [UNPARTITIONED] |
| order by: col ASC |
| limit: 10 |
| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0 |
| tuple-ids=1 row-size=4B cardinality=3 |
| in pipelines: 01(GETNEXT) |
| |
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1 |
| Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=2 |
| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED] |
| | mem-estimate=0B mem-reservation=0B thread-reservation=0 |
| 01:TOP-N [LIMIT=10] |
| | order by: col ASC |
| | mem-estimate=12B mem-reservation=0B thread-reservation=0 |
| | tuple-ids=1 row-size=4B cardinality=3 |
| | in pipelines: 01(GETNEXT), 00(OPEN) |
| | |
| 00:SCAN HDFS [default.test, RANDOM] |
| HDFS partitions=1/1 files=1 size=10B |
| stored statistics: |
| table: rows=unavailable size=unavailable |
| columns: unavailable |
| extrapolated-rows=disabled max-scan-range-rows=unavailable |
| mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1 |
| tuple-ids=0 row-size=4B cardinality=3 |
| in pipelines: 00(GETNEXT) |
+--------------------------------------------------------------------------------------+
[localhost:21000] default> create table part_test (col int) partitioned by (partcol int);
[localhost:21000] default> alter table part_test add partition (partcol=1);
[localhost:21000] default> insert into table part_test partition (partcol=1) values (1), (2), (3), (4), (5);
[localhost:21000] default> show table stats part_test;
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1 | -1 | 1 | 10B | NOT CACHED | NOT CACHED | TEXT | false | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| Total | -1 | 1 | 10B | 0B | | | | |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
[localhost:21000] default> show partitions part_test;
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
| 1 | -1 | 1 | 10B | NOT CACHED | NOT CACHED | TEXT | false | hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
| Total | -1 | 1 | 10B | 0B | | | | |
+---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
[localhost:21000] default> explain select * from part_test order by col limit 10;
+--------------------------------------------------------------------------------------+
| Explain String |
+--------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=3 |
| Per-Host Resource Estimates: Memory=32MB |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.part_test |
| Analyzed query: SELECT * FROM `default`.part_test ORDER BY col ASC LIMIT CAST(10 |
| AS TINYINT) |
| |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
| Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1 |
| PLAN-ROOT SINK |
| | output exprs: col, partcol |
| | mem-estimate=0B mem-reservation=0B thread-reservation=0 |
| | |
| 02:MERGING-EXCHANGE [UNPARTITIONED] |
| order by: col ASC |
| limit: 10 |
| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0 |
| tuple-ids=1 row-size=8B cardinality=1 |
| in pipelines: 01(GETNEXT) |
| |
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1 |
| Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=2 |
| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED] |
| | mem-estimate=0B mem-reservation=0B thread-reservation=0 |
| 01:TOP-N [LIMIT=10] |
| | order by: col ASC |
| | mem-estimate=8B mem-reservation=0B thread-reservation=0 |
| | tuple-ids=1 row-size=8B cardinality=1 |
| | in pipelines: 01(GETNEXT), 00(OPEN) |
| | |
| 00:SCAN HDFS [default.part_test, RANDOM] |
| HDFS partitions=1/1 files=1 size=10B |
| stored statistics: |
| table: rows=unavailable size=unavailable |
| partitions: 0/1 rows=1 |
| columns missing stats: col |
| extrapolated-rows=disabled max-scan-range-rows=unavailable |
| mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1 |
| tuple-ids=0 row-size=8B cardinality=1 |
| in pipelines: 00(GETNEXT) |
+--------------------------------------------------------------------------------------+
{code}
It would be nice if there was a field in the explain plan similar to "extrapolated-rows" that lists the estimated row count for the table.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org