Posted to issues-all@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/01/04 02:13:00 UTC
[jira] [Commented] (IMPALA-8024) HBase table cardinality estimates are wrong
[ https://issues.apache.org/jira/browse/IMPALA-8024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733736#comment-16733736 ]
Paul Rogers commented on IMPALA-8024:
-------------------------------------
Another example, {{functional_hbase.stringids}}:
{noformat}
Query: show table stats stringids
+-----------------+--------------+------------+--------+
| Region Location | Start RowKey | Est. #Rows | Size   |
+-----------------+--------------+------------+--------+
| localhost       |              | 10         | 0B     |
| localhost       | 1            | 4295       | 1.00MB |
| localhost       | 3            | 4267       | 1.00MB |
| localhost       | 5            | 4292       | 1.00MB |
| localhost       | 7            | 4290       | 1.00MB |
| localhost       | 9            | 10         | 0B     |
| Total           |              | 17164      | 4.00MB |
+-----------------+--------------+------------+--------+
Query: show column stats stringids
+-----------------+-----------+------------------+--------+----------+-------------------+
| Column          | Type      | #Distinct Values | #Nulls | Max Size | Avg Size          |
+-----------------+-----------+------------------+--------+----------+-------------------+
| id              | STRING    | 10000            | 0      | 4        | 3.888999938964844 |
...
select count(*) from stringids
+----------+
| count(*) |
+----------+
| 10000    |
+----------+
{noformat}
Here, {{id}} is unique, so its NDV reflects the row count at the time stats were gathered. The estimated row count, however, is about 17K, while the actual row count is 10K, the same as the NDV in the column stats.
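To put a number on the gap, here is a minimal Python sketch that sums the per-region estimates and compares the total with the {{count(*)}} result. The figures are copied from the output above; the variable names are illustrative only and are not part of Impala.

```python
# Per-region "Est. #Rows" values copied from SHOW TABLE STATS stringids.
region_estimates = [10, 4295, 4267, 4292, 4290, 10]

# Actual row count reported by SELECT count(*) FROM stringids.
actual_rows = 10000

# The "Total" row is the sum of the per-region estimates.
estimated_rows = sum(region_estimates)

# Ratio of estimated to actual rows; 1.0 would be a perfect estimate.
overestimate_ratio = estimated_rows / actual_rows

print(f"estimated={estimated_rows} actual={actual_rows} "
      f"ratio={overestimate_ratio:.2f}")
```

With these inputs the total estimate is roughly 1.7 times the actual row count.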
> HBase table cardinality estimates are wrong
> -------------------------------------------
>
> Key: IMPALA-8024
> URL: https://issues.apache.org/jira/browse/IMPALA-8024
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.1.0
> Reporter: Paul Rogers
> Priority: Major
>
> IMPALA-8021 added cardinality estimates to EXPLAIN plan output. Running some of our {{PlannerTest}} files revealed that our HBase cardinality estimates are very poor, even for our simple test tables. For example, for {{functional_hbase.alltypessmall}}:
> {{count(*)}} tells us that there are 100 rows:
> {noformat}
> select count(*) from functional_hbase.alltypessmall
> +----------+
> | count(*) |
> +----------+
> | 100      |
> +----------+
> {noformat}
> Table stats claim that there are only 60 rows:
> {noformat}
> show table stats functional_hbase.alltypessmall;
> +-----------------+--------------+------------+------+
> | Region Location | Start RowKey | Est. #Rows | Size |
> +-----------------+--------------+------------+------+
> | localhost       |              | 10         | 0B   |
> | localhost       | 1            | 10         | 0B   |
> | localhost       | 3            | 10         | 0B   |
> | localhost       | 5            | 10         | 0B   |
> | localhost       | 7            | 10         | 0B   |
> | localhost       | 9            | 10         | 0B   |
> | Total           |              | 60         | 0B   |
> +-----------------+--------------+------------+------+
> {noformat}
> The NDV stats show that there must be at least 100 rows:
> {noformat}
> show column stats functional_hbase.alltypessmall
> +-----------------+-----------+------------------+--------+----------+----------+
> | Column          | Type      | #Distinct Values | #Nulls | Max Size | Avg Size |
> +-----------------+-----------+------------------+--------+----------+----------+
> | id              | INT       | 99               | 0      | 4        | 4        |
> ...
> | timestamp_col   | TIMESTAMP | 100              | 0      | 16       | 16       |
> ...
> +-----------------+-----------+------------------+--------+----------+----------+
> {noformat}
> The planner, where the estimate matters most, assumes only 50 rows:
> {noformat}
> select *
> from functional.alltypesagg join functional_hbase.alltypessmall using (id, int_col)
> |--01:SCAN HBASE [functional_hbase.alltypessmall]
> | row-size=89B cardinality=50
> {noformat}
> We need a more reliable estimate.
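The NDV argument in the quoted description can be sketched as a simple sanity check: a table cannot have fewer rows than the largest distinct-value count of any of its columns. The Python below applies that check to the {{alltypessmall}} figures quoted above. It treats the NDV values as accurate, which is an assumption, and the function and variable names are hypothetical rather than anything in the Impala code base.

```python
def ndv_lower_bound(column_ndvs):
    """Return the largest per-column NDV.

    Assuming the NDVs are accurate, the table has at least this many rows.
    """
    return max(column_ndvs.values())


# NDVs copied from SHOW COLUMN STATS functional_hbase.alltypessmall
# (only the two columns shown in the quoted output).
column_ndvs = {"id": 99, "timestamp_col": 100}

# Row-count estimates from the two sources quoted above.
estimates = {"table stats": 60, "planner": 50}

floor = ndv_lower_bound(column_ndvs)

# Collect every estimate that falls below the NDV-derived floor.
too_low = {source: rows for source, rows in estimates.items() if rows < floor}

for source, rows in too_low.items():
    print(f"{source}: estimate {rows} is below the NDV floor of {floor}")
```

Both estimates fall below the floor of 100, which matches the {{count(*)}} result in the description.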