Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/05/24 20:06:04 UTC

[jira] [Created] (IMPALA-5361) Reconsider cardinality estimation behavior with partial row count stats.

Alexander Behm created IMPALA-5361:
--------------------------------------

Summary: Reconsider cardinality estimation behavior with partial row count stats.
Key: IMPALA-5361
URL: https://issues.apache.org/jira/browse/IMPALA-5361
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
Reporter: Alexander Behm

The current scan-cardinality estimation behavior with partial table/partition row counts is simple, but could be improved:

The current code distinguishes two cases: it uses partition stats whenever they are available, and falls back to table stats only when no partition stats are available. That behavior is easy to understand and explain.
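
As a rough illustration of the rule described above (not the actual Impala frontend code), the following sketch assumes that each selected partition either has a known row count or None, that the estimate is the sum of the known partition row counts, and that "no partition stats" is judged over the selected partitions. The function name, the UNKNOWN sentinel, and the parameters are hypothetical.

```python
from typing import Optional, Sequence

UNKNOWN = -1  # hypothetical sentinel meaning "no estimate available"


def estimate_scan_cardinality(selected_partition_rows: Sequence[Optional[int]],
                              table_rows: Optional[int]) -> int:
    """Sketch of the described rule: prefer partition stats, else table stats.

    Assumption: a missing partition row count is represented as None and
    contributes nothing to the sum.
    """
    known = [n for n in selected_partition_rows if n is not None]
    if known:
        # At least one selected partition has stats: use partition stats only.
        return sum(known)
    if table_rows is not None:
        # No selected partition has stats: fall back to the table-level count.
        return table_rows
    return UNKNOWN
```

Under these assumptions, a scan of three partitions where only two have row counts is estimated from those two alone, which shows how partial stats could lead to an underestimate.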

However, one might argue that it is better to fall back to table-level stats for queries that select all partitions.
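
One possible reading of this alternative, sketched under stated assumptions: when a query selects every partition and some partition row counts are missing, prefer the table-level row count if it exists; otherwise keep the partition-based rule. The function name, parameters, and the -1 sentinel are hypothetical, and the exact condition for the fallback is precisely what this issue leaves open.

```python
from typing import Optional, Sequence


def estimate_with_table_fallback(selected_partition_rows: Sequence[Optional[int]],
                                 num_table_partitions: int,
                                 table_rows: Optional[int]) -> int:
    """Sketch: use table stats for all-partition scans with incomplete stats.

    Assumption: None marks a partition without a row count, and -1 means
    that no estimate is available.
    """
    known = [n for n in selected_partition_rows if n is not None]
    selects_all = len(selected_partition_rows) == num_table_partitions
    incomplete = len(known) < len(selected_partition_rows)
    if selects_all and incomplete and table_rows is not None:
        # Whole-table scan with gaps in partition stats: trust the table count.
        return table_rows
    if known:
        # Otherwise keep the partition-based estimate.
        return sum(known)
    return table_rows if table_rows is not None else -1
```

With this variant, a whole-table scan with partial partition stats no longer depends on which partitions happen to have row counts, at the cost of a rule that is harder to explain.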

There are many cases to consider, among them:
1. Select all partitions; all partitions have stats
2. Select all partitions; some partitions have stats
3. Select all partitions; no partitions have stats
4. Select some partitions; all selected partitions have stats
5. Select some partitions; some selected partitions have stats
6. Select some partitions; no selected partitions have stats
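
The six cases above form a 2 x 3 grid: whether all partitions are selected, crossed with whether all, some, or none of the selected partitions have stats. A small helper such as the following could map a scan onto a case number, for example when writing tests for each case. The function name and parameters are hypothetical, and None is assumed to mark a partition without a row count.

```python
from typing import Optional, Sequence


def classify_case(num_table_partitions: int,
                  selected_partition_rows: Sequence[Optional[int]]) -> int:
    """Return the case number (1-6) from the list above for a given scan."""
    if not selected_partition_rows:
        raise ValueError("at least one partition must be selected")
    selects_all = len(selected_partition_rows) == num_table_partitions
    with_stats = sum(1 for n in selected_partition_rows if n is not None)
    if with_stats == len(selected_partition_rows):
        offset = 0  # all selected partitions have stats
    elif with_stats > 0:
        offset = 1  # some selected partitions have stats
    else:
        offset = 2  # no selected partitions have stats
    # Cases 1-3 select all partitions; cases 4-6 select a subset.
    return (1 if selects_all else 4) + offset
```

For a table with three partitions, selecting two of them where only one has a row count falls under case 5.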

There is more discussion on this topic in the following CR:
https://gerrit.cloudera.org/#/c/6840/3/tests/metadata/test_explain.py@127
