Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/05/24 20:06:04 UTC

[jira] [Created] (IMPALA-5361) Reconsider cardinality estimation behavior with partial row count stats.

Alexander Behm created IMPALA-5361:
--------------------------------------

             Summary: Reconsider cardinality estimation behavior with partial row count stats.
                 Key: IMPALA-5361
                 URL: https://issues.apache.org/jira/browse/IMPALA-5361
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
            Reporter: Alexander Behm


The current scan-cardinality estimation behavior with partial table/partition row counts is simple, but could be improved:

The current code distinguishes two cases: it uses the partition stats whenever they are available, and falls back to table stats only when no partition stats are available. That behavior is easy to understand and explain.
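
The two-case rule can be sketched as follows. This is only an illustration, not Impala's frontend code. The function name estimate_scan_cardinality, the use of None for a missing row count, and the decision to look only at the selected partitions and sum the row counts of those that have stats are all assumptions made for the example.

```python
from typing import Optional, Sequence


def estimate_scan_cardinality(
    selected_partition_rows: Sequence[Optional[int]],
    table_rows: Optional[int],
) -> Optional[int]:
    """Hypothetical sketch of the two-case rule; None means 'no stats'."""
    known = [n for n in selected_partition_rows if n is not None]
    if known:
        # Case A: at least one selected partition has stats, so use them.
        # Assumption: partitions without stats contribute nothing.
        return sum(known)
    # Case B: no partition stats are available, so fall back to the
    # table-level row count (which may itself be missing).
    return table_rows
```

Under these assumptions, a scan over partitions with row counts [100, None, 300] on a table with a table-level count of 1000 yields 400, while a scan over partitions with no stats at all yields 1000. The first result ignores the partition without stats, which is the kind of outcome this issue asks to reconsider.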

However, one might argue that it is better to fall back to table-level stats for queries that select all partitions.

There are many cases to consider, among them:
1. Select all partitions; all partitions have stats
2. Select all partitions; some partitions have stats
3. Select all partitions; no partitions have stats
4. Select some partitions; all selected partitions have stats
5. Select some partitions; some selected partitions have stats
6. Select some partitions; no selected partitions have stats
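
To make the six cases above concrete, here is a small sketch that maps a query to its case number. The names classify_case, has_stats and selected are hypothetical and exist only for this illustration. The sketch assumes that `selected` holds distinct, valid partition indices.

```python
from typing import Sequence, Set


def classify_case(has_stats: Sequence[bool], selected: Set[int]) -> int:
    """Return the case number (1-6) from the list above.

    has_stats[i] says whether partition i has a row count; `selected`
    holds the indices of the partitions the query scans.
    """
    if not selected:
        raise ValueError("query must select at least one partition")
    # With distinct, valid indices, equal sizes mean every partition is selected.
    selects_all = len(selected) == len(has_stats)
    with_stats = sum(1 for i in selected if has_stats[i])
    if with_stats == len(selected):
        coverage = 0  # all selected partitions have stats
    elif with_stats > 0:
        coverage = 1  # some selected partitions have stats
    else:
        coverage = 2  # no selected partitions have stats
    # Cases 1-3 select all partitions; cases 4-6 select a subset.
    base = 1 if selects_all else 4
    return base + coverage
```

A classification like this could serve as a checklist for tests: each case number would get its own expected cardinality once the desired behavior is agreed on.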

There is more discussion on this topic in the following CR:
https://gerrit.cloudera.org/#/c/6840/3/tests/metadata/test_explain.py@127



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)