Posted to issues-all@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/01/02 22:05:00 UTC
[jira] [Updated] (IMPALA-8034) PlannerTest cardinality tests are not realistic

     [ https://issues.apache.org/jira/browse/IMPALA-8034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated IMPALA-8034:
--------------------------------
    Summary: PlannerTest cardinality tests are not realistic  (was: PlannerTest tests are not realistic)

> PlannerTest cardinality tests are not realistic
> -----------------------------------------------
>
>                 Key: IMPALA-8034
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8034
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>
> Impala generally assumes that joins are M:1, on the FK/PK. A PK uniquely identifies a row, so {{|pk1| = |Table|}}. Join estimation also builds in the assumption that key columns are independent, so if we have multiple keys, {{|pk1| * |pk2| * … * |pkn| = |Table|}}.
> But PlannerTest frequently uses non-independent, non-unique columns. It might join on both the (unique) {{id}} column and the non-unique {{int_col}} column, which throws off the calculations. For example:
> {noformat}
> select *
> from functional.alltypesagg a
> full outer join functional.alltypessmall b using (id, int_col)
> right join functional.alltypesaggnonulls c on (a.id = c.id and b.string_col = c.string_col)
> {noformat}
> If we then try to get the estimated cardinalities to match the actual cardinalities obtained from running the query, we end up fighting our assumptions. This shows up in the code: rather than use the classic assumption that the key columns are independent, the code uses special adjustments for redundant columns, perhaps so that tests such as the above produce good estimates.
> Better to modify (or add) tests that are based on our assumptions so we can verify that the intended logic works. It is fine to then add a few “oddball” queries to see how well the estimates hold up when the data (or user) does not follow the independence assumption.
> Alternatively, add new tests that use realistic joins, and retain the existing tests, adding a note explaining why the resulting cardinality estimates appear wrong (because the tests use unrealistic, redundant join columns, which real users seldom do).
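
The effect described in the issue can be illustrated with a small arithmetic sketch. It is not Impala's planner code: it uses the textbook equi-join estimate {{|L| * |R| / max(NDV(L.key), NDV(R.key))}} and multiplies per-column NDVs under the independence assumption. The function names, row counts, and NDVs below are hypothetical, chosen only to show how a redundant join column such as {{int_col}} shrinks the estimate.

```python
from math import prod


def compound_ndv(ndvs):
    """NDV of a multi-column key, assuming the columns are independent."""
    return prod(ndvs)


def join_cardinality(left_rows, right_rows, left_ndvs, right_ndvs):
    """Textbook equi-join estimate: |L| * |R| / max(NDV(L.key), NDV(R.key)).

    This is an illustrative formula, not a copy of Impala's implementation.
    """
    key_ndv = max(compound_ndv(left_ndvs), compound_ndv(right_ndvs))
    return left_rows * right_rows / key_ndv


# Hypothetical statistics (not taken from the functional test tables).
FACT_ROWS = 10_000   # many side of the M:1 join
DIM_ROWS = 100       # one side; id is its primary key, so NDV(id) = DIM_ROWS

# Realistic FK/PK join on id alone: each fact row matches one dimension row.
on_id = join_cardinality(FACT_ROWS, DIM_ROWS, [100], [100])

# Same join with a redundant, non-unique int_col (NDV 10) added to the key.
# If int_col is determined by id, the true result size does not change,
# but the independence assumption multiplies the NDVs and shrinks the estimate.
on_id_and_int_col = join_cardinality(FACT_ROWS, DIM_ROWS, [100, 10], [100, 10])

print(on_id)              # estimate for the realistic join
print(on_id_and_int_col)  # estimate with the redundant column
```

Under these assumed numbers the compound key NDV (1,000) exceeds the dimension table's row count (100), which is exactly the case where {{|pk1| * |pk2| = |Table|}} no longer holds, and the estimate drops by the NDV of the redundant column.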


