Posted to issues@impala.apache.org by "Shant Hovsepian (Jira)" <ji...@apache.org> on 2020/07/18 20:28:00 UTC

[jira] [Created] (IMPALA-9972) Use defined referential constraints for join cardinality calculations

Shant Hovsepian created IMPALA-9972:
---------------------------------------

             Summary: Use defined referential constraints for join cardinality calculations
                 Key: IMPALA-9972
                 URL: https://issues.apache.org/jira/browse/IMPALA-9972
             Project: IMPALA
          Issue Type: Sub-task
          Components: Frontend
            Reporter: Shant Hovsepian
            Assignee: Shant Hovsepian
             Fix For: Impala 4.0


Currently, an estimation technique is used to determine whether the join predicates constitute a foreign key -> primary key type of functional dependency. Joins of this type are common in "star schemas" and allow for certain query planning optimizations.

The current technique, however, can produce both false negatives and false positives because it relies on table stats, which can be out of date or inaccurate due to the statistical methods used to derive them. One example is the variability in the error rate of the HyperLogLog algorithm, which stats computation uses to estimate the number of distinct values for a specific column.
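
To make the failure modes concrete, here is a minimal sketch of a stats-based key check. It assumes the heuristic treats a column as a primary key when its estimated number of distinct values (NDV) is within a tolerance of the table's row count. The function name, the 5% tolerance, and the sample numbers are illustrative assumptions, not taken from the Impala source.

```python
def looks_like_primary_key(ndv_estimate: int, num_rows: int,
                           tolerance: float = 0.05) -> bool:
    """Hypothetical stats-based check: a column is treated as a key when
    its estimated NDV is within `tolerance` of the table's row count."""
    if num_rows <= 0:
        return False
    return abs(num_rows - ndv_estimate) / num_rows <= tolerance


num_rows = 1_000_000

# A true key column whose NDV estimate is 1% low: correctly accepted.
accurate = looks_like_primary_key(990_000, num_rows)

# The same true key column, but the NDV estimate is 8% low:
# rejected, which is a false negative.
false_negative = looks_like_primary_key(920_000, num_rows)

# A non-key column with 900,000 real distinct values whose estimate
# came out at 960,000: accepted, which is a false positive.
false_positive = looks_like_primary_key(960_000, num_rows)

print(accurate, false_negative, false_positive)
```

In this sketch the outcome depends entirely on how far the estimate drifts from the true value, which is why stale or noisy stats can flip the decision in either direction.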

In cases where a referential integrity constraint exists and is defined in the table metadata, this information should be used instead of the stats-based estimation to determine the type and cardinality of a join.
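
The proposed precedence could look roughly like the sketch below: consult declared constraints first, and fall back to the stats-based estimate only when no matching constraint exists. The `ForeignKey` type, the `is_fk_pk_join` function, the `stats_heuristic` callback, and the table and column names are all hypothetical, chosen for illustration; they do not reflect Impala's actual classes or APIs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class ForeignKey:
    """Hypothetical representation of a declared FK -> PK constraint."""
    fk_table: str
    fk_cols: Tuple[str, ...]
    pk_table: str
    pk_cols: Tuple[str, ...]


def is_fk_pk_join(lhs_table: str, lhs_cols: Tuple[str, ...],
                  rhs_table: str, rhs_cols: Tuple[str, ...],
                  constraints: Iterable[ForeignKey],
                  stats_heuristic: Callable[[], bool]) -> bool:
    """Return True if the equi-join is an FK -> PK join.

    A declared constraint wins whenever the join predicates cover every
    FK/PK column pair; otherwise the stats-based heuristic decides.
    """
    join_pairs = set(zip(lhs_cols, rhs_cols))
    for fk in constraints:
        declared_pairs = set(zip(fk.fk_cols, fk.pk_cols))
        if (fk.fk_table == lhs_table and fk.pk_table == rhs_table
                and declared_pairs <= join_pairs):
            return True
    return stats_heuristic()


# Hypothetical star-schema metadata: sales.cust_id references customer.id.
constraints = [ForeignKey("sales", ("cust_id",), "customer", ("id",))]

# The constraint matches, so the (pessimistic) heuristic is never consulted.
from_metadata = is_fk_pk_join("sales", ("cust_id",), "customer", ("id",),
                              constraints, stats_heuristic=lambda: False)

# No constraint covers this join, so the heuristic's answer is used.
from_stats = is_fk_pk_join("sales", ("store_id",), "store", ("id",),
                           constraints, stats_heuristic=lambda: False)

print(from_metadata, from_stats)
```

Keeping the heuristic as a fallback means tables without declared constraints behave as they do today.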



--
This message was sent by Atlassian Jira
(v8.3.4#803005)