Posted to issues-all@impala.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2019/01/04 03:00:02 UTC

[jira] [Assigned] (IMPALA-8045) ScanNode confusion between table and scan input cardinality

     [ https://issues.apache.org/jira/browse/IMPALA-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers reassigned IMPALA-8045:
-----------------------------------

             Assignee: Paul Rogers
    Affects Version/s: Impala 3.1.0
          Description: 
The {{ScanNode}} class in the planner contains an {{inputCardinality_}} field that join calculations use as a proxy for the table size. However, the concrete scan node implementations set {{inputCardinality_}} to the estimated number of rows *read* by the scan, which is useful for understanding the physical scan structure. For joins, we need the base table cardinality instead.

For example, a join may use the input cardinality to estimate how much filters reduce the row count, and then adjust the NDV (number of distinct values) of the key columns accordingly. But because the input cardinality is the scan count, not the table row count, the math does not work out.
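
To make the mismatch concrete, here is a small sketch with invented numbers. It is not Impala's code: the {{adjustNdv}} helper, the linear NDV scaling, and all row counts are assumptions chosen only to show how the result changes depending on which baseline is used.

```java
public class NdvAdjustmentExample {
  /**
   * Hypothetical helper (not an Impala API): scales a column's NDV linearly
   * by the fraction of baseline rows that survive the filters.
   * Linear scaling is a simplifying assumption made for this illustration.
   */
  public static long adjustNdv(long ndv, long outputRows, long baselineRows) {
    double selectivity = (double) outputRows / baselineRows;
    return Math.max(1, Math.round(ndv * selectivity));
  }

  public static void main(String[] args) {
    // All numbers below are invented for the example.
    long tableRows = 1_000_000;  // rows in the base table
    long scanRows = 100_000;     // rows the scan is estimated to read
    long outputRows = 10_000;    // rows remaining after filters
    long keyNdv = 10_000;        // distinct key values in the base table

    // Baseline = table row count: selectivity 0.01.
    System.out.println(adjustNdv(keyNdv, outputRows, tableRows));
    // Baseline = scan count: selectivity 0.1, a 10x larger NDV estimate.
    System.out.println(adjustNdv(keyNdv, outputRows, scanRows));
  }
}
```

With these numbers, the two baselines give NDV estimates that differ by a factor of ten, even though the table, the filters, and the output row count are the same.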

The solution is to clarify the code so that it separates the scan count from the base table row count.
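
One possible shape for that separation, as a rough sketch only: the class {{SketchScanNode}}, its field names, and the {{filterSelectivity}} helper are hypothetical and are not the actual {{ScanNode}} API or the eventual fix.

```java
public class ScanCardinalitySketch {
  /** Hypothetical stand-in for a scan node; not Impala's ScanNode class. */
  public static class SketchScanNode {
    private final long tableRowCount;  // rows in the base table
    private final long scanInputRows;  // estimated rows read by the scan

    public SketchScanNode(long tableRowCount, long scanInputRows) {
      if (tableRowCount <= 0 || scanInputRows < 0 || scanInputRows > tableRowCount) {
        throw new IllegalArgumentException("inconsistent row counts");
      }
      this.tableRowCount = tableRowCount;
      this.scanInputRows = scanInputRows;
    }

    /** For join math: the size of the base table. */
    public long getTableRowCount() { return tableRowCount; }

    /** For describing the physical scan: rows actually read. */
    public long getScanInputRows() { return scanInputRows; }
  }

  /** Join-side helper (assumed): selectivity measured against the base table. */
  public static double filterSelectivity(SketchScanNode scan, long outputRows) {
    return (double) outputRows / scan.getTableRowCount();
  }

  public static void main(String[] args) {
    // Invented numbers: 1M-row table, 100K rows read, 10K rows after filters.
    SketchScanNode scan = new SketchScanNode(1_000_000, 100_000);
    System.out.println(filterSelectivity(scan, 10_000));
  }
}
```

Keeping two named accessors means a caller has to choose explicitly between the table size and the scan size, rather than reading one field that carries both meanings.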
          Component/s: Frontend

> ScanNode confusion between table and scan input cardinality
> -----------------------------------------------------------
>
>                 Key: IMPALA-8045
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8045
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>
> The {{ScanNode}} class in the planner contains an {{inputCardinality_}} field that join calculations use as a proxy for the table size. However, the concrete scan node implementations set {{inputCardinality_}} to the estimated number of rows *read* by the scan, which is useful for understanding the physical scan structure. For joins, we need the base table cardinality instead.
> For example, a join may use the input cardinality to estimate how much filters reduce the row count, and then adjust the NDV (number of distinct values) of the key columns accordingly. But because the input cardinality is the scan count, not the table row count, the math does not work out.
> The solution is to clarify the code so that it separates the scan count from the base table row count.


