Posted to dev@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2018/08/07 18:04:00 UTC

[jira] [Created] (HIVE-20332) Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild

Jesus Camacho Rodriguez created HIVE-20332:
----------------------------------------------

             Summary: Materialized views: Introduce heuristic on selectivity over ROW__ID to favour incremental rebuild
                 Key: HIVE-20332
                 URL: https://issues.apache.org/jira/browse/HIVE-20332
             Project: Hive
          Issue Type: Improvement
          Components: Materialized views
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez


Currently, we do not expose stats over {{ROW__ID.writeId}} to the optimizer. Even if we did, we always assume a uniform distribution of column values, which can easily lead to overestimating the number of rows read when we filter on {{ROW__ID.writeId}} for materialized views (think of one large transaction for MV creation followed by small ones for incremental maintenance). This overestimation can prevent incremental view maintenance from being triggered, because the cost of the incremental plan is overestimated (we think we will read more rows than we actually do). This could be fixed by introducing histograms that better reflect the distribution of column values.
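
To make the size of the overestimation concrete, here is a minimal sketch with made-up numbers. It assumes a textbook range-selectivity estimate under a uniform distribution, which is not necessarily the exact formula the Hive optimizer applies, and every name in it (uniform_range_selectivity, rows_per_write_id, last_seen) is hypothetical.

```python
def uniform_range_selectivity(lo, hi, bound):
    """Fraction of values expected above `bound` if values were spread
    uniformly over [lo, hi]. Textbook estimate, assumed for illustration."""
    if hi == lo:
        return 1.0 if lo > bound else 0.0
    frac = (hi - bound) / (hi - lo)
    return min(1.0, max(0.0, frac))


# Hypothetical table: one large transaction (writeId 1) created the MV's
# source data, then nine small transactions (writeIds 2..10) added 100 rows each.
rows_per_write_id = {1: 1_000_000}
rows_per_write_id.update({w: 100 for w in range(2, 11)})

total_rows = sum(rows_per_write_id.values())

# The MV is assumed to be up to date as of writeId 5, so the incremental
# plan filters on writeId > 5.
last_seen = 5
estimated = total_rows * uniform_range_selectivity(1, 10, last_seen)
actual = sum(n for w, n in rows_per_write_id.items() if w > last_seen)

print(f"estimated rows read: {estimated:,.0f}")
print(f"actual rows read:    {actual:,}")
```

Under these assumptions the uniform estimate is more than a thousand times the number of rows actually read, which is the kind of gap that makes the incremental plan look more expensive than it is.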

Until then, we will use a config variable that sets the selectivity of filter conditions on ROW__ID during cost calculation. Setting that variable to a low value will favour incremental rebuild over full rebuild.
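
The effect of such a variable can be sketched as follows. This is a toy cost comparison, not Hive's cost model: the function name, the row_id_filter_selectivity parameter, and the incremental_overhead factor are all assumptions made for illustration, and the issue does not name the actual config variable.

```python
def choose_rebuild(total_rows, row_id_filter_selectivity, incremental_overhead=2.0):
    """Pick the cheaper plan under a toy model where cost is proportional
    to rows read.

    row_id_filter_selectivity stands in for the configured selectivity of
    the ROW__ID filter; incremental_overhead is a hypothetical factor for
    the extra work an incremental plan does per row.
    """
    full_cost = float(total_rows)
    incremental_cost = total_rows * row_id_filter_selectivity * incremental_overhead
    return "incremental" if incremental_cost < full_cost else "full"


total_rows = 1_000_900  # hypothetical table size

# Selectivity derived from a uniform-distribution assumption (5/9 here):
# the incremental plan looks more expensive than a full rebuild.
with_uniform_estimate = choose_rebuild(total_rows, 5 / 9)

# A low configured selectivity makes the incremental plan win.
with_low_override = choose_rebuild(total_rows, 0.001)

print(with_uniform_estimate, with_low_override)
```

In this toy model the override only changes which plan is chosen; it does not make the row estimate itself more accurate, which is why histograms remain the longer-term fix.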



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)