Posted to issues@hive.apache.org by "Stamatis Zampetakis (Jira)" <ji...@apache.org> on 2020/05/13 09:17:00 UTC

[jira] [Assigned] (HIVE-23365) Put RS deduplication optimization under cost based decision

     [ https://issues.apache.org/jira/browse/HIVE-23365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis reassigned HIVE-23365:
------------------------------------------

    Assignee: Stamatis Zampetakis

> Put RS deduplication optimization under cost based decision
> -----------------------------------------------------------
>
>                 Key: HIVE-23365
>                 URL: https://issues.apache.org/jira/browse/HIVE-23365
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>
> Currently, RS (ReduceSink) deduplication is always applied whenever it is semantically correct. However, it can be beneficial to keep both RS operators in the plan, e.g., if the NDV (number of distinct values) of the second RS is very low. Thus, we would like this decision to be cost-based. A simple heuristic should work well for most cases without introducing regressions for existing ones, e.g., if the NDV of the partition column is less than the estimated parallelism of the second RS, do not apply deduplication.
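
The heuristic proposed in the description can be sketched roughly as follows. This is only an illustration of the comparison, not code from Hive: the class name, method name, and parameters are all hypothetical, and the fallback to the current behaviour when statistics are missing is an assumption made here to avoid changing existing plans.

```java
// Hypothetical sketch of the proposed heuristic; none of these names
// come from the Hive codebase.
final class RsDedupHeuristic {
    private RsDedupHeuristic() {}

    /**
     * Decides whether two ReduceSink (RS) operators should be merged.
     *
     * @param partitionNdv         estimated number of distinct values of the
     *                             partition column of the second RS;
     *                             negative if unknown (assumed convention)
     * @param estimatedParallelism estimated number of reducers for the second RS
     * @return true if deduplication should be applied
     */
    static boolean shouldDeduplicate(long partitionNdv, int estimatedParallelism) {
        if (partitionNdv < 0 || estimatedParallelism <= 0) {
            // Assumption: without usable estimates, keep today's behaviour
            // (always deduplicate) so existing plans do not change.
            return true;
        }
        // Fewer distinct partition values than reducers would leave some
        // reducers idle after merging, so keep both RS operators in that case.
        return partitionNdv >= estimatedParallelism;
    }

    public static void main(String[] args) {
        System.out.println(shouldDeduplicate(3L, 10));    // low NDV: keep both RS
        System.out.println(shouldDeduplicate(1000L, 10)); // high NDV: deduplicate
    }
}
```

Treating NDV equal to the parallelism as "deduplicate" follows the wording of the description, which skips deduplication only when the NDV is strictly less than the estimated parallelism.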



--
This message was sent by Atlassian Jira
(v8.3.4#803005)