Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2015/11/22 07:59:10 UTC

[jira] [Commented] (HIVE-12491) Statistics: 3 attribute join on a 2-source table is off

    [ https://issues.apache.org/jira/browse/HIVE-12491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15020875#comment-15020875 ] 

Gopal V commented on HIVE-12491:
--------------------------------

Band-aid fix: deduplicate the NDVs before easing out the denominator.

{code}
              // To avoid denominator getting larger and aggressively reducing
              // number of rows, we will ease out denominator.
              denom = getEasedOutDenominator(new ArrayList<Long>(new HashSet<Long>(distinctVals)));
{code}
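Here is a minimal, self-contained sketch of the band-aid. It assumes the exponential back-off logic of the `getEasedOutDenominator` method quoted in the issue below. The class `EasedDenominatorSketch` and its `easedOut`/`easedOutDistinct` helpers are hypothetical names, not Hive API. Unlike the quoted method, `easedOut` sorts a copy, so the caller's list is not reordered. Losing order in the `HashSet` is harmless because the values are re-sorted anyway.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

public class EasedDenominatorSketch {

    // Exponential back-off over NDVs, following the method quoted in the issue:
    // sort descending, then denom = NDV1 * NDV2^(1/2) * NDV3^(1/4) * ...
    // (hypothetical helper; sorts a copy instead of the caller's list)
    static long easedOut(List<Long> distinctVals) {
        List<Long> sorted = new ArrayList<Long>(distinctVals);
        Collections.sort(sorted, Collections.reverseOrder());
        long denom = sorted.get(0);
        for (int i = 1; i < sorted.size(); i++) {
            // Truncate to long after every step, as the quoted method does.
            denom = (long) (denom * Math.pow(sorted.get(i), 1.0 / (1 << i)));
        }
        return denom;
    }

    // The band-aid: collapse identical NDVs (e.g. two RHS columns reporting
    // the same NDV) before easing out the denominator.
    static long easedOutDistinct(List<Long> distinctVals) {
        return easedOut(new ArrayList<Long>(new HashSet<Long>(distinctVals)));
    }

    public static void main(String[] args) {
        // NDVs reported in the issue.
        List<Long> ndvs = Arrays.asList(8007986L, 821974390L, 821974390L);
        System.out.println("without dedup: " + easedOut(ndvs));
        System.out.println("with dedup:    " + easedOutDistinct(ndvs));
    }
}
```

Running `main` prints both denominators for the reported NDVs. The deduplicated one is smaller, so the row-count estimate shrinks less. The set compares values only, so two unrelated columns that happen to share an NDV are collapsed as well.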

> Statistics: 3 attribute join on a 2-source table is off
> -------------------------------------------------------
>
>                 Key: HIVE-12491
>                 URL: https://issues.apache.org/jira/browse/HIVE-12491
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal V
>            Assignee: Prasanth Jayachandran
>
> The eased-out denominator has to detect duplicate row stats coming from different attributes.
> {code}
>   private Long getEasedOutDenominator(List<Long> distinctVals) {
>     // Exponential back-off for NDVs.
>     // 1) Sort NDVs in descending order.
>     // 2) denominator = NDV1 * (NDV2 ^ (1/2)) * (NDV3 ^ (1/4)) * ...
>     Collections.sort(distinctVals, Collections.reverseOrder());
>     long denom = distinctVals.get(0);
>     for (int i = 1; i < distinctVals.size(); i++) {
>       denom = (long) (denom * Math.pow(distinctVals.get(i), 1.0 / (1 << i)));
>     }
>     return denom;
>   }
> {code}
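> The method receives {{[8007986, 821974390, 821974390]}}, which are actually 3 columns, 2 of which are from the RHS table.
> So the eased-out denominator is off by a factor of 30,000 or so (the duplicated 821974390 alone contributes sqrt(821974390) ≈ 28,670), shrinking the row-count estimate and causing OOMs in map-joins.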
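The following hedged sketch shows where the large factor comes from. It prints the multiplier each sorted NDV contributes to the eased-out denominator for the values reported above, with and without deduplication. `EasedDenominatorTerms`, `termFactors` and `product` are hypothetical names. The product is kept in double precision, whereas the quoted Hive method truncates to long after every step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

public class EasedDenominatorTerms {

    // Multiplier contributed by each NDV to the eased-out denominator:
    // after a descending sort, term i is NDV_i ^ (1 / 2^i).
    static double[] termFactors(List<Long> distinctVals) {
        List<Long> sorted = new ArrayList<Long>(distinctVals); // leave the input untouched
        Collections.sort(sorted, Collections.reverseOrder());
        double[] factors = new double[sorted.size()];
        for (int i = 0; i < sorted.size(); i++) {
            factors[i] = Math.pow(sorted.get(i), 1.0 / (1 << i));
        }
        return factors;
    }

    // Product of the term factors in double precision (no per-step truncation).
    static double product(double[] factors) {
        double p = 1.0;
        for (double f : factors) {
            p *= f;
        }
        return p;
    }

    public static void main(String[] args) {
        List<Long> ndvs = Arrays.asList(8007986L, 821974390L, 821974390L); // from the issue
        double[] withDup = termFactors(ndvs);
        double[] deduped = termFactors(new ArrayList<Long>(new HashSet<Long>(ndvs)));
        for (int i = 0; i < withDup.length; i++) {
            System.out.printf("with duplicate, term %d: x %.1f%n", i, withDup[i]);
        }
        for (int i = 0; i < deduped.length; i++) {
            System.out.printf("deduplicated,   term %d: x %.1f%n", i, deduped[i]);
        }
        System.out.printf("ratio of denominators: %.1f%n", product(withDup) / product(deduped));
    }
}
```

With the duplicate, 821974390 occupies the 1/2-exponent slot and contributes about x28,670, which is the "30,000 or so" above. Deduplicating also moves 8007986 from the 1/4 slot (about x53) to the 1/2 slot (about x2,830). On these particular NDVs, the net reduction from the band-aid is therefore closer to x540.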



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)