Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2012/08/02 12:48:03 UTC

[jira] [Commented] (HIVE-3086) Skewed Join Optimization

    [ https://issues.apache.org/jira/browse/HIVE-3086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427239#comment-13427239 ] 

Namit Jain commented on HIVE-3086:
----------------------------------

@Yongqiang, the current skew join does the optimization after most of the damage has already been done.
The reducer detects that a particular key is skewed, and then processes that key in a separate MR job.

In this approach, however, we plan to know the skewed keys beforehand (they are stored in the metastore),
and then use them to do a map-join for the skewed keys and a normal join for the other keys. This does require
some change from the user (the user needs to store the skewed keys in the metastore). Still, this approach can
work very well for repetitive workloads, i.e. similar queries running every day over similar data. Most probably, the skew
does not change from day to day, so the skewed keys can be recalculated periodically.
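
To illustrate the split described above, here is a minimal single-process sketch in Python. It is not Hive's implementation: the function names (find_skewed_keys, hash_join, skew_aware_join), the (key, value) row layout, and the count threshold are all assumptions made for illustration. In Hive the two branches would run as different join strategies (map-join vs. normal reduce-side join); here both are simulated with the same in-memory hash join, so the sketch only shows how the rows are partitioned by the known skewed keys and that the combined result matches a plain join.

```python
from collections import Counter, defaultdict


def find_skewed_keys(rows, threshold):
    """Stand-in for the periodic skew calculation (hypothetical helper).

    Returns the set of keys that occur more than `threshold` times.
    """
    counts = Counter(key for key, _ in rows)
    return {key for key, n in counts.items() if n > threshold}


def hash_join(left, right):
    """Inner-join two lists of (key, value) rows via a hash table on `right`."""
    table = defaultdict(list)
    for key, value in right:
        table[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in table.get(key, [])]


def skew_aware_join(left, right, skewed_keys):
    """Join skewed and non-skewed keys separately, then combine the results."""
    # Partition both inputs using the skewed keys known in advance
    # (in the proposal these would come from the metastore).
    left_skewed = [r for r in left if r[0] in skewed_keys]
    left_rest = [r for r in left if r[0] not in skewed_keys]
    right_skewed = [r for r in right if r[0] in skewed_keys]
    right_rest = [r for r in right if r[0] not in skewed_keys]

    # Skewed keys: this branch stands in for the map-join.
    skewed_out = hash_join(left_skewed, right_skewed)
    # Remaining keys: this branch stands in for the normal join.
    rest_out = hash_join(left_rest, right_rest)
    return skewed_out + rest_out


if __name__ == "__main__":
    left = [("a", i) for i in range(5)] + [("b", 10), ("c", 11)]
    right = [("a", "x"), ("b", "y"), ("d", "z")]
    skewed = find_skewed_keys(left, threshold=3)
    for row in skew_aware_join(left, right, skewed):
        print(row)
```

Because the skewed-key set is an input rather than something detected while the join runs, it can be recomputed on whatever schedule suits the workload and reused across the daily queries.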
                
> Skewed Join Optimization
> ------------------------
>
>                 Key: HIVE-3086
>                 URL: https://issues.apache.org/jira/browse/HIVE-3086
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Nadeem Moidu
>            Assignee: Nadeem Moidu
>
> During a join operation, if one of the columns has a skewed key, it can cause that particular reducer to become the bottleneck. The following feature will address it:
> https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization
