Posted to dev@hive.apache.org by "He Yongqiang (JIRA)" <ji...@apache.org> on 2010/01/08 21:13:54 UTC

[jira] Commented: (HIVE-964) handle skewed keys for a join in a separate job

    [ https://issues.apache.org/jira/browse/HIVE-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798159#action_12798159 ] 

He Yongqiang commented on HIVE-964:
-----------------------------------

Had an offline discussion with Namit and Ning a few days ago. Some notes:
1)
We can leave outer joins unsupported for now, because they need run-time evaluation (not just flushing data out). Run-time join evaluation is used to check whether a partial join is empty and to determine what the final results will be. For example: "SELECT * FROM T1 src1 LEFT OUTER JOIN T2 src2 ON src1.key+1 = src2.key RIGHT OUTER JOIN T2 src3 ON src2.key = src3.key;". When the result of "src1 LEFT OUTER JOIN T2 src2 ON src1.key+1 = src2.key" is empty, the query is actually EMPTY RIGHT OUTER JOIN src3.
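
To make the point in 1) concrete, here is a minimal Python sketch of the chained outer join from the example query. It is not Hive code; the function names (left_outer_join, right_outer_join, evaluate_chain) and the one-column row layout are assumptions made only for illustration. It shows that when the partial join is empty, the src3 rows still have to be emitted padded with NULLs, which is why the partial result has to be evaluated rather than just flushed.

```python
def left_outer_join(left, right, on, right_width):
    """Keep every left row; pad with None when no right row matches."""
    out = []
    for l in left:
        matches = [l + r for r in right if on(l, r)]
        out.extend(matches if matches else [l + (None,) * right_width])
    return out


def right_outer_join(left, right, on, left_width):
    """Keep every right row; pad with None when no left row matches."""
    out = []
    for r in right:
        matches = [l + r for l in left if on(l, r)]
        out.extend(matches if matches else [(None,) * left_width + r])
    return out


def evaluate_chain(src1, src2, src3):
    """(src1 LEFT OUTER JOIN src2 ON src1.key+1 = src2.key)
    RIGHT OUTER JOIN src3 ON src2.key = src3.key.
    Each input row is a 1-tuple holding the key column."""
    partial = left_outer_join(
        src1, src2, on=lambda l, r: l[0] + 1 == r[0], right_width=1
    )
    # partial rows are (src1.key, src2.key); a NULL src2.key never matches.
    return right_outer_join(
        partial, src3,
        on=lambda l, r: l[1] is not None and l[1] == r[0],
        left_width=2,
    )


# No src1 rows: the partial join is empty, so src3 is padded with NULLs,
# even though a src2 row with the same key exists.
empty_partial = evaluate_chain([], [(5,)], [(5,)])

# With a matching src1 row the full joined row is produced.
full_match = evaluate_chain([(4,)], [(5,)], [(5,)])
```

In the first call the src2 row never reaches the output, because src2 only takes part through the (empty) partial join.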

2)
Right now, after HIVE-963 went in, once a key appears more than "hive.join.cache.size" times, its data is actually written to local disk by the row container.
We need to let the row container use a Hadoop file format, so that the data can be written to HDFS more easily for another MR job.
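
The spilling behaviour described in 2) can be sketched roughly as below. This is a hypothetical Python illustration, not Hive's row container: the class name SpillingRowContainer, the JSON-lines spill format, and the use of a local temporary file are all assumptions. The cache_size argument plays the role of "hive.join.cache.size".

```python
import json
import tempfile


class SpillingRowContainer:
    """Hypothetical container: buffers the rows of one key in memory and
    spills them to a local temp file once more than cache_size are held."""

    def __init__(self, cache_size):
        self.cache_size = cache_size
        self._memory = []
        self._spill = None      # created lazily on the first spill
        self.spilled_rows = 0

    def add(self, row):
        self._memory.append(row)
        if len(self._memory) > self.cache_size:
            self._flush()

    def _flush(self):
        if self._spill is None:
            self._spill = tempfile.TemporaryFile(mode="w+")
        self._spill.seek(0, 2)  # always append at the end of the file
        for row in self._memory:
            self._spill.write(json.dumps(row) + "\n")
        self.spilled_rows += len(self._memory)
        self._memory = []

    def rows(self):
        """Yield spilled rows first, then the rows still in memory."""
        if self._spill is not None:
            self._spill.seek(0)
            for line in self._spill:
                yield json.loads(line)
        yield from self._memory


container = SpillingRowContainer(cache_size=2)
for i in range(5):
    container.add([i])
all_rows = list(container.rows())
```

Writing the spill in a format that a later job can read directly (instead of a private local file like the one here) is the change the note above asks for.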

> handle skewed keys for a join in a separate job
> -----------------------------------------------
>
>                 Key: HIVE-964
>                 URL: https://issues.apache.org/jira/browse/HIVE-964
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: He Yongqiang
>         Attachments: hive-964-2009-12-17.txt, hive-964-2009-12-28-2.patch, hive-964-2009-12-29-4.patch
>
>
> The skewed keys can be written to a temporary table or file, and a followup conditional task can be used to perform the join on those keys.
> As a first step, JDBM can be used for those keys

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.