Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2010/08/16 23:20:21 UTC

[jira] Created: (HIVE-1544) Filtering out NULL-keyed rows in ReduceSinkOperator when no outer join involved

Filtering out NULL-keyed rows in ReduceSinkOperator when no outer join involved
-------------------------------------------------------------------------------

                 Key: HIVE-1544
                 URL: https://issues.apache.org/jira/browse/HIVE-1544
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Ning Zhang


As discussed in HIVE-741, if a plan indicates that a non-outer (inner) join is the first operator in the reducer, the ReduceSinkOperator should filter out (i.e., not send) rows whose join keys are NULL. Because NULL never compares equal to any key under join equality, such rows cannot produce any results anyway. This should save both network bandwidth and reducer processing time.
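The filtering rule can be sketched in Java (Hive's implementation language). This is a minimal, hypothetical model, not Hive's actual ReduceSinkOperator: the class NullKeyFilterSketch, its Row type, and the filterNullKeys parameter are illustrative names assumed here. It also assumes ordinary "=" equi-join semantics, under which a row with NULL in any key column can never match a row from the other side of an inner join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the reduce-sink filtering rule;
// NOT Hive's actual ReduceSinkOperator API.
public class NullKeyFilterSketch {

    // A row split into join-key columns and value columns.
    static final class Row {
        final Object[] keys;
        final Object[] values;

        Row(Object[] keys, Object[] values) {
            this.keys = keys;
            this.values = values;
        }
    }

    // True if any join-key column is NULL; with ordinary "=" equality such a
    // row can never match a row from the other side of an inner join.
    static boolean hasNullKey(Row row) {
        for (Object k : row.keys) {
            if (k == null) {
                return true;
            }
        }
        return false;
    }

    // Returns the rows that would be sent to the reducers. filterNullKeys
    // stands in for the proposed ReduceSinkDesc flag (name assumed).
    static List<Row> emit(List<Row> input, boolean filterNullKeys) {
        List<Row> out = new ArrayList<>();
        for (Row row : input) {
            if (filterNullKeys && hasNullKey(row)) {
                continue; // dropped before the shuffle: no bandwidth or reducer work spent
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(new Object[] {1}, new Object[] {"a"}),
                new Row(new Object[] {null}, new Object[] {"b"}),
                new Row(new Object[] {3}, new Object[] {"c"}));
        System.out.println(emit(rows, true).size());  // prints 2
        System.out.println(emit(rows, false).size()); // prints 3
    }
}
```

With the flag off, every row is shuffled as before; with it on, only the NULL-keyed row is dropped on the map side.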

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-1544) Filtering out NULL-keyed rows in ReduceSinkOperator when no outer join involved

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899096#action_12899096 ] 

Ning Zhang commented on HIVE-1544:
----------------------------------

The JoinDesc already has a flag, noOuterJoin, that tracks whether any outer joins are involved in the join operator. Based on that flag, we should set a flag in the ReduceSinkDesc indicating whether NULL-keyed rows will be filtered out.
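A minimal sketch of how the flag might be propagated at plan time, assuming simplified stand-in classes: only the name noOuterJoin comes from the comment above, while JoinDescSketch, ReduceSinkDescSketch, filterNullKeys, and propagate are hypothetical names for illustration, not Hive's actual API. It enables filtering only when the join has no outer side and is the first operator in the reducer, matching the condition in the issue description.

```java
import java.util.List;

// Hypothetical sketch of deriving the reduce-sink flag from the join's
// noOuterJoin flag; class and field names other than noOuterJoin are assumed.
public class NullFilterFlagSketch {

    // Stand-in for JoinDesc: only the flag mentioned in the comment.
    static final class JoinDescSketch {
        final boolean noOuterJoin;

        JoinDescSketch(boolean noOuterJoin) {
            this.noOuterJoin = noOuterJoin;
        }
    }

    // Stand-in for ReduceSinkDesc with the proposed (assumed) flag.
    static final class ReduceSinkDescSketch {
        boolean filterNullKeys; // false by default: send every row
    }

    // Enable filtering on every reduce sink feeding the join, but only when
    // the join has no outer side and is the first operator in the reducer.
    static void propagate(JoinDescSketch join, boolean joinIsFirstReduceOp,
                          List<ReduceSinkDescSketch> parents) {
        boolean filter = join.noOuterJoin && joinIsFirstReduceOp;
        for (ReduceSinkDescSketch rs : parents) {
            rs.filterNullKeys = filter;
        }
    }

    public static void main(String[] args) {
        ReduceSinkDescSketch left = new ReduceSinkDescSketch();
        ReduceSinkDescSketch right = new ReduceSinkDescSketch();
        propagate(new JoinDescSketch(true), true, List.of(left, right));
        System.out.println(left.filterNullKeys && right.filterNullKeys); // prints true

        ReduceSinkDescSketch outer = new ReduceSinkDescSketch();
        propagate(new JoinDescSketch(false), true, List.of(outer));
        System.out.println(outer.filterNullKeys); // prints false
    }
}
```

Keeping the flag false by default means any plan that never reaches this step keeps the existing behavior of sending every row.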


[jira] Commented: (HIVE-1544) Filtering out NULL-keyed rows in ReduceSinkOperator when no outer join involved

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899805#action_12899805 ] 

Amareshwari Sriramadasu commented on HIVE-1544:
-----------------------------------------------

Also, see Namit's [comment|https://issues.apache.org/jira/browse/HIVE-741?focusedCommentId=12899177&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12899177] on HIVE-741.