Posted to dev@hive.apache.org by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org> on 2011/12/07 09:16:41 UTC
[jira] [Commented] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

    [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164233#comment-13164233 ] 

jiraposter@reviews.apache.org commented on HIVE-2329:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1313/
-----------------------------------------------------------

(Updated 2011-12-07 08:15:06.858543)


Review request for hive, John Sichi and Carl Steinbach.


Changes
-------

rebased to trunk


Summary
-------

If map aggregation is disabled (hive.map.aggr=false), DISTRIBUTE BY followed by GROUP BY on the same key fails at runtime. The ReduceSinkDeDuplication optimization should be skipped if the child of the child RS is a GBY.
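
The check described above can be illustrated with a small, self-contained sketch. It is not the code in the patch and does not use Hive's operator classes: Kind, Op, chain, feedsGroupBy and shouldDeduplicate are hypothetical names invented for this example, and the operator chains are simplified assumptions about the plan shape.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an operator pipeline; NOT Hive's real operator classes.
final class ReduceSinkDedupSketch {

    // Hypothetical operator kinds, reduced to what the example needs.
    enum Kind { REDUCE_SINK, GROUP_BY, EXTRACT, SELECT, FILE_SINK }

    // Hypothetical operator node holding only its kind and its children.
    static final class Op {
        final Kind kind;
        final List<Op> children = new ArrayList<>();

        Op(Kind kind) {
            this.kind = kind;
        }
    }

    // Builds a linear chain of operators and returns them in order.
    static Op[] chain(Kind... kinds) {
        Op[] ops = new Op[kinds.length];
        for (int i = 0; i < kinds.length; i++) {
            ops[i] = new Op(kinds[i]);
            if (i > 0) {
                ops[i - 1].children.add(ops[i]);
            }
        }
        return ops;
    }

    // True when a direct child of the given (child) RS is a GROUP BY.
    static boolean feedsGroupBy(Op childRs) {
        for (Op child : childRs.children) {
            if (child.kind == Kind.GROUP_BY) {
                return true;
            }
        }
        return false;
    }

    // Assumed rule from the summary: merge two reduce sinks only when
    // the child RS does not feed a GROUP BY.
    static boolean shouldDeduplicate(Op parentRs, Op childRs) {
        if (parentRs.kind != Kind.REDUCE_SINK || childRs.kind != Kind.REDUCE_SINK) {
            return false;
        }
        return !feedsGroupBy(childRs);
    }

    public static void main(String[] args) {
        // Assumed shape for "cluster by" followed by "group by" on the same key.
        Op[] withGby = chain(Kind.REDUCE_SINK, Kind.EXTRACT, Kind.SELECT,
                Kind.REDUCE_SINK, Kind.GROUP_BY, Kind.FILE_SINK);
        System.out.println(shouldDeduplicate(withGby[0], withGby[3])); // false

        // Two reduce sinks with no GROUP BY after the second one.
        Op[] withoutGby = chain(Kind.REDUCE_SINK, Kind.EXTRACT,
                Kind.REDUCE_SINK, Kind.EXTRACT, Kind.FILE_SINK);
        System.out.println(shouldDeduplicate(withoutGby[0], withoutGby[2])); // true
    }
}
```

In this model the second RS is kept whenever a GBY sits directly below it, so the GBY still receives the key columns it expects; the actual patch may express the condition differently.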


This addresses bug HIVE-2329.
    https://issues.apache.org/jira/browse/HIVE-2329


Diffs (updated)
---------------

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java e91b4d5 
  ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q PRE-CREATION 
  ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1313/diff


Testing (updated)
-----------------

test added : reduce_deduplicate_exclude_gby.q


Thanks,

Navis


                
> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: HIVE-2329.1.patch.txt
>
>
> With hive.map.aggr=false, the query
> select Q1.key_int1, sum(Q1.key_int1), sum(distinct Q1.key_int1) from (select * from t1 cluster by key_int1) Q1 group by Q1.key_int1
> fails with:
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
> The Hadoop logs show:
> Caused by: java.lang.RuntimeException: cannot find field key from []
> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
> 	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:119)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:82)
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:198)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
> ........
> I think the problem is caused by ReduceSinkDeDuplication removing the RS that was providing rs.key for the GBY operation. If the child of the child RS is a GBY, we should bypass the optimization.
