Posted to dev@hive.apache.org by "Navis (JIRA)" <ji...@apache.org> on 2011/08/01 10:49:09 UTC

[jira] [Created] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Not using map aggregation, fails to execute group-by after cluster-by with same key
-----------------------------------------------------------------------------------

                 Key: HIVE-2329
                 URL: https://issues.apache.org/jira/browse/HIVE-2329
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.8.0
            Reporter: Navis
            Priority: Minor


hive.map.aggr=false
explain select Q1.key_int1, sum(Q1.key_int1), sum(distinct Q1.key_int1) from (select * from t1 cluster by key_int1) Q1 group by Q1.key_int1

resulted in:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

From the Hadoop logs:

Caused by: java.lang.RuntimeException: cannot find field key from []
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:119)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:82)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:198)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
........
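The stack trace shows GroupByOperator.initializeOp failing while it resolves a column named "key" against a struct whose field list is empty (the "[]" in the message). The sketch below shows how a name-based field lookup produces exactly that message. It assumes a simplified case-insensitive search over field names. FieldLookupSketch and findField are hypothetical names, and this is not the actual ObjectInspectorUtils.getStandardStructFieldRef source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a name-based struct field lookup.
// This is NOT the real ObjectInspectorUtils.getStandardStructFieldRef code.
public class FieldLookupSketch {

    // Returns the index of the field whose name matches (case-insensitively),
    // or throws with a message shaped like the one in the Hadoop logs.
    public static int findField(List<String> fieldNames, String name) {
        for (int i = 0; i < fieldNames.size(); i++) {
            if (fieldNames.get(i).equalsIgnoreCase(name)) {
                return i;
            }
        }
        // An empty list prints as "[]", giving "cannot find field key from []".
        throw new RuntimeException("cannot find field " + name + " from " + fieldNames);
    }

    public static void main(String[] args) {
        // Assumed healthy case: the row the GBY inspects exposes a "key" field.
        System.out.println(findField(List.of("key", "value"), "key")); // prints 0

        // Failing case: the row the GBY inspects has no fields at all.
        try {
            findField(new ArrayList<>(), "key");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // prints "cannot find field key from []"
        }
    }
}
```

If the row reaching the GBY still carried the key field, the lookup would succeed. The empty list suggests that whatever was supposed to supply that field is no longer in the plan.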

I think the problem is caused by ReduceSinkDeDuplication removing the RS that was providing rs.key for the GBY operator. If the child of the child RS is a GBY, we should bypass the optimization.
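The proposed rule (skip the optimization when the child RS feeds a GBY) can be sketched on a toy operator tree. Everything here is hypothetical. DedupGuardSketch, Op, Kind and canDeduplicate are illustration-only names. The sameKeys flag stands in for the real key comparison, and the operator chain is a simplified guess at the plan for the query above, not actual EXPLAIN output. This is not the committed patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified operator-tree model. These are NOT Hive's real
// Operator, ReduceSinkOperator or GroupByOperator classes, and this is not the
// committed HIVE-2329 patch; it only illustrates the proposed rule.
public class DedupGuardSketch {
    public enum Kind { TABLE_SCAN, SELECT, REDUCE_SINK, GROUP_BY, FILE_SINK }

    public static final class Op {
        final Kind kind;
        final List<Op> children = new ArrayList<>();

        public Op(Kind kind) { this.kind = kind; }

        // Attaches a child and returns it, so chains can be built fluently.
        public Op then(Op child) {
            children.add(child);
            return child;
        }
    }

    // True if any direct child of the given ReduceSink is a GroupBy.
    static boolean feedsGroupBy(Op rs) {
        for (Op c : rs.children) {
            if (c.kind == Kind.GROUP_BY) {
                return true;
            }
        }
        return false;
    }

    // Decides whether childRs may be merged into parentRs. sameKeys stands in for
    // the real key/partition compatibility checks, which are not modelled here.
    public static boolean canDeduplicate(Op parentRs, Op childRs, boolean sameKeys) {
        if (parentRs.kind != Kind.REDUCE_SINK || childRs.kind != Kind.REDUCE_SINK || !sameKeys) {
            return false;
        }
        // Proposed guard: the GBY below childRs reads childRs's key, so keep childRs.
        return !feedsGroupBy(childRs);
    }

    public static void main(String[] args) {
        // Simplified: TS -> RS (cluster by key_int1) -> SEL -> RS (group by key_int1) -> GBY -> FS
        Op ts = new Op(Kind.TABLE_SCAN);
        Op rs1 = ts.then(new Op(Kind.REDUCE_SINK));
        Op rs2 = rs1.then(new Op(Kind.SELECT)).then(new Op(Kind.REDUCE_SINK));
        rs2.then(new Op(Kind.GROUP_BY)).then(new Op(Kind.FILE_SINK));
        System.out.println(canDeduplicate(rs1, rs2, true)); // prints false
    }
}
```

Keeping the guard as a separate check leaves the rest of the deduplication decision untouched, which matches the "bypass the optimization" framing rather than trying to rewire the GBY's key references.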

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-2329:
------------------------------

    Attachment: HIVE-2329.D657.1.patch

njain requested code review of "HIVE-2329 [jira] Not using map aggregation, fails to execute group-by after cluster-by with same key".
Reviewers: JIRA

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D657

AFFECTED FILES
  ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out
  ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java

> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: HIVE-2329.1.patch.txt, HIVE-2329.D657.1.patch

[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Navis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-2329:
------------------------

    Attachment:     (was: HIVE-2329.2.patch)
    
> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.8.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-2329.1.patch.txt

[jira] [Commented] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080779#comment-13080779 ] 

jiraposter@reviews.apache.org commented on HIVE-2329:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1313/
-----------------------------------------------------------

Review request for hive.


Summary
-------

If map aggregation is set to false, DISTRIBUTE BY followed by GROUP BY with the same key fails at runtime. The ReduceSinkDeDuplication optimization should be avoided if the child of the child RS is a GBY.


This addresses bug HIVE-2329.
    https://issues.apache.org/jira/browse/HIVE-2329


Diffs
-----

  ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q PRE-CREATION 
  ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java e91b4d5 

Diff: https://reviews.apache.org/r/1313/diff


Testing
-------


Thanks,

Navis



> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-2329.1.patch, HIVE-2329.2.patch

[jira] [Commented] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "John Sichi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078333#comment-13078333 ] 

John Sichi commented on HIVE-2329:
----------------------------------

Good catch. Can you submit an updated patch that includes a testcase (.q and .q.out files), and then create a Review Board entry?

https://cwiki.apache.org/confluence/display/Hive/HowToContribute#HowToContribute-ReviewProcess


> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-2329.1.patch

[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Navis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-2329:
------------------------

    Description: 
hive.map.aggr=false
select Q1.key_int1, sum(Q1.key_int1), sum(distinct Q1.key_int1) from (select * from t1 cluster by key_int1) Q1 group by Q1.key_int1

resulted in:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

From the Hadoop logs:

Caused by: java.lang.RuntimeException: cannot find field key from []
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:119)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:82)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:198)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
........

I think the problem is caused by ReduceSinkDeDuplication removing the RS that was providing rs.key for the GBY operator. If the child of the child RS is a GBY, we should bypass the optimization.

  was:
hive.map.aggr=false
explain select Q1.key_int1, sum(Q1.key_int1), sum(distinct Q1.key_int1) from (select * from t1 cluster by key_int1) Q1 group by Q1.key_int1

resulted in:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

From the Hadoop logs:

Caused by: java.lang.RuntimeException: cannot find field key from []
	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:119)
	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:82)
	at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:198)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
........

I think the problem is caused by ReduceSinkDeDuplication removing the RS that was providing rs.key for the GBY operator. If the child of the child RS is a GBY, we should bypass the optimization.


> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Navis
>            Priority: Minor
>         Attachments: HIVE-2329.1.patch

[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Navis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-2329:
------------------------

    Attachment: HIVE-2329.2.patch

> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-2329.1.patch, HIVE-2329.2.patch

[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Navis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-2329:
------------------------

    Attachment: HIVE-2329.0.8.0.patch
    
> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-2329.0.8.0.patch, HIVE-2329.1.patch, HIVE-2329.2.patch

[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Navis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-2329:
------------------------

    Attachment: HIVE-2329.1.patch

simple workaround

> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Navis
>            Priority: Minor
>         Attachments: HIVE-2329.1.patch

[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Navis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-2329:
------------------------

        Fix Version/s: 0.9.0
    Affects Version/s:     (was: 0.8.0)
               Status: Patch Available  (was: Open)
    
> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: HIVE-2329.1.patch.txt

[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Namit Jain (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2329:
-----------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Committed. Thanks, Navis.
                
> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: HIVE-2329.1.patch.txt, HIVE-2329.D657.1.patch

[jira] [Commented] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166472#comment-13166472 ] 

Hudson commented on HIVE-2329:
------------------------------

Integrated in Hive-trunk-h0.23.0 #12 (See [https://builds.apache.org/job/Hive-trunk-h0.23.0/12/])
    HIVE-2329 Not using map aggregation, fails to execute group-by after
cluster-by with same key (Navis via namit)

namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212551
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
* /hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out

                
> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: HIVE-2329.1.patch.txt, HIVE-2329.D657.1.patch
>
>
> hive.map.aggr=false
> select Q1.key_int1, sum(Q1.key_int1), sum(distinct Q1.key_int1) from (select * from t1 cluster by key_int1) Q1 group by Q1.key_int1
> resulted..
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
> from hadoop logs..
> Caused by: java.lang.RuntimeException: cannot find field key from []
> 	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:321)
> 	at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:119)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:82)
> 	at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:198)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
> 	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:433)
> ........
> I think the problem is caused by ReduceSinkDeDuplication, removing RS which was providing rs.key for GBY operation. If child of child RS is a GBY, we should bypass the optimization.
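The failing lookup in the stack trace can be pictured with a deliberately simplified model. It assumes, as the analysis above suggests, that the reduce-side GBY resolves a field named `key` in its input row, and that once the ReduceSink is deduplicated away the row it receives no longer has that field. `StructLookupDemo` and `SimpleStruct` are hypothetical illustrations, not Hive's ObjectInspector classes; only the error text is copied from the log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

public class StructLookupDemo {

    // Hypothetical stand-in for a row struct whose fields are resolved by name.
    // This is NOT Hive's ObjectInspector API.
    static final class SimpleStruct {
        private final Map<String, Object> fields = new LinkedHashMap<>();

        SimpleStruct put(String name, Object value) {
            fields.put(name, value);
            return this;
        }

        // Resolves a field by name; an unknown name is fatal, as in the log.
        Object getField(String name) {
            if (!fields.containsKey(name)) {
                throw new RuntimeException("cannot find field " + name
                        + " from " + new ArrayList<>(fields.keySet()));
            }
            return fields.get(name);
        }
    }

    public static void main(String[] args) {
        // Row shaped as if the group-by RS had run: it carries "key" and "value".
        SimpleStruct withKey = new SimpleStruct().put("key", 42).put("value", 7);
        System.out.println("key = " + withKey.getField("key"));

        // Row without any fields: the same lookup fails.
        SimpleStruct empty = new SimpleStruct();
        try {
            empty.getField("key");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage()); // cannot find field key from []
        }
    }
}
```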


[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Carl Steinbach (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2329:
---------------------------------

    Component/s: Query Processor

Please change the status to Patch Available if this is ready for review. Thanks.
                
> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.8.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-2329.0.8.0.patch, HIVE-2329.1.patch, HIVE-2329.2.patch

[jira] [Assigned] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "John Sichi (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi reassigned HIVE-2329:
--------------------------------

    Assignee: Navis

> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-2329.1.patch

[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Navis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-2329:
------------------------

    Attachment: HIVE-2329.1.patch.txt

rebased to trunk
                
> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.8.0
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>         Attachments: HIVE-2329.1.patch.txt

[jira] [Commented] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13167391#comment-13167391 ] 

Phabricator commented on HIVE-2329:
-----------------------------------

test123 has commented on the revision "HIVE-2329 [jira] Not using map aggregation, fails to execute group-by after cluster-by with same key".

REVISION DETAIL
  https://reviews.facebook.net/D657

                
[jira] [Commented] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13166881#comment-13166881 ] 

Hudson commented on HIVE-2329:
------------------------------

Integrated in Hive-trunk-h0.21 #1137 (See [https://builds.apache.org/job/Hive-trunk-h0.21/1137/])
    HIVE-2329 Not using map aggregation, fails to execute group-by after
cluster-by with same key (Navis via namit)

namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1212551
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
* /hive/trunk/ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q
* /hive/trunk/ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out

                
[jira] [Commented] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "jiraposter@reviews.apache.org (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164233#comment-13164233 ] 

jiraposter@reviews.apache.org commented on HIVE-2329:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1313/
-----------------------------------------------------------

(Updated 2011-12-07 08:15:06.858543)


Review request for hive, John Sichi and Carl Steinbach.


Changes
-------

rebased to trunk


Summary
-------

If map aggregation is disabled, DISTRIBUTE BY (including CLUSTER BY) followed by GROUP BY on the same key fails at runtime. The ReduceSinkDeDuplication optimization should be skipped if the child of the child RS is a GBY.


This addresses bug HIVE-2329.
    https://issues.apache.org/jira/browse/HIVE-2329
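The rule in the summary can be sketched as a guard on the deduplication step. This is a hypothetical, simplified model: `Op`, `OpType`, `canDeduplicate`, and the plan shape in `main` are illustrative assumptions, not the committed change to ReduceSinkDeDuplication.java, and the key and partitioning compatibility checks that a real deduplication pass needs are left out.

```java
import java.util.ArrayList;
import java.util.List;

public class DedupGuardSketch {

    // Hypothetical, simplified operator kinds (not Hive's operator classes).
    enum OpType { TS, SEL, RS, GBY }

    static final class Op {
        final OpType type;
        final List<Op> children = new ArrayList<>();

        Op(OpType type) {
            this.type = type;
        }

        // Attaches a child and returns it, so plans can be built as chains.
        Op addChild(Op child) {
            children.add(child);
            return child;
        }
    }

    // Decides whether childRs may be folded into parentRs. Only the HIVE-2329
    // rule is modeled: if any child of the child RS is a GBY, bypass the
    // optimization. Key/partition compatibility checks are omitted.
    static boolean canDeduplicate(Op parentRs, Op childRs) {
        if (parentRs.type != OpType.RS || childRs.type != OpType.RS) {
            return false;
        }
        for (Op grandChild : childRs.children) {
            if (grandChild.type == OpType.GBY) {
                return false; // the GBY reads key fields produced by childRs
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed, simplified plan for the reported query:
        // TS -> RS (cluster by) -> SEL -> RS (group by) -> GBY
        Op ts = new Op(OpType.TS);
        Op clusterRs = ts.addChild(new Op(OpType.RS));
        Op sel = clusterRs.addChild(new Op(OpType.SEL));
        Op groupRs = sel.addChild(new Op(OpType.RS));
        groupRs.addChild(new Op(OpType.GBY));
        System.out.println(canDeduplicate(clusterRs, groupRs)); // false

        // A child RS with no GBY below it is still eligible in this sketch.
        Op otherRs = new Op(OpType.RS);
        otherRs.addChild(new Op(OpType.SEL));
        System.out.println(canDeduplicate(clusterRs, otherRs)); // true
    }
}
```

Checking only the direct children of the child RS follows the wording of the rule; whether the actual patch inspects deeper descendants is not stated in this thread.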


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java e91b4d5 
  ql/src/test/queries/clientpositive/reduce_deduplicate_exclude_gby.q PRE-CREATION 
  ql/src/test/results/clientpositive/reduce_deduplicate_exclude_gby.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/1313/diff


Testing (updated)
-------

Test added: reduce_deduplicate_exclude_gby.q


Thanks,

Navis


                
> Not using map aggregation, fails to execute group-by after cluster-by with same key
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-2329
>                 URL: https://issues.apache.org/jira/browse/HIVE-2329
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Navis
>            Assignee: Navis
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: HIVE-2329.1.patch.txt

[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Navis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-2329:
------------------------

    Attachment:     (was: HIVE-2329.1.patch)
    

[jira] [Commented] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Namit Jain (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165871#comment-13165871 ] 

Namit Jain commented on HIVE-2329:
----------------------------------

+1
                

[jira] [Updated] (HIVE-2329) Not using map aggregation, fails to execute group-by after cluster-by with same key

Posted by "Navis (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-2329:
------------------------

    Attachment:     (was: HIVE-2329.0.8.0.patch)
    