Posted to dev@hive.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2013/12/12 17:15:07 UTC
[jira] [Commented] (HIVE-6021) Problem in GroupByOperator for
handling distinct aggrgations
[ https://issues.apache.org/jira/browse/HIVE-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846400#comment-13846400 ]
Xuefu Zhang commented on HIVE-6021:
-----------------------------------
[~sunrui] Thanks for your contribution. Would you mind providing the following?
1. A test case, similar to the one you constructed, that reproduces the problem.
2. A review board entry.
> Problem in GroupByOperator for handling distinct aggrgations
> ------------------------------------------------------------
>
> Key: HIVE-6021
> URL: https://issues.apache.org/jira/browse/HIVE-6021
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.12.0
> Reporter: Sun Rui
> Assignee: Sun Rui
> Attachments: HIVE-6021.1.patch
>
>
> Use the following test case with Hive 0.12:
> {code:sql}
> create table src(key int, value string);
> load data local inpath 'src/data/files/kv1.txt' overwrite into table src;
> set hive.map.aggr=false;
> select count(key),count(distinct value) from src group by key;
> {code}
> We will get an ArrayIndexOutOfBoundsException from GroupByOperator:
> {code}
> java.lang.RuntimeException: Error in configuring object
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> ... 5 more
> Caused by: java.lang.RuntimeException: Reduce operator initialization failed
> at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159)
> ... 10 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
> at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377)
> at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152)
> ... 10 more
> {code}
> explain select count(key),count(distinct value) from src group by key;
> {code}
> STAGE PLANS:
> Stage: Stage-1
> Map Reduce
> Alias -> Map Operator Tree:
> src
> TableScan
> alias: src
> Select Operator
> expressions:
> expr: key
> type: int
> expr: value
> type: string
> outputColumnNames: key, value
> Reduce Output Operator
> key expressions:
> expr: key
> type: int
> expr: value
> type: string
> sort order: ++
> Map-reduce partition columns:
> expr: key
> type: int
> tag: -1
> Reduce Operator Tree:
> Group By Operator
> aggregations:
> expr: count(KEY._col0) // The parameter causes this problem
> ^^^^^^^^^^^
> expr: count(DISTINCT KEY._col1:0._col0)
> bucketGroup: false
> keys:
> expr: KEY._col0
> type: int
> mode: complete
> outputColumnNames: _col0, _col1, _col2
> Select Operator
> expressions:
> expr: _col1
> type: bigint
> expr: _col2
> type: bigint
> outputColumnNames: _col0, _col1
> File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Stage: Stage-0
> Fetch Operator
> limit: -1
> {code}
> The root cause is in GroupByOperator.initializeOp(). The method does not handle the following case:
> in a query with distinct aggregations, an aggregation function takes a parameter that is a group-by key column but not a distinct key column.
> {code}
> if (unionExprEval != null) {
>   String[] names = parameters.get(j).getExprString().split("\\.");
>   // parameters of the form : KEY.colx:t.coly
>   if (Utilities.ReduceField.KEY.name().equals(names[0])) {
>     String name = names[names.length - 2];
>     int tag = Integer.parseInt(name.split("\\:")[1]);
>
>     ...
>
>   } else {
>     // will be VALUE._COLx
>     if (!nonDistinctAggrs.contains(i)) {
>       nonDistinctAggrs.add(i);
>     }
>   }
> {code}
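The string handling in the quoted excerpt can be exercised on its own, outside Hive. The following is a minimal sketch, not Hive code: the class and method names are hypothetical, and the guarded variant is only one possible way to detect the unhandled case, not necessarily what HIVE-6021.1.patch does. It applies the same split logic to the two parameter strings shown in the plan, KEY._col1:0._col0 (a distinct key) and KEY._col0 (a plain group-by key). For the second string, the element before the last is "KEY", which contains no ':' separator, so indexing [1] fails in a way consistent with the ArrayIndexOutOfBoundsException: 1 in the trace.

```java
import java.util.OptionalInt;

// Hypothetical standalone class; it only mimics the string parsing in the excerpt.
class DistinctTagParseSketch {

    // Same steps as the excerpt: assumes every KEY parameter has the form KEY.colx:t.coly.
    static int parseTagUnguarded(String exprString) {
        String[] names = exprString.split("\\.");
        String name = names[names.length - 2];
        return Integer.parseInt(name.split("\\:")[1]);
    }

    // Hypothetical guard: returns a tag only when the expected "colx:t" element is present.
    static OptionalInt parseTagGuarded(String exprString) {
        String[] names = exprString.split("\\.");
        if (names.length < 3 || !"KEY".equals(names[0])) {
            return OptionalInt.empty();
        }
        String[] parts = names[names.length - 2].split("\\:");
        if (parts.length < 2) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(parts[1]));
    }

    public static void main(String[] args) {
        // Distinct key parameter: the tag is parsed successfully.
        System.out.println("tag = " + parseTagUnguarded("KEY._col1:0._col0"));

        // Plain group-by key parameter: there is no ":t" part to index into.
        try {
            parseTagUnguarded("KEY._col0");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("group-by key parameter failed: " + e);
        }

        // The guarded variant reports "no tag" instead of throwing.
        System.out.println("guarded has tag: " + parseTagGuarded("KEY._col0").isPresent());
    }
}
```

In this sketch, a caller that receives an empty result would treat the aggregation as non-distinct, much as the else branch in the excerpt does for VALUE._COLx parameters. Whether that is the right handling inside GroupByOperator is for the patch and its test case to establish.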