Posted to dev@hive.apache.org by "Chao (JIRA)" <ji...@apache.org> on 2015/02/02 19:53:35 UTC

[jira] [Commented] (HIVE-9517) UNION ALL query failed with ArrayIndexOutOfBoundsException [Spark Branch]

    [ https://issues.apache.org/jira/browse/HIVE-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301657#comment-14301657 ] 

Chao commented on HIVE-9517:
----------------------------

I tried both MR and Tez. With MR the query runs fine. With Tez it passes in the unit test but fails in CLI/HS2 with the same ArrayIndexOutOfBoundsException. Tez succeeds in the unit test because the "hive.in.test" flag is set in the hive-site.xml used for unit tests, and with that flag the generated operator tree is quite different.
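
For reference, "hive.in.test" is a HiveConf property; the hive-site.xml used by the unit tests enables it roughly like this (a minimal sketch, not the exact test configuration):
{code}
<property>
  <name>hive.in.test</name>
  <value>true</value>
  <!-- sketch: with this flag the planner produced the UDFToDouble conversion noted below -->
</property>
{code}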

The root cause is that Hive is treating an int column as a double. With "hive.in.test" set, the operator tree contains a UDFToDouble function that converts the int to a double. Without the flag the conversion is skipped, so the reduce side receives an int-encoded value but deserializes it as an 8-byte LazyBinaryDouble and reads past the end of the buffer, hence the ArrayIndexOutOfBoundsException in LazyBinaryUtils.byteArrayToLong.
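
To illustrate (a hypothetical rewrite for clarity, not taken from the actual plan), the conversion that UDFToDouble performs under "hive.in.test" is roughly equivalent to adding explicit casts on the int branches:
{code}
select unionsrc.key, unionsrc.value FROM (
  select 'max' as key, cast(max(c_int) as double) as value from cbo_t3 s1
    UNION ALL
  select 'min' as key, cast(min(c_int) as double) as value from cbo_t3 s2
    UNION ALL
  select 'avg' as key, avg(c_int) as value from cbo_t3 s3
) unionsrc order by unionsrc.key;
{code}
With the casts, all three branches emit a double for "value"; without them, the max/min branches write lazy-binary ints that the reducer then misreads as doubles.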

I'm still not sure why Spark fails even with the "hive.in.test" flag set.

> UNION ALL query failed with ArrayIndexOutOfBoundsException [Spark Branch]
> -------------------------------------------------------------------------
>
>                 Key: HIVE-9517
>                 URL: https://issues.apache.org/jira/browse/HIVE-9517
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: spark-branch
>            Reporter: Chao
>            Assignee: Chao
>
> I was running a query from cbo_gby_empty.q:
> {code}
> select unionsrc.key, unionsrc.value FROM (select 'max' as key, max(c_int) as value from cbo_t3 s1
>   UNION  ALL
>       select 'min' as key,  min(c_int) as value from cbo_t3 s2
>     UNION ALL
>         select 'avg' as key,  avg(c_int) as value from cbo_t3 s3) unionsrc order by unionsrc.key;
> {code}
> and got the following exception:
> {noformat}
> 2015-01-29 15:57:55,948 ERROR [Executor task launch worker-1]: spark.SparkReduceRecordHandler (SparkReduceRecordHandler.java:processRow(299)) - Fatal error: org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row (tag=0) {"key":{"reducesinkkey0":"max"},"value":{"_col0":1.5}}
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while processing row (tag=0) {"key":{"reducesinkkey0":"max"},"value":{"_col0":1.5}}
>   at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:339)
>   at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processRow(SparkReduceRecordHandler.java:289)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:49)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunctionResultList.processNextRecord(HiveReduceFunctionResultList.java:28)
>   at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
>   at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
>   at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$2.apply(AsyncRDDActions.scala:115)
>   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
>   at org.apache.spark.SparkContext$$anonfun$30.apply(SparkContext.scala:1390)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating VALUE._col0
>   at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:82)
>   at org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.processKeyValues(SparkReduceRecordHandler.java:330)
>   ... 17 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
>   at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.byteArrayToLong(LazyBinaryUtils.java:84)
>   at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryDouble.init(LazyBinaryDouble.java:43)
>   at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.uncheckedGetField(LazyBinaryStruct.java:264)
>   at org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:201)
>   at org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
>   at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:98)
>   at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
>   at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
>   at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
> {noformat}


