Posted to dev@hive.apache.org by "Chao (JIRA)" <ji...@apache.org> on 2014/08/06 22:08:15 UTC
[jira] [Commented] (HIVE-7569) Make sure multi-MR queries work

    [ https://issues.apache.org/jira/browse/HIVE-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088162#comment-14088162 ] 

Chao commented on HIVE-7569:
----------------------------

(Not sure whether this is related.)
Sometimes when I run a multi-insert job in Spark, I get an exception like the following.
If I run the SAME query in MR mode AND THEN in Spark mode, the query succeeds and produces the correct result.

{{code}}
2014-08-06 12:58:53,168 INFO  [Executor task launch worker-0]: exec.GroupByOperator (Operator.java:initialize(389)) - Initialization Done 35 GBY
2014-08-06 12:58:53,169 ERROR [Executor task launch worker-0]: ExecReducer (ExecReducer.java:reduce(272)) - org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error: Unable to deserialize reduce input key from x1x1x1x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x0x0x255 with properties {columns=_col0, serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe, serialization.sort.order=+, columns.types=map<string,string>}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:212)
        at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:60)
        at org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:31)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
        at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:161)
        at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
        at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:559)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:191)
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:210)
        ... 15 more
Caused by: java.io.EOFException
        at org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:201)
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:491)
        at org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:187)
        ... 16 more
{{code}}
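
To make the key dump in the error message easier to read, here is a minimal sketch that turns it back into raw bytes. It assumes each "x"-prefixed token in the log is one byte printed in decimal; that is an assumption about the log format, not something confirmed in this issue. The helper name parse_key_dump is hypothetical.

```python
def parse_key_dump(dump: str) -> bytes:
    """Convert a dump such as 'x1x98x0' into bytes.

    Assumption: every 'x'-prefixed token is a single byte value in decimal.
    """
    tokens = [t for t in dump.split("x") if t]
    return bytes(int(t) for t in tokens)


# Key dump copied verbatim from the error message above.
dump = "x1x1x1x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x98x0x0x255"
key = parse_key_dump(dump)

print(len(key))  # number of bytes in the serialized key
print(key)       # raw bytes; printable values show up as ASCII characters
```

Under that assumption, the run of 98s is the ASCII letter "b" repeated, surrounded by a few marker bytes. The sketch only decodes the dump; it does not interpret the BinarySortableSerDe layout.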



> Make sure multi-MR queries work
> -------------------------------
>
>                 Key: HIVE-7569
>                 URL: https://issues.apache.org/jira/browse/HIVE-7569
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Chao
>
> With the latest dev effort, queries that would involve multiple MR jobs should now be supported by Spark, except for sorting, multi-insert, union, and join (map join and SMB might just work). However, this hasn't been verified and tested. This task is to ensure this is the case. Please create JIRAs for problems found.


