Posted to issues@hive.apache.org by "Chao Sun (JIRA)" <ji...@apache.org> on 2016/01/25 18:55:40 UTC

[jira] [Commented] (HIVE-12629) hive.auto.convert.join=true makes lateral view join sql failed on spark engine on yarn

    [ https://issues.apache.org/jira/browse/HIVE-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115656#comment-15115656 ] 

Chao Sun commented on HIVE-12629:
---------------------------------

This query actually executes successfully in Hive 2.1.0-SNAPSHOT, because the check for parent/child connections in SparkPlan was disabled in another commit. However, the duplicated MapWork is still present in the query plan, which might cause problems in other situations. This patch fixes it.
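
For context, a minimal sketch (hypothetical, simplified class and method names; the real logic lives in org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator) of how caching the generated tran per work object keeps a duplicated MapWork from being translated, and re-connected, a second time:

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Hive's BaseWork and SparkTran.
class Work { final String name; Work(String name) { this.name = name; } }
class Tran { final String label; Tran(String label) { this.label = label; } }

class PlanGeneratorSketch {
    // Trans already generated, keyed by work identity, so a MapWork that
    // appears twice in the plan is translated only once.
    private final Map<Work, Tran> workToTran = new HashMap<>();

    Tran generateParentTran(Work work) {
        Tran cached = workToTran.get(work);
        if (cached != null) {
            // Reuse the cached tran: connect() is never asked to add the
            // same parent/child edge twice.
            return cached;
        }
        Tran tran = new Tran("tran:" + work.name); // real code builds a Spark transformation here
        workToTran.put(work, tran);
        return tran;
    }
}
{code}

With a cache like this, the second reference to the duplicated MapWork resolves to the already-generated tran instead of triggering another connect call.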

> hive.auto.convert.join=true makes lateral view join sql failed on spark engine on yarn
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-12629
>                 URL: https://issues.apache.org/jira/browse/HIVE-12629
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.2.1
>            Reporter: 吴子美
>            Assignee: Chao Sun
>         Attachments: HIVE-12629.1-spark.patch
>
>
> I am using Hive 1.2 on Spark on YARN.
> I found that the following query fails in Hive on Spark on YARN, but runs OK in Hive on MR:
> {code}
> select count(1) from
> (select user_id from xxx group by user_id) a join
> (select user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b
> on a.user_id=b.user_id;
> {code}
> I tried the following SQL on Spark, and it ran OK:
> {code}
> select count(1) from
> (select user_id from xxx group by user_id) a left join
> (select user_id from yyy lateral view json_tuple(u, 'h') v1 as h) b
> on a.user_id=b.user_id;
> {code}
> When I turn hive.auto.convert.join from true to false, everything works.
> The error message in hive.log was:
> {code}
> 2015-12-09 21:10:17,190 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
> 2015-12-09 21:10:17,190 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO exec.Utilities: Serializing ReduceWork via kryo
> 2015-12-09 21:10:17,214 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO log.PerfLogger: </PERFLOG method=serializePlan start=1449666617190 end=1449666617214 duration=24 from=org.apache.hadoop.hive.ql.exec.Utilities>
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 15/12/09 21:10:17 INFO client.RemoteDriver: Failed to run job 8fed1ca8-834f-497f-b189-eab343440a9f
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - java.lang.IllegalStateException: Connection already exists
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hadoop.hive.ql.exec.spark.SparkPlan.connect(SparkPlan.java:142)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateParentTran(SparkPlanGenerator.java:142)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:106)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:252)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:366)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:335)
> 2015-12-09 21:10:17,261 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 2015-12-09 21:10:17,262 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 2015-12-09 21:10:17,262 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 2015-12-09 21:10:17,262 INFO  [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(569)) - 	at java.lang.Thread.run(Thread.java:745)
> 2015-12-09 21:10:17,266 INFO  [RPC-Handler-3]: client.SparkClientImpl (SparkClientImpl.java:handle(522)) - Received result for 8fed1ca8-834f-497f-b189-eab343440a9f
> 2015-12-09 21:10:18,054 ERROR [HiveServer2-Background-Pool: Thread-43]: status.SparkJobMonitor (SessionState.java:printError(960)) - Status: Failed
> 2015-12-09 21:10:18,055 INFO  [HiveServer2-Background-Pool: Thread-43]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=SparkRunJob start=1449666615051 end=1449666618055 duration=3004 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor>
> 2015-12-09 21:10:18,076 ERROR [HiveServer2-Background-Pool: Thread-43]: ql.Driver (SessionState.java:printError(960)) - FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {code}
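> To illustrate where this exception comes from: a minimal sketch (hypothetical, simplified names; the real check is in org.apache.hadoop.hive.ql.exec.spark.SparkPlan.connect, line 142 of the trace above) of a connect method that rejects a duplicate parent/child edge:
> {code}
> import java.util.HashMap;
> import java.util.HashSet;
> import java.util.Map;
> import java.util.Set;
>
> // Hypothetical stand-in for the DAG bookkeeping inside SparkPlan.
> class DagSketch {
>     // Each parent node mapped to the children already connected to it.
>     private final Map<String, Set<String>> edges = new HashMap<>();
>
>     void connect(String parent, String child) {
>         Set<String> children =
>             edges.computeIfAbsent(parent, k -> new HashSet<>());
>         // A plan containing the same MapWork twice asks for the same
>         // edge twice, so this guard fires on the second call.
>         if (!children.add(child)) {
>             throw new IllegalStateException("Connection already exists");
>         }
>     }
> }
> {code}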
> Is this a bug in Hive on Spark?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)