Posted to issues@hive.apache.org by "KaiXu (JIRA)" <ji...@apache.org> on 2017/02/09 10:51:41 UTC

[jira] [Updated] (HIVE-15859) Hive client side shows Spark Driver disconnected while Spark Driver side could not get RPC header

     [ https://issues.apache.org/jira/browse/HIVE-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

KaiXu updated HIVE-15859:
-------------------------
    Description: 
Hive on Spark failed with this error:
2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

The application log shows the driver commanded a shutdown for some unknown reason, while Hive's log shows the driver could not get the RPC header (Expected RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead).
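For context, here is a simplified sketch of the header/payload protocol the Hive RPC dispatcher enforces. It is an illustration written against the Netty 4.0 API, not the actual org.apache.hive.spark.client.rpc.RpcDispatcher source: every call is expected to arrive as a MessageHeader frame followed by a payload frame, so a payload such as Rpc$NullMessage arriving while no header is pending is a protocol violation that tears the channel down.

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;

// Simplified illustration only; class and helper names are hypothetical.
class DispatcherSketch extends SimpleChannelInboundHandler<Object> {

  private Object lastHeader; // null means "waiting for a header frame"

  @Override
  protected void channelRead0(ChannelHandlerContext ctx, Object msg) {
    if (lastHeader == null) {
      if (!isHeader(msg)) {
        // Mirrors the warning in the log above:
        // "Expected RPC header, got ... instead."
        throw new IllegalArgumentException(
            "Expected RPC header, got " + msg.getClass().getName() + " instead.");
      }
      lastHeader = msg;            // remember the header, wait for its payload
    } else {
      dispatch(lastHeader, msg);   // header + payload form one RPC call
      lastHeader = null;
    }
  }

  @Override
  public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
    // "Closing channel due to exception in pipeline (...)": any protocol
    // error closes the whole RPC channel, which later surfaces as
    // "RPC channel is closed." on both sides.
    ctx.close();
  }

  private boolean isHeader(Object msg) { /* placeholder */ return false; }
  private void dispatch(Object header, Object payload) { /* placeholder */ }
}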


17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in stage 3.0 (TID 2519)
17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown (hsx-node1:42777) driver disconnected.
17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.1:42777 disassociated! Shutting down.
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in stage 3.0 (TID 2511)
17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk3/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-e17931ad-9e8a-45da-86f8-9a0fdca0fad1
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-4de34175-f871-4c28-8ec0-d2fc0020c5c3
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1137.0 in stage 3.0 (TID 2515)
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 897.0 in stage 3.0 (TID 2417)
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1225.0 in stage 3.0 (TID 2526)
17/02/08 09:51:04 INFO executor.Executor: Executor killed task 905.0 in stage 3.0 (TID 2423)


In Hive's log:
2017-02-08T09:51:04,327 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO scheduler.TaskSetManager: Finished task 971.0 in stage 3.0 (TID 2218) in 5948 ms on hsx-node8 (1338/1520)
2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (org.apache.hive.spark.client.RemoteDriver$DriverProtocol.handle(io.netty.channel.ChannelHandlerContext, org.apache.hive.spark.client.rpc.Rpc$MessageHeader)).
2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN rpc.RpcDispatcher: [DriverProtocol] Expected RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead.
2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (null).
2017-02-08T09:51:04,347 INFO [RPC-Handler-3] rpc.RpcDispatcher: [ClientProtocol] Closing channel due to exception in pipeline (Connection reset by peer).
2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN rpc.Rpc: Failed to send RPC, closing connection.
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: java.nio.channels.ClosedChannelException
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN client.RemoteDriver: Shutting down driver because RPC channel was closed.
2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO client.RemoteDriver: Shutting down remote driver.
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO scheduler.DAGScheduler: Asked to cancel job 2
2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.InterruptedException
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Object.wait(Native Method)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Object.wait(Object.java:502)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.org$apache$spark$SimpleFutureAction$$awaitResult(FutureAction.scala:165)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:120)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:108)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$.ready(package.scala:86)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:303)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Thread.run(Thread.java:745)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
2017-02-08T09:51:04,654 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-65f40590-d87f-4701-b374-6b3b2a11538c
2017-02-08T09:52:04,346 WARN [b723c85d-2a7b-469e-bab1-9c165b25e656 main] impl.RemoteSparkJobStatus: Error getting stage info
java.util.concurrent.TimeoutException
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageInfo(RemoteSparkJobStatus.java:161) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageProgress(RemoteSparkJobStatus.java:96) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.spark.status.RemoteSparkJobMonitor.startMonitor(RemoteSparkJobMonitor.java:82) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef.monitorJob(RemoteSparkJobRef.java:60) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:101) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1997) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1688) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1419) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1143) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
2017-02-08T09:52:04,346 ERROR [b723c85d-2a7b-469e-bab1-9c165b25e656 main] status.SparkJobMonitor: Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
java.lang.IllegalStateException: RPC channel is closed.
at com.google.common.base.Preconditions.checkState(Preconditions.java:149) ~[guava-14.0.1.jar:?]
at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
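The trace ends in a Guava precondition inside Rpc.call. A minimal sketch of that guard follows, assuming the simplified shape below rather than the exact Hive source: once the channel to the driver has been torn down, every subsequent call (the metrics sent from ClientListener.onTaskEnd, the monitor's getSparkStageInfo request, and so on) trips the same check.

import com.google.common.base.Preconditions;
import io.netty.channel.Channel;

// Illustration only; the class name is hypothetical.
final class RpcSketch {
  private final Channel channel;

  RpcSketch(Channel channel) {
    this.channel = channel;
  }

  void call(Object msg) {
    // Guava precondition; this is the "RPC channel is closed." message
    // seen in both the driver-side and client-side stack traces.
    Preconditions.checkState(channel.isOpen(), "RPC channel is closed.");
    channel.writeAndFlush(msg);
  }
}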
Also, in the container's log, I found the driver was still requesting executors:
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 77 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 76 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 75 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 74 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 73 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 71 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 70 executor(s).
17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 50 executor(s).
17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. 192.168.1.1:42777
17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. hsx-node1:42777
17/02/08 09:51:04 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
17/02/08 09:51:04 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
17/02/08 09:51:04 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
17/02/08 09:51:04 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1486453422616_0150
17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called


I found only one ERROR in the YARN application log:

17/02/08 09:51:00 INFO executor.Executor: Finished task 1492.0 in stage 3.0 (TID 2168). 3294 bytes result sent to driver
17/02/08 09:51:00 INFO executor.Executor: Finished task 556.0 in stage 3.0 (TID 1587). 3312 bytes result sent to driver
17/02/08 09:51:00 INFO executor.Executor: Finished task 1412.0 in stage 3.0 (TID 2136). 3294 bytes result sent to driver
17/02/08 09:51:00 INFO executor.Executor: Finished task 1236.0 in stage 3.0 (TID 2007). 3294 bytes result sent to driver
17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown (hsx-node1:42777) driver disconnected.
17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.1:42777 disassociated! Shutting down.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a8167f0b-f3c3-458f-ad51-8a0f4bcda4f3
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-26cba445-66d2-4b78-a428-17881c92f0f6
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.


This error occurs when several queries run at the same time at large data scale. It does not occur when the query is run by itself, but it recurs frequently when the queries are run together.
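To mimic the triggering condition, the following sketch runs several Hive queries concurrently through HiveServer2/JDBC. The host, port, credentials, and table name are hypothetical placeholders, and the failing runs above went through the Hive CLI, so this only reproduces the "several large queries at once" pattern, not the exact workload.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentQueriesSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical HiveServer2 endpoint and query; adjust for a real cluster.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    final String url = "jdbc:hive2://hsx-node1:10000/default";
    final String query = "SELECT COUNT(*) FROM some_large_table"; // placeholder
    final int parallel = 4; // several sessions at once, as in this report

    ExecutorService pool = Executors.newFixedThreadPool(parallel);
    for (int i = 0; i < parallel; i++) {
      pool.submit(() -> {
        // Each session gets its own Hive-on-Spark RemoteDriver and RPC channel.
        try (Connection conn = DriverManager.getConnection(url, "root", "");
             Statement stmt = conn.createStatement()) {
          stmt.execute(query);
        } catch (Exception e) {
          e.printStackTrace(); // a failing session surfaces here
        }
      });
    }
    pool.shutdown();
  }
}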

  was:
Hive on Spark, failed with error:
2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask




> Hive client side shows Spark Driver disconnected while Spark Driver side could not get RPC header 
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15859
>                 URL: https://issues.apache.org/jira/browse/HIVE-15859
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Spark
>    Affects Versions: 2.1.1
>         Environment: hadoop2.7.1
> spark1.6.2
> hive2.2
>            Reporter: KaiXu
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)