Posted to issues@hive.apache.org by "Rui Li (JIRA)" <ji...@apache.org> on 2017/02/14 07:21:41 UTC

[jira] [Assigned] (HIVE-15859) Hive client side shows Spark Driver disconnected while Spark Driver side could not get RPC header

     [ https://issues.apache.org/jira/browse/HIVE-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rui Li reassigned HIVE-15859:
-----------------------------

    Assignee: Rui Li

> Hive client side shows Spark Driver disconnected while Spark Driver side could not get RPC header 
> --------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-15859
>                 URL: https://issues.apache.org/jira/browse/HIVE-15859
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Spark
>    Affects Versions: 2.2.0
>         Environment: hadoop2.7.1
> spark1.6.2
> hive2.2
>            Reporter: KaiXu
>            Assignee: Rui Li
>
> Hive on Spark failed with the following error:
> {noformat}
> 2017-02-08 09:50:59,331 Stage-2_0: 1039(+2)/1041 Stage-3_0: 796(+456)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:00,335 Stage-2_0: 1040(+1)/1041 Stage-3_0: 914(+398)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> 2017-02-08 09:51:01,338 Stage-2_0: 1041/1041 Finished Stage-3_0: 961(+383)/1520 Stage-4_0: 0/2021 Stage-5_0: 0/1009 Stage-6_0: 0/1
> Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
> {noformat}
> The application log shows the driver commanded a shutdown for an unknown reason, while Hive's log shows the driver could not get the RPC header (Expected RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead).
> {noformat}
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1169.0 in stage 3.0 (TID 2519)
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 INFO exec.Utilities: PLAN PATH = hdfs://hsx-node1:8020/tmp/hive/root/b723c85d-2a7b-469e-bab1-9c165b25e656/hive_2017-02-08_09-49-37_890_6267025825539539056-1/-mr-10006/71a9dacb-a463-40ef-9e86-78d3b8e3738d/map.xml
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1105.0 in stage 3.0 (TID 2511)
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk6/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-71da1dfc-99bd-4687-bc2f-33452db8de3d
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk2/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-7f134d81-e77e-4b92-bd99-0a51d0962c14
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk5/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-77a90d63-fb05-4bc6-8d5e-1562cc502e6c
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk4/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-91f8b91a-114d-4340-8560-d3cd085c1cd4
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a3c24f9e-8609-48f0-9d37-0de7ae06682a
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk7/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-f6120a43-2158-4780-927c-c5786b78f53e
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk3/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-e17931ad-9e8a-45da-86f8-9a0fdca0fad1
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-4de34175-f871-4c28-8ec0-d2fc0020c5c3
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1137.0 in stage 3.0 (TID 2515)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 897.0 in stage 3.0 (TID 2417)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 1225.0 in stage 3.0 (TID 2526)
> 17/02/08 09:51:04 INFO executor.Executor: Executor killed task 905.0 in stage 3.0 (TID 2423)
> {noformat}
> In Hive's log:
> {noformat}
> 2017-02-08T09:51:04,327 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO scheduler.TaskSetManager: Finished task 971.0 in stage 3.0 (TID 2218) in 5948 ms on hsx-node8 (1338/1520)
> 2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (org.apache.hive.spark.client.RemoteDriver$DriverProtocol.handle(io.netty.channel.ChannelHandlerContext, org.apache.hive.spark.client.rpc.Rpc$MessageHeader)).
> 2017-02-08T09:51:04,346 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN rpc.RpcDispatcher: [DriverProtocol] Expected RPC header, got org.apache.hive.spark.client.rpc.Rpc$NullMessage instead.
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO rpc.RpcDispatcher: [DriverProtocol] Closing channel due to exception in pipeline (null).
> 2017-02-08T09:51:04,347 INFO [RPC-Handler-3] rpc.RpcDispatcher: [ClientProtocol] Closing channel due to exception in pipeline (Connection reset by peer).
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
> 2017-02-08T09:51:04,347 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN rpc.Rpc: Failed to send RPC, closing connection.
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: java.nio.channels.ClosedChannelException
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 WARN client.RemoteDriver: Shutting down driver because RPC channel was closed.
> 2017-02-08T09:51:04,348 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO client.RemoteDriver: Shutting down remote driver.
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
> 2017-02-08T09:51:04,349 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO scheduler.DAGScheduler: Asked to cancel job 2
> 2017-02-08T09:51:04,350 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.InterruptedException
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Object.wait(Native Method)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Object.wait(Object.java:502)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.org$apache$spark$SimpleFutureAction$$awaitResult(FutureAction.scala:165)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:120)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:108)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:86)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at scala.concurrent.Await$.ready(package.scala:86)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:303)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:316)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at java.lang.Thread.run(Thread.java:745)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 ERROR scheduler.LiveListenerBus: Listener ClientListener threw an exception
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: java.lang.IllegalStateException: RPC channel is closed.
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:259)
> 2017-02-08T09:51:04,351 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$DriverProtocol.sendMetrics(RemoteDriver.java:270)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.hive.spark.client.RemoteDriver$ClientListener.onTaskEnd(RemoteDriver.java:490)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
> 2017-02-08T09:51:04,352 INFO [stderr-redir-1] client.SparkClientImpl: at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64)
> 2017-02-08T09:51:04,654 INFO [stderr-redir-1] client.SparkClientImpl: 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-65f40590-d87f-4701-b374-6b3b2a11538c
> 2017-02-08T09:52:04,346 WARN [b723c85d-2a7b-469e-bab1-9c165b25e656 main] impl.RemoteSparkJobStatus: Error getting stage info
> java.util.concurrent.TimeoutException
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
> at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageInfo(RemoteSparkJobStatus.java:161) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus.getSparkStageProgress(RemoteSparkJobStatus.java:96) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.spark.status.RemoteSparkJobMonitor.startMonitor(RemoteSparkJobMonitor.java:82) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobRef.monitorJob(RemoteSparkJobRef.java:60) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:101) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1997) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1688) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1419) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1143) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60]
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60]
> at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[spark-assembly-1.6.2-hadoop2.6.0.jar:1.6.2]
> 2017-02-08T09:52:04,346 ERROR [b723c85d-2a7b-469e-bab1-9c165b25e656 main] status.SparkJobMonitor: Failed to monitor Job[ 2] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
> java.lang.IllegalStateException: RPC channel is closed.
> at com.google.common.base.Preconditions.checkState(Preconditions.java:149) ~[guava-14.0.1.jar:?]
> at org.apache.hive.spark.client.rpc.Rpc.call(Rpc.java:276) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
> {noformat}
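> For reference, the "RPC channel is closed." message in the stack traces above is raised by a Guava precondition check inside org.apache.hive.spark.client.rpc.Rpc.call (Rpc.java:276). A minimal sketch of that kind of guard follows; it is not the actual Hive source, the class and field names are illustrative, and only Preconditions.checkState and the message text are taken from the stack trace:
> {noformat}
> // Illustrative only: a Guava precondition guard on a Netty channel that produces
> // exactly "java.lang.IllegalStateException: RPC channel is closed."
> import com.google.common.base.Preconditions;
> import io.netty.channel.Channel;
>
> class RpcCallSketch {                 // hypothetical name, stands in for the Rpc class
>   private final Channel channel;      // the RPC connection between client and remote driver
>
>   RpcCallSketch(Channel channel) {
>     this.channel = channel;
>   }
>
>   void call() {
>     // Once the other side has closed the connection, every call fails here
>     // before anything is written to the channel.
>     Preconditions.checkState(channel.isOpen(), "RPC channel is closed.");
>     // ... header and payload would be written to the channel here ...
>   }
> }
> {noformat}
> So once the connection between the Hive client and the remote driver is gone, every later sendMetrics or job-status call hits this guard immediately, which matches the repeated ClientListener exceptions above and the "Failed to monitor Job[ 2]" error reported on the Hive client side.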
> Also, in the container's log, I find the driver still requesting executors:
> {noformat}
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 77 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 76 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 75 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 74 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 73 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 71 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 2 executor containers
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Driver requested a total number of 70 executor(s).
> 17/02/08 09:51:00 INFO yarn.YarnAllocator: Canceling requests for 1 executor containers
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 50 executor(s).
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
> 17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
> 17/02/08 09:51:04 INFO yarn.YarnAllocator: Canceling requests for 0 executor containers
> 17/02/08 09:51:04 WARN yarn.YarnAllocator: Expected to find pending requests, but found none.
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. 192.168.1.1:42777
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. hsx-node1:42777
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
> 17/02/08 09:51:04 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
> 17/02/08 09:51:04 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1486453422616_0150
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> {noformat}
> I found only one ERROR in the YARN application log:
> {noformat}
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 1492.0 in stage 3.0 (TID 2168). 3294 bytes result sent to driver
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 556.0 in stage 3.0 (TID 1587). 3312 bytes result sent to driver
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 1412.0 in stage 3.0 (TID 2136). 3294 bytes result sent to driver
> 17/02/08 09:51:00 INFO executor.Executor: Finished task 1236.0 in stage 3.0 (TID 2007). 3294 bytes result sent to driver
> 17/02/08 09:51:04 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
> 17/02/08 09:51:04 INFO storage.MemoryStore: MemoryStore cleared
> 17/02/08 09:51:04 INFO storage.BlockManager: BlockManager stopped
> 17/02/08 09:51:04 WARN executor.CoarseGrainedExecutorBackend: An unknown (hsx-node1:42777) driver disconnected.
> 17/02/08 09:51:04 ERROR executor.CoarseGrainedExecutorBackend: Driver 192.168.1.1:42777 disassociated! Shutting down.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Shutdown hook called
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk8/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-a8167f0b-f3c3-458f-ad51-8a0f4bcda4f3
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
> 17/02/08 09:51:04 INFO util.ShutdownHookManager: Deleting directory /mnt/disk1/yarn/nm/usercache/root/appcache/application_1486453422616_0150/spark-26cba445-66d2-4b78-a428-17881c92f0f6
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
> 17/02/08 09:51:04 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
> {noformat}
> This error occurs when several queries run at the same time at large data scale. It does not occur when the query is run on its own, but it occurs frequently when the queries are run together.
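> For completeness, a hypothetical sketch of the concurrency condition (the actual runs go through the Hive CLI, as the CliDriver frames in the stack trace show; the JDBC URL, user, and query text below are placeholders only):
> {noformat}
> // Hypothetical reproduction sketch: submit the same large-scale query from several
> // sessions at once, here through HiveServer2 JDBC. Placeholders: URL, user, query text.
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.Statement;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> public class ConcurrentQueryRepro {
>   public static void main(String[] args) throws Exception {
>     Class.forName("org.apache.hive.jdbc.HiveDriver");
>     final String url = "jdbc:hive2://hsx-node1:10000/default";  // placeholder connection string
>     final String sql = "SELECT ...";                            // placeholder for the failing query
>     ExecutorService pool = Executors.newFixedThreadPool(4);     // several queries at the same time
>     for (int i = 0; i < 4; i++) {
>       pool.submit(() -> {
>         try (Connection conn = DriverManager.getConnection(url, "root", "");
>              Statement stmt = conn.createStatement()) {
>           stmt.execute(sql);                                    // each session runs its own Spark job
>         } catch (Exception e) {
>           e.printStackTrace();
>         }
>         return null;
>       });
>     }
>     pool.shutdown();
>   }
> }
> {noformat}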



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)