Posted to issues@hive.apache.org by "Paul Wu (JIRA)" <ji...@apache.org> on 2017/08/22 22:06:00 UTC

[jira] [Commented] (HIVE-15912) Executor kill task and Failed to get spark memory/core info

    [ https://issues.apache.org/jira/browse/HIVE-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137479#comment-16137479 ] 

Paul Wu commented on HIVE-15912:
--------------------------------

Got a similar issue with Spark 2.2/Hive 2.5.3 (on Hortonworks). The terrible thing is that the app hung without shutting down the context (the app didn't exit by itself; it had to be killed with the yarn command). The same app runs fine with Spark 2.1.1 or earlier.

java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
	at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:621)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:336)
	at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2853)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2153)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2153)
	at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2837)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2836)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:2153)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:2366)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:245)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:644)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:603)
	at org.apache.spark.sql.Dataset.show(Dataset.scala:612)
	at com.att.nrs.spk.MainUtil.getVoiceData(MainUtil.java:222)
	at com.att.nrs.spk.RedialFinderMain.main(RedialFinderMain.java:63)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
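
The driver thread above is parked in DAGScheduler.runJob, waiting on a promise that never completes, which is why the application never exits on its own. As a defensive workaround until the underlying bug is fixed (a minimal sketch only; the class and method names below are illustrative and not from the real app), the blocking action can be run under a timeout so the driver can cancel the job and stop the context instead of hanging:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Hypothetical guard (not from the original app): run the blocking
    // action under a timeout so the driver can cancel and exit instead
    // of hanging forever in DAGScheduler.runJob.
    public final class ActionTimeoutGuard {
        public static void showWithTimeout(SparkSession spark, Dataset<Row> df,
                                           long timeoutSec) throws Exception {
            ExecutorService pool = Executors.newSingleThreadExecutor();
            Future<?> action = pool.submit(() -> df.show()); // the call that hung above
            try {
                action.get(timeoutSec, TimeUnit.SECONDS);
            } catch (TimeoutException te) {
                spark.sparkContext().cancelAllJobs(); // interrupt the stuck job
                spark.stop();                         // release the YARN containers
                throw te;
            } finally {
                pool.shutdownNow();
            }
        }
    }

This does not fix the root cause, but at least the application terminates and YARN resources are freed instead of requiring a manual yarn kill.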

*Stderr:*
ERROR ApplicationMaster 91: Exception from Reporter thread.
org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException: Application attempt appattempt_1502715048011_224038_000002 doesn't exist in ApplicationMasterService cache.
	at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.allocate(ApplicationMasterService.java:445)
	at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.allocate(ApplicationMasterProtocolPBServiceImpl.java:60)
	at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:99)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
	at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
	at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy18.allocate(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:277)
	at org.apache.spark.deploy.yarn.YarnAllocator.allocateResources(YarnAllocator.scala:265)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$1.run(ApplicationMaster.scala:457)
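
The ApplicationAttemptNotFoundException above means the ResourceManager had already expired (or failed over) this AM attempt, so the reporter thread's allocate() heartbeat was rejected. If the AM heartbeat is being starved, for example by long driver pauses, one knob worth checking is the RM's AM liveness expiry. This is only a stopgap, not a fix for the underlying bug, and should be verified against your Hadoop version:

    <!-- yarn-site.xml on the ResourceManager: how long the RM waits for
         an AM heartbeat before expiring the attempt (default 600000 ms). -->
    <property>
      <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
      <value>1200000</value>
    </property>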

> Executor kill task and Failed to get spark memory/core info
> -----------------------------------------------------------
>
>                 Key: HIVE-15912
>                 URL: https://issues.apache.org/jira/browse/HIVE-15912
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Spark
>    Affects Versions: 2.2.0
>         Environment: hadoop2.7.1
> spark2.0.2
> Hive2.2
>            Reporter: KaiXu
>
> Hive on Spark failed with the error:
> Starting Spark Job = 12a8cb8c-ed0d-4049-ae06-8d32d13fe285
> Failed to monitor Job[ 6] with exception 'java.lang.IllegalStateException(RPC channel is closed.)'
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
> Hive's log:
> 2017-02-14T19:03:09,147  INFO [stderr-redir-1] client.SparkClientImpl: 17/02/14 19:03:09 INFO yarn.Client: Application report for application_1486905599813_0403 (state: ACCEPTED)
> 2017-02-14T19:03:10,817  WARN [5bcf13e5-cb54-4cfe-a0d4-9a6556ab48b1 main] spark.SetSparkReducerParallelism: Failed to get spark memory/core info
> java.util.concurrent.TimeoutException
>         at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) ~[netty-all-4.0.29.Final.jar:4.0.29.Final]
>         at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.getExecutorCount(RemoteHiveSparkClient.java:155) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.getExecutorCount(RemoteHiveSparkClient.java:165) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.getMemoryAndCores(SparkSessionImpl.java:77) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:119) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.runJoinOptimizations(SparkCompiler.java:291) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:120) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:140) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11085) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:279) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:510) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1302) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1442) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1222) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212) ~[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:400) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:430) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:446) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:715) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:642) ~[hive-cli-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60]
>         at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:221) ~[hadoop-common-2.7.1.jar:?]
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:136) ~[hadoop-common-2.7.1.jar:?]
> 2017-02-14T19:03:10,817  INFO [5bcf13e5-cb54-4cfe-a0d4-9a6556ab48b1 main] spark.SetSparkReducerParallelism: Set parallelism for reduce sink RS[24] to: 1 (calculated)
> YARN log:
> 17/02/14 19:05:36 INFO executor.Executor: Finished task 329.0 in stage 15.0 (TID 3865). 3228 bytes result sent to driver
> 17/02/14 19:05:36 INFO executor.Executor: Finished task 201.0 in stage 15.0 (TID 3737). 3141 bytes result sent to driver
> 17/02/14 19:05:36 INFO executor.Executor: Finished task 9.0 in stage 15.0 (TID 3545). 4027 bytes result sent to driver
> 17/02/14 19:05:36 INFO executor.Executor: Finished task 457.0 in stage 15.0 (TID 3993). 3141 bytes result sent to driver
> 17/02/14 19:05:36 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4221
> 17/02/14 19:05:36 INFO executor.Executor: Running task 687.0 in stage 15.0 (TID 4221)
> 17/02/14 19:05:36 INFO executor.Executor: Finished task 265.0 in stage 15.0 (TID 3801). 3141 bytes result sent to driver
> 17/02/14 19:05:36 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4240
> 17/02/14 19:05:36 INFO executor.Executor: Running task 692.0 in stage 15.0 (TID 4240)
> 17/02/14 19:05:36 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4242
> 17/02/14 19:05:36 INFO executor.Executor: Running task 763.0 in stage 15.0 (TID 4242)
> 17/02/14 19:05:36 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4243
> 17/02/14 19:05:36 INFO executor.Executor: Running task 836.0 in stage 15.0 (TID 4243)
> 17/02/14 19:05:36 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4244
> 17/02/14 19:05:36 INFO executor.Executor: Running task 863.0 in stage 15.0 (TID 4244)
> 17/02/14 19:05:36 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4248
> 17/02/14 19:05:36 INFO executor.Executor: Running task 979.0 in stage 15.0 (TID 4248)
> 17/02/14 19:05:36 INFO executor.CoarseGrainedExecutorBackend: Got assigned task 4280
> 17/02/14 19:05:36 INFO storage.BlockManager: Found block rdd_24_687 locally
> 17/02/14 19:05:36 INFO executor.Executor: Running task 990.0 in stage 15.0 (TID 4280)
> 17/02/14 19:05:36 INFO executor.Executor: Executor is trying to kill task 687.0 in stage 15.0 (TID 4221)
> 17/02/14 19:05:36 INFO executor.Executor: Executor is trying to kill task 73.0 in stage 15.0 (TID 3609)
> 17/02/14 19:05:36 INFO executor.Executor: Executor is trying to kill task 763.0 in stage 15.0 (TID 4242)
> 17/02/14 19:05:36 INFO executor.Executor: Executor is trying to kill task 979.0 in stage 15.0 (TID 4248)
> 17/02/14 19:05:36 INFO executor.Executor: Executor is trying to kill task 990.0 in stage 15.0 (TID 4280)
> 17/02/14 19:05:36 INFO executor.Executor: Executor is trying to kill task 863.0 in stage 15.0 (TID 4244)
> 17/02/14 19:05:36 INFO executor.Executor: Executor is trying to kill task 836.0 in stage 15.0 (TID 4243)
> 17/02/14 19:05:36 INFO executor.Executor: Executor is trying to kill task 692.0 in stage 15.0 (TID 4240)
> 17/02/14 19:05:36 INFO storage.BlockManager: Removing RDD 24
> 17/02/14 19:05:36 INFO executor.Executor: Finished task 73.0 in stage 15.0 (TID 3609). 3214 bytes result sent to driver
> 17/02/14 19:05:36 INFO storage.BlockManager: Found block rdd_24_863 locally
> 17/02/14 19:05:36 INFO storage.BlockManager: Found block rdd_24_692 locally
> 17/02/14 19:05:36 ERROR executor.Executor: Exception in task 990.0 in stage 15.0 (TID 4280)
> org.apache.spark.TaskKilledException
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:264)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 17/02/14 19:05:36 INFO executor.Executor: Executor killed task 687.0 in stage 15.0 (TID 4221)
> 17/02/14 19:05:36 INFO storage.BlockManager: Found block rdd_24_836 locally
> 17/02/14 19:05:36 INFO storage.BlockManager: Found block rdd_24_979 locally
> 17/02/14 19:05:36 INFO storage.BlockManager: Found block rdd_24_763 locally
> 17/02/14 19:05:36 INFO executor.Executor: Executor killed task 863.0 in stage 15.0 (TID 4244)
> 17/02/14 19:05:36 INFO executor.Executor: Executor killed task 979.0 in stage 15.0 (TID 4248)
> 17/02/14 19:05:36 INFO executor.Executor: Executor killed task 836.0 in stage 15.0 (TID 4243)
> 17/02/14 19:05:36 INFO executor.Executor: Executor killed task 692.0 in stage 15.0 (TID 4240)
> 17/02/14 19:05:36 INFO executor.Executor: Executor killed task 763.0 in stage 15.0 (TID 4242)
> 17/02/14 19:05:36 INFO executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown
> 17/02/14 19:05:36 INFO executor.CoarseGrainedExecutorBackend: Driver from 192.168.1.1:35981 disconnected during shutdown
> 17/02/14 19:05:36 INFO executor.CoarseGrainedExecutorBackend: Driver from 192.168.1.1:35981 disconnected during shutdown
> 17/02/14 19:05:36 INFO memory.MemoryStore: MemoryStore cleared
> 17/02/14 19:05:36 INFO storage.BlockManager: BlockManager stopped
> 17/02/14 19:05:36 INFO util.ShutdownHookManager: Shutdown hook called
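
For the java.util.concurrent.TimeoutException in RemoteHiveSparkClient.getExecutorCount quoted above, the wait is bounded by the Hive on Spark client timeouts. A hedged mitigation sketch (the property names come from HiveConf, but the values are illustrative and their effect on this particular failure is unverified):

    -- In the Hive session, or equivalently in hive-site.xml:
    set hive.spark.client.future.timeout=120s;             -- RPC futures such as getExecutorCount
    set hive.spark.client.server.connect.timeout=300000ms; -- handshake window for the remote driver

Raising these only buys more time for a slow cluster to hand out containers; if the RPC channel is actually closed, the query will still fail.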


