Posted to issues@hive.apache.org by "JoneZhang (JIRA)" <ji...@apache.org> on 2016/02/16 08:00:24 UTC

[jira] [Commented] (HIVE-12649) Hive on Spark will resubmit the application when there are not enough resources to launch the YARN application master

    [ https://issues.apache.org/jira/browse/HIVE-12649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148175#comment-15148175 ] 

JoneZhang commented on HIVE-12649:
----------------------------------

The applications submitted by the reducer-number estimation fail fast once the queue's resources become free after a while (a small diagnostic sketch follows the log).
The error log looks like the following:

Container: container_1448873753366_121022_02_000001 on 10.239.243.69_8041
===========================================================================
LogType: stderr
LogLength: 4284
Log Contents:
Please use CMSClassUnloadingEnabled in place of CMSPermGenSweepingEnabled in the future
Please use CMSClassUnloadingEnabled in place of CMSPermGenSweepingEnabled in the future
15/12/09 16:29:31 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
15/12/09 16:29:32 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1448873753366_121022_000002
15/12/09 16:29:33 INFO spark.SecurityManager: Changing view acls to: mqq
15/12/09 16:29:33 INFO spark.SecurityManager: Changing modify acls to: mqq
15/12/09 16:29:33 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mqq); users with modify permissions: Set(mqq)
15/12/09 16:29:33 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
15/12/09 16:29:33 INFO yarn.ApplicationMaster: Waiting for spark context initialization
15/12/09 16:29:33 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 
15/12/09 16:29:33 INFO client.RemoteDriver: Connecting to: 10.179.12.140:38842
15/12/09 16:29:33 WARN rpc.Rpc: Invalid log level null, reverting to default.
15/12/09 16:29:33 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
        at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37)
        at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156)
        at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:483)
Caused by: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.
        at org.apache.hive.spark.client.rpc.Rpc$SaslClientHandler.dispose(Rpc.java:449)
        at org.apache.hive.spark.client.rpc.SaslHandler.channelInactive(SaslHandler.java:90)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:233)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:219)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
        at org.apache.hive.spark.client.rpc.KryoMessageCodec.channelInactive(KryoMessageCodec.java:127)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:233)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:219)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:75)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:233)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:219)
        at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:769)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$5.run(AbstractChannel.java:567)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at java.lang.Thread.run(Thread.java:745)
15/12/09 16:29:33 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.util.concurrent.ExecutionException: javax.security.sasl.SaslException: Client closed before SASL negotiation finished.)
15/12/09 16:29:43 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 150000 ms. Please check earlier log output for errors. Failing the application.
15/12/09 16:29:43 INFO util.Utils: Shutdown hook called
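
For anyone who wants to confirm how many of these duplicate submissions are sitting in the queue, below is a minimal diagnostic sketch using the standard YarnClient API. The class name ListPendingApps and the idea of running it as a standalone check are mine, not part of Hive; applications that have been submitted but cannot get an ApplicationMaster container stay in the ACCEPTED state, so each extra estimate-time submission should show up as one ACCEPTED entry while the queue is full.

    import java.util.EnumSet;
    import java.util.List;

    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.api.records.YarnApplicationState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;

    // Lists YARN applications that are still waiting for an ApplicationMaster container.
    // Duplicate submissions from repeated reducer-number estimation would show up here
    // while the queue has no free resources.
    public class ListPendingApps {
      public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();
        try {
          List<ApplicationReport> pending =
              yarnClient.getApplications(EnumSet.of(YarnApplicationState.ACCEPTED));
          for (ApplicationReport app : pending) {
            System.out.println(app.getApplicationId() + "\t" + app.getQueue() + "\t" + app.getName());
          }
        } finally {
          yarnClient.stop();
        }
      }
    }

Running it while a query is blocked on the full queue should list the extra submissions if the scenario above is happening.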

> Hive on Spark will resubmit the application when there are not enough resources to launch the YARN application master
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-12649
>                 URL: https://issues.apache.org/jira/browse/HIVE-12649
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 1.2.1
>            Reporter: JoneZhang
>            Assignee: Xuefu Zhang
>
> Hive on Spark estimates the reducer number when the query does not set it, which causes an application to be submitted. The application stays pending if the YARN queue's resources are insufficient.
> So there can be more than one pending application, probably because
> there is more than one estimate call. The failure is soft, so it doesn't prevent subsequent processing. We can make that a hard failure (see the sketch after this quoted description).
> That code path can be found at:
> at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.getSparkSession(SparkUtilities.java:112)
> at org.apache.hadoop.hive.ql.optimizer.spark.SetSparkReducerParallelism.process(SetSparkReducerParallelism.java:115)
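
To make the "hard failure" idea concrete, here is a minimal, self-contained sketch. None of these names come from the Hive code base (the class, the exception, and the estimateReducers* methods are all hypothetical); it only contrasts the current soft handling, where the error is logged and planning continues, with a hard handling that propagates the error so the query fails once instead of leaving repeated pending applications behind.

    // Hypothetical, self-contained illustration; not the actual Hive code path.
    public class ReducerEstimateFailureSketch {

      static class SparkSessionUnavailableException extends Exception {
        SparkSessionUnavailableException(String msg) { super(msg); }
      }

      // Stand-in for the getSparkSession(...) call made during reducer-number estimation;
      // assume it fails when the YARN queue cannot start an application master in time.
      static void getSparkSessionOrThrow() throws SparkSessionUnavailableException {
        throw new SparkSessionUnavailableException("application master could not be launched");
      }

      // Soft failure (current behavior as described above): log and fall back to a default,
      // so every later estimate call may submit yet another application.
      static int estimateReducersSoft() {
        try {
          getSparkSessionOrThrow();
          return 42; // would be derived from executor memory/cores
        } catch (SparkSessionUnavailableException e) {
          System.err.println("Falling back to default parallelism: " + e.getMessage());
          return 1;
        }
      }

      // Hard failure (proposed): propagate the error so the whole query fails once.
      static int estimateReducersHard() throws SparkSessionUnavailableException {
        getSparkSessionOrThrow();
        return 42;
      }

      public static void main(String[] args) {
        System.out.println("soft estimate -> " + estimateReducersSoft());
        try {
          estimateReducersHard();
        } catch (SparkSessionUnavailableException e) {
          System.out.println("hard failure -> query aborted: " + e.getMessage());
        }
      }
    }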



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)