Posted to issues@spark.apache.org by "Bolke de Bruin (JIRA)" <ji...@apache.org> on 2015/07/14 09:34:05 UTC

[jira] [Issue Comment Deleted] (SPARK-9019) spark-submit fails on yarn with kerberos enabled

     [ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bolke de Bruin updated SPARK-9019:
----------------------------------
    Comment: was deleted

(was: - this was incorrect -
)

> spark-submit fails on yarn with kerberos enabled
> ------------------------------------------------
>
>                 Key: SPARK-9019
>                 URL: https://issues.apache.org/jira/browse/SPARK-9019
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 1.5.0
>         Environment: Hadoop 2.6 with YARN and kerberos enabled
>            Reporter: Bolke de Bruin
>              Labels: kerberos, spark-submit, yarn
>
> It is not possible to run jobs using spark-submit on YARN with a kerberized cluster.
> Command line:
> /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
> Fails with:
> 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/07/13 22:48:31 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:58380
> 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on port 58380.
> 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at http://10.111.114.9:58380
> 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
> 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
> 15/07/13 22:48:32 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
> 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 43470
> 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
> 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 10.111.114.9, 43470)
> 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
> 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
> 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
> 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
> java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to lxhnl013.ad.ing.net:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1472)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1399)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> 	at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
> 	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
> 	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
> 	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
> 	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
> 	at scala.Option.foreach(Option.scala:236)
> 	at org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.getDriverLogUrls(YarnClusterSchedulerBackend.scala:73)
> 	at org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:1993)
> 	at org.apache.spark.SparkContext.<init>(SparkContext.scala:544)
> 	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
> 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
> 	at py4j.Gateway.invoke(Gateway.java:214)
> 	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
> 	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
> 	at py4j.GatewayConnection.run(GatewayConnection.java:207)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
> 	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
> 	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1438)
> 	... 30 more
> The same error occurs when --principal and --keytab are omitted.
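
The log above appears to show two symptoms in sequence while YarnClusterSchedulerBackend.getDriverLogUrls calls YarnClientImpl.getNodeReports: first an AccessControlException ("Client cannot authenticate via:[TOKEN, KERBEROS]"), then a failover to rm2 and "Connection refused" on lxhnl013.ad.ing.net:8032. As a minimal triage sketch, the hypothetical shell function summarize_rm_failure below (not part of Spark or Hadoop) extracts those three facts from a saved driver log, for example one fetched with `yarn logs -applicationId <appId>`. It assumes the message wording shown in this report; other Hadoop versions may phrase these lines differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Spark/Hadoop tool): summarize the RM
# authentication and failover symptoms found in a driver log on stdin.
# Assumes the exact message wording seen in SPARK-9019.
summarize_rm_failure() {
  local log rm_id target
  log=$(cat)

  # 1. Did the IPC client fail SASL authentication with both mechanisms?
  if grep -qF 'Client cannot authenticate via:[TOKEN, KERBEROS]' <<<"$log"; then
    echo "auth: TOKEN/KERBEROS SASL failure"
  else
    echo "auth: no TOKEN/KERBEROS failure found"
  fi

  # 2. Which ResourceManager id did the client fail over to (first occurrence)?
  rm_id=$(grep -o 'Failing over to [^ ]*' <<<"$log" | head -n 1 | awk '{print $4}')
  echo "failover: ${rm_id:-none}"

  # 3. Which host:port refused the connection (first occurrence)?
  target=$(grep -o 'to [^ ]*:[0-9]* failed on connection exception' <<<"$log" \
    | head -n 1 | awk '{print $2}')
  echo "refused: ${target:-none}"
}

# Usage sketch: summarize_rm_failure < driver.log
```

Keeping the checks as plain grep/awk over a saved log means the triage can run on a machine without Kerberos credentials or cluster access.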



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
