Posted to issues@spark.apache.org by "Anirudh Vyas (Jira)" <ji...@apache.org> on 2022/10/19 05:05:00 UTC

[jira] [Commented] (SPARK-38390) Spark submit k8s with proxy user

    [ https://issues.apache.org/jira/browse/SPARK-38390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620003#comment-17620003 ] 

Anirudh Vyas commented on SPARK-38390:
--------------------------------------

Use
{code:java}
--conf spark.kerberos.keytab=local://<keytab-file>
{code}
and mount the keytab file on `/tmp`; this workaround works when running in cluster mode.

Currently it works only with a local keytab file. When specifying a keytab file, also specify the principal:
{code:java}
--conf spark.kerberos.principal=principal-name@REALM
{code}
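Putting the two flags together, the workaround can be sketched as below. This only assembles and prints the spark-submit fragment; the keytab path and principal are hypothetical placeholders, not values from this issue:

```shell
# Sketch of the workaround: pass keytab and principal together.
# /tmp/test.keytab and principal-name@REALM are placeholder values.
KEYTAB_CONF="spark.kerberos.keytab=local:///tmp/test.keytab"
PRINCIPAL_CONF="spark.kerberos.principal=principal-name@REALM"

# Assemble the command fragment (printed, not executed here).
CMD="spark-submit --deploy-mode cluster --conf ${KEYTAB_CONF} --conf ${PRINCIPAL_CONF}"
echo "${CMD}"
```

In cluster mode the `local://` scheme tells Spark the file is already present inside the container, so the pod must actually have the keytab mounted at that path.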

> Spark submit k8s with proxy user
> --------------------------------
>
>                 Key: SPARK-38390
>                 URL: https://issues.apache.org/jira/browse/SPARK-38390
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.1.1
>            Reporter: Mikhail Pochatkin
>            Priority: Major
>
> While trying to run a Spark test with spark-submit on Kubernetes, I ran into a problem with the proxy-user option. Judging by the stack trace, spark-submit fails to authorize the proxy user via a delegation token. Command line below:
> {code:java}
> exec /usr/bin/tini -s -- /bin/sh -c /usr/bin/kinit -c FILE:/tmp/krb5cc -kt /etc/test.keytab principal@REALM && spark-submit \
> --class org.apache.spark.examples.DFSReadWriteTest \
> --proxy-user ambari-qa \
> --master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT} \
> --deploy-mode cluster \
> --conf spark.app.name=spark-dfsreadwrite \
> --conf spark.kubernetes.namespace=namespace \
> --conf spark.kubernetes.container.image=gct.io/spark-operator/spark:v3.1.1 \
> --conf spark.kubernetes.submission.waitAppCompletion=true \
> --conf spark.driver.cores=1 \
> --conf spark.driver.memory=512m \
> --conf spark.kubernetes.driver.limit.cores=1 \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-application-sa \
> --conf spark.kubernetes.driver.label.app=spark-dfsreadwrite \
> --conf spark.executor.instances=1 \
> --conf spark.executor.cores=1 \
> --conf spark.kubernetes.executor.limit.cores=1 \
> --conf spark.kubernetes.executor.label.app=spark-dfsreadwrite \
> --conf spark.kubernetes.hadoop.configMapName=hadoop-configmap \
> --conf spark.kubernetes.kerberos.krb5.configMapName=kerberos-configmap \
> --conf spark.kerberos.renewal.credentials=ccache \
> --conf spark.hadoop.kerberos.keytab.login.autorenewal.enabled=true \
> local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar \
> /etc/profile /tmp/
> {code}
> Output from the command, including the stack trace:
> {code:java}
> ++ id -u
> + myuid=185
> ++ id -g
> + mygid=0
> + set +e
> ++ getent passwd 185
> + uidentry=
> + set -e
> + '[' -z '' ']'
> + '[' -w /etc/passwd ']'
> + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
> + SPARK_CLASSPATH=':/opt/spark/jars/*'
> + env
> + grep SPARK_JAVA_OPT_
> + sort -t_ -k4 -n
> + sed 's/[^=]*=\(.*\)/\1/g'
> + readarray -t SPARK_EXECUTOR_JAVA_OPTS
> + '[' -n '' ']'
> + '[' -z ']'
> + '[' -z ']'
> + '[' -n '' ']'
> + '[' -z x ']'
> + SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
> + '[' -z x ']'
> + SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
> + case "$1" in
> + shift 1
> + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
> + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=<ip> --deploy-mode client --proxy-user ambari-qa --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.DFSReadWriteTest local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar /etc/profile /tmp/
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 22/02/25 10:33:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Setting spark.hadoop.yarn.resourcemanager.principal to ambari-qa
> Performing local word count
> Creating SparkSession
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 22/02/25 10:33:31 INFO SparkContext: Running Spark version 3.1.1
> 22/02/25 10:33:31 INFO ResourceUtils: ==============================================================
> 22/02/25 10:33:31 INFO ResourceUtils: No custom resources configured for spark.driver.
> 22/02/25 10:33:31 INFO ResourceUtils: ==============================================================
> 22/02/25 10:33:31 INFO SparkContext: Submitted application: DFS Read Write Test
> 22/02/25 10:33:31 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 512, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
> 22/02/25 10:33:31 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
> 22/02/25 10:33:31 INFO ResourceProfileManager: Added ResourceProfile id: 0
> 22/02/25 10:33:31 INFO SecurityManager: Changing view acls to: 185,ambari-qa
> 22/02/25 10:33:31 INFO SecurityManager: Changing modify acls to: 185,ambari-qa
> 22/02/25 10:33:31 INFO SecurityManager: Changing view acls groups to: 
> 22/02/25 10:33:31 INFO SecurityManager: Changing modify acls groups to: 
> 22/02/25 10:33:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(185, ambari-qa); groups with view permissions: Set(); users  with modify permissions: Set(185, ambari-qa); groups with modify permissions: Set()
> 22/02/25 10:33:32 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
> 22/02/25 10:33:32 INFO SparkEnv: Registering MapOutputTracker
> 22/02/25 10:33:32 INFO SparkEnv: Registering BlockManagerMaster
> 22/02/25 10:33:32 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
> 22/02/25 10:33:32 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
> 22/02/25 10:33:32 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
> 22/02/25 10:33:32 INFO DiskBlockManager: Created local directory at /var/data/spark-3b0fe4a4-edb4-4144-9f9c-74e3ea583def/blockmgr-33259dcc-20aa-47cd-b09c-8c128de5f5eb
> 22/02/25 10:33:32 INFO MemoryStore: MemoryStore started with capacity 117.0 MiB
> 22/02/25 10:33:32 INFO SparkEnv: Registering OutputCommitCoordinator
> 22/02/25 10:33:33 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 22/02/25 10:33:33 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc:4040
> 22/02/25 10:33:33 INFO SparkContext: Added JAR local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar at file:/opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar with timestamp 1645785211700
> 22/02/25 10:33:33 WARN SparkContext: The jar local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar has been added already. Overwriting of added jars is not supported in the current version.
> 22/02/25 10:33:33 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
> 22/02/25 10:33:35 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1 running: 0.
> 22/02/25 10:33:36 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
> 22/02/25 10:33:36 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
> 22/02/25 10:33:36 INFO NettyBlockTransferService: Server created on spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc:7079
> 22/02/25 10:33:36 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
> 22/02/25 10:33:36 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
> 22/02/25 10:33:36 INFO BlockManagerMasterEndpoint: Registering block manager spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc:7079 with 117.0 MiB RAM, BlockManagerId(driver, spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
> 22/02/25 10:33:36 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
> 22/02/25 10:33:36 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
> 22/02/25 10:33:39 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (<ip>:<port>) with ID 1,  ResourceProfileId 0
> 22/02/25 10:33:39 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
> 22/02/25 10:33:39 INFO BlockManagerMasterEndpoint: Registering block manager 10.42.0.221:33711 with 117.0 MiB RAM, BlockManagerId(1, <ip>, <port>, None)
> Writing local file to DFS
> 22/02/25 10:33:39 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/spark/work-dir/spark-warehouse').
> 22/02/25 10:33:39 INFO SharedState: Warehouse path is 'file:/opt/spark/work-dir/spark-warehouse'.
> 22/02/25 10:33:41 WARN Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 22/02/25 10:33:41 WARN Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 22/02/25 10:33:41 INFO RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over <address>/<ip>:8020 after 1 fail over attempts. Trying to fail over immediately.
> java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "spark-dfsreadwrite-09f12c7f30714a72-driver/<ip>"; destination host is: "<address>":8020; 
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1480)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1413)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:776)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> 	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
> 	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
> 	at org.apache.spark.examples.DFSReadWriteTest$.main(DFSReadWriteTest.scala:115)
> 	at org.apache.spark.examples.DFSReadWriteTest.main(DFSReadWriteTest.scala)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> 	at java.base/java.lang.reflect.Method.invoke(Unknown Source)
> 	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> 	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
> 	at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:165)
> 	at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:163)
> 	at java.base/java.security.AccessController.doPrivileged(Native Method)
> 	at java.base/javax.security.auth.Subject.doAs(Unknown Source)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
> 	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:163)
> 	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> 	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> 	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
> 	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
> 	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:688)
> 	at java.base/java.security.AccessController.doPrivileged(Native Method)
> 	at java.base/javax.security.auth.Subject.doAs(Unknown Source)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
> 	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:651)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738)
> 	at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1452)
> 	... 36 more
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 	at jdk.security.jgss/com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(Unknown Source)
> 	at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
> 	at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:561)
> 	at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:376)
> 	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:730)
> 	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:726)
> 	at java.base/java.security.AccessController.doPrivileged(Native Method)
> 	at java.base/javax.security.auth.Subject.doAs(Unknown Source)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:726)
> 	... 39 more
> Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
> 	at java.security.jgss/sun.security.jgss.krb5.Krb5InitCredential.getInstance(Unknown Source)
> 	at java.security.jgss/sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Unknown Source)
> 	at java.security.jgss/sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Unknown Source)
> 	at java.security.jgss/sun.security.jgss.GSSManagerImpl.getMechanismContext(Unknown Source)
> 	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(Unknown Source)
> 	at java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(Unknown Source)
> 	... 49 more
> {code}
> The main question is why the same case works fine with YARN. Is this a restriction of spark-submit on Kubernetes, or a configuration issue? The working YARN command:
> {code:java}
> kinit -kt test.keytab principal@REALM
> spark-submit \
> --class org.apache.spark.examples.DFSReadWriteTest \
> --deploy-mode client \
> --proxy-user ambari-qa \
> --conf spark.app.name=spark-dfsreadwrite \
> --conf spark.driver.cores=1 \
> --conf spark.driver.memory=512m \
> --conf spark.executor.instances=1 \
> --conf spark.executor.cores=1 \
> --conf spark.executor.memory=512m \
> /opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar \
> /etc/profile /tmp
> {code}
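The "Failed to find any Kerberos tgt" error in the trace above means the JVM found no TGT in its credential cache. As a rough pre-flight check (a sketch only; the ccache path /tmp/krb5cc mirrors the kinit call in the report and may differ in other setups), one can verify the cache before submitting:

```shell
# Pre-flight sketch: check whether a valid TGT exists in the credential
# cache before running spark-submit. The path mirrors the kinit above.
CCACHE=/tmp/krb5cc
if command -v klist >/dev/null 2>&1 && klist -s -c "FILE:${CCACHE}" 2>/dev/null; then
  STATUS="TGT present in ${CCACHE}"
else
  STATUS="no valid TGT in ${CCACHE}; run kinit first"
fi
echo "${STATUS}"
```

`klist -s` exits non-zero when the cache is missing, expired, or unreadable, which makes it convenient for scripting this kind of guard.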



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
