Posted to issues@spark.apache.org by "Gabor Somogyi (Jira)" <ji...@apache.org> on 2022/05/02 11:33:00 UTC
[jira] [Comment Edited] (SPARK-25355) Support --proxy-user for Spark on K8s

    [ https://issues.apache.org/jira/browse/SPARK-25355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17530678#comment-17530678 ] 

Gabor Somogyi edited comment on SPARK-25355 at 5/2/22 11:32 AM:
----------------------------------------------------------------

Guys, when I look at the logs and at what you describe, I honestly don't fully understand what you're doing :)

You're saying that you run kinit, which creates a TGT in the user's credential cache on the local machine. Please be aware that this TGT is NOT transferred to the cluster by default.
On the other hand, the driver is reading credentials from a file:
{code:java}
...
22/04/26 08:54:39 DEBUG UserGroupInformation: Loaded 3 tokens
22/04/26 08:54:39 DEBUG UserGroupInformation: UGI loginUser:185 (auth:SIMPLE)
22/04/26 08:54:39 DEBUG UserGroupInformation: PrivilegedAction as:shrprasa (auth:PROXY) via 185 (auth:SIMPLE) 
...
22/04/26 08:54:38 DEBUG UserGroupInformation: Reading credentials from location set in HADOOP_TOKEN_FILE_LOCATION: /mnt/secrets/hadoop-credentials/..2022_04_26_08_54_34.1262645511/hadoop-tokens
...
{code}

One can authenticate with either credential (the TGT or the tokens at HADOOP_TOKEN_FILE_LOCATION), so which one is intended and which one is a side effect?
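To answer that from inside the driver pod, it can help to list which credential sources the process can even see. This is a hypothetical diagnostic helper, not part of Hadoop or Spark, and it only inspects environment variables: HADOOP_TOKEN_FILE_LOCATION (the variable the driver log shows UGI reading) and KRB5CCNAME (an explicitly configured Kerberos ticket cache). A default ticket cache may still exist when KRB5CCNAME is unset, so an empty result is not proof that no TGT is present.

{code:java}
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper (not part of Hadoop or Spark): lists which
// credential sources are visible to the current process via its environment.
final class CredentialSourceCheck {

    private CredentialSourceCheck() {}

    static List<String> findSources(Map<String, String> env) {
        List<String> findings = new ArrayList<>();

        // Delegation tokens: the driver log shows UGI reading this file.
        String tokenFile = env.get("HADOOP_TOKEN_FILE_LOCATION");
        if (tokenFile != null && !tokenFile.isEmpty()) {
            boolean exists = Files.exists(Paths.get(tokenFile));
            findings.add("delegation tokens: " + tokenFile + (exists ? "" : " (file missing)"));
        }

        // Kerberos TGT: only reported when KRB5CCNAME is set explicitly.
        // A default ticket cache may still exist when it is unset.
        String ccache = env.get("KRB5CCNAME");
        if (ccache != null && !ccache.isEmpty()) {
            findings.add("kerberos ticket cache: " + ccache);
        }

        if (findings.isEmpty()) {
            findings.add("no explicit credential source found");
        }
        return findings;
    }

    public static void main(String[] args) {
        // Run inside the driver pod to see what it actually has.
        findSources(System.getenv()).forEach(System.out::println);
    }
}
{code}

If only the token file shows up, the driver is running on delegation tokens and the local kinit is at most a side effect.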

As a general suggestion: client-mode Kerberos authentication suffers from many issues, especially with a TGT, so it's not advised. If you want a peaceful life, I warmly suggest using a keytab :)
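For reference, a keytab-based submission uses the spark-submit flags --principal and --keytab. The sketch below is a hypothetical helper that only assembles the argument list; the master URL, principal, keytab path and application jar are placeholders. As far as I know, spark-submit rejects --proxy-user combined with --principal, so the sketch enforces that too; requiring --keytab whenever --principal is given is this sketch's own rule.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assembles spark-submit arguments for a keytab-based,
// cluster-mode submission. All values passed in main are placeholders.
final class KeytabSubmitArgs {

    private KeytabSubmitArgs() {}

    static List<String> build(String master, String principal, String keytab,
                              String proxyUser, String appResource) {
        // spark-submit does not accept --proxy-user together with --principal.
        if (proxyUser != null && principal != null) {
            throw new IllegalArgumentException(
                "Only one of --proxy-user or --principal can be provided.");
        }
        // This sketch's own rule: a principal is only useful with its keytab.
        if (principal != null && keytab == null) {
            throw new IllegalArgumentException("--principal needs a matching --keytab");
        }
        List<String> args = new ArrayList<>();
        args.add("spark-submit");
        args.add("--master");
        args.add(master);
        // Cluster mode, since client-mode Kerberos is advised against above.
        args.add("--deploy-mode");
        args.add("cluster");
        if (principal != null) {
            args.add("--principal");
            args.add(principal);
            args.add("--keytab");
            args.add(keytab);
        }
        if (proxyUser != null) {
            args.add("--proxy-user");
            args.add(proxyUser);
        }
        args.add(appResource);
        return args;
    }

    public static void main(String[] args) {
        List<String> cmd = build("k8s://https://k8s-apiserver.example.com:6443",
                "alice@EXAMPLE.COM", "/etc/security/keytabs/alice.keytab",
                null, "local:///opt/spark/app.jar");
        System.out.println(String.join(" ", cmd));
    }
}
{code}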




> Support --proxy-user for Spark on K8s
> -------------------------------------
>
>                 Key: SPARK-25355
>                 URL: https://issues.apache.org/jira/browse/SPARK-25355
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Stavros Kontopoulos
>            Assignee: Pedro Rossi
>            Priority: Major
>             Fix For: 3.1.0
>
>
> SPARK-23257 adds kerberized HDFS support for Spark on K8s. A major addition still needed is support for a proxy user. A proxy user is impersonated by a superuser who executes operations on behalf of the proxy user. More on this:
> [https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html]
> [https://github.com/spark-notebook/spark-notebook/blob/master/docs/proxyuser_impersonation.md]
> This has been implemented upstream for YARN, and for Spark on Mesos here:
> [https://github.com/mesosphere/spark/pull/26]
> [~ifilonenko] creating this issue as per our discussion.
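On the Hadoop side, the superuser impersonation described in the Superusers document is authorized through the core-site.xml properties hadoop.proxyuser.<superuser>.hosts and hadoop.proxyuser.<superuser>.groups; in code, Hadoop performs the impersonation with UserGroupInformation.createProxyUser(...) followed by doAs(...). The sketch below is a hypothetical helper that only generates those two configuration entries; the superuser name "spark", the host wildcard and the group "analysts" are placeholders.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the core-site.xml entries that let `superuser`
// impersonate members of `groups` when connecting from `hosts`.
final class ProxyUserConfig {

    private ProxyUserConfig() {}

    static Map<String, String> forSuperuser(String superuser, String hosts, String groups) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hadoop.proxyuser." + superuser + ".hosts", hosts);
        conf.put("hadoop.proxyuser." + superuser + ".groups", groups);
        return conf;
    }

    // Renders <property> elements; values are assumed to need no XML escaping.
    static String toXml(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append("<property>\n")
              .append("  <name>").append(e.getKey()).append("</name>\n")
              .append("  <value>").append(e.getValue()).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toXml(forSuperuser("spark", "*", "analysts")));
    }
}
{code}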



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
