Posted to issues@spark.apache.org by "Laurenceau Julien (Jira)" <ji...@apache.org> on 2020/07/29 09:07:00 UTC

[jira] [Commented] (SPARK-30519) Executor can't use spark.executorEnv.HADOOP_USER_NAME to change the user accessing to hdfs

    [ https://issues.apache.org/jira/browse/SPARK-30519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167075#comment-17167075 ] 

Laurenceau Julien commented on SPARK-30519:
-------------------------------------------

As I understand it, a possible workaround would be to run Spark directly as a non-root user inside the container, i.e. run the Spark processes as the HADOOP_USER_NAME user.

I may be missing something, but this does not seem possible using Spark alone; it can, however, be done from the Kubernetes side.

The Spark manual says:

 
{panel}
Images built from the project provided Dockerfiles contain a default [{{USER}}|https://docs.docker.com/engine/reference/builder/#user] directive with a default UID of {{185}}. This means that the resulting images will be running the Spark processes as this UID inside the container. Security conscious deployments should consider providing custom images with {{USER}} directives specifying their desired unprivileged UID and GID. The resulting UID should include the root group in its supplementary groups in order to be able to run the Spark executables. Users building their own images with the provided {{docker-image-tool.sh}} script can use the {{-u <uid>}} option to specify the desired UID.

Alternatively the [Pod Template|http://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template] feature can be used to add a [Security Context|https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#volumes-and-file-systems] with a {{runAsUser}} to the pods that Spark submits. This can be used to override the {{USER}} directives in the images themselves. Please bear in mind that this requires cooperation from your users and as such may not be a suitable solution for shared environments. Cluster administrators should use [Pod Security Policies|https://kubernetes.io/docs/concepts/policy/pod-security-policy/#users-and-groups] if they wish to limit the users that pods may run as.
{panel}
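
As a rough sketch of what the documentation above describes (UIDs, image names and paths below are hypothetical placeholders, and this assumes a Spark version where the {{docker-image-tool.sh -u}} option and the pod template feature are available):
{code:bash}
# Option 1: bake a non-root user into the image at build time.
# 1500 is a placeholder UID that would match the intended HADOOP_USER_NAME user.
./bin/docker-image-tool.sh -r my-registry/spark -t non-root -u 1500 build

# Option 2: keep the stock image and override the user with a pod template.
cat > pod-template.yaml <<'EOF'
apiVersion: v1
kind: Pod
spec:
  securityContext:
    runAsUser: 1500   # placeholder UID of the unprivileged user
    runAsGroup: 0     # root group, per the docs' note that the UID needs it to run the Spark executables
EOF

spark-submit \
  --master k8s://https://<apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.podTemplateFile=pod-template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=pod-template.yaml \
  --class org.example.MyApp \
  local:///opt/app/my-app.jar
{code}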
 

 

This seems to be the approach taken by Google in their spark-on-k8s-operator, since it advertises the following feature:
{panel}
Automatically runs {{spark-submit}} on behalf of users for each {{SparkApplication}} eligible for submission.{panel}
Could someone confirm that this is a viable workaround?

> Executor can't use spark.executorEnv.HADOOP_USER_NAME to change the user accessing to hdfs
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30519
>                 URL: https://issues.apache.org/jira/browse/SPARK-30519
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: Xiaoming
>            Priority: Minor
>
> Currently, we can specify the Hadoop user by setting HADOOP_USER_NAME on the driver when submitting a job. However, setting spark.executorEnv.HADOOP_USER_NAME has no effect on the executors.
>  
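
For reference, a hedged illustration of the settings the issue describes (user name, class and paths are placeholders):
{code:bash}
# Setting HADOOP_USER_NAME in the submitter's environment changes the user
# the driver uses when talking to HDFS (non-Kerberos setups):
export HADOOP_USER_NAME=hdfsuser

# The reported problem: the equivalent executor-side environment setting
# does not change the user the executors use for HDFS access.
spark-submit \
  --conf spark.executorEnv.HADOOP_USER_NAME=hdfsuser \
  --class org.example.MyApp \
  local:///opt/app/my-app.jar
{code}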



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org