Posted to dev@spark.apache.org by Kwangsun Noh <no...@gmail.com> on 2021/04/12 15:19:40 UTC

Does UserGroupInformation.doAs work in Spark executors?

Hi, Spark users.


I want HDFS files to be created as arbitrary users, not as the OS user who
launched the Spark application.


I thought this would be possible using
UserGroupInformation.createRemoteUser("other").doAs(…).
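
Roughly, my attempt looks like this sketch (using a plain Hadoop
Configuration; the path is just a placeholder):

import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

val ugi = UserGroupInformation.createRemoteUser("other")
ugi.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    // Expectation: this file should end up owned by "other" on HDFS.
    val fs = FileSystem.get(new Configuration())
    fs.create(new Path("/tmp/owned-by-other")).close()
  }
})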


However, in the Spark executors the files are still created as the OS user
who launched the application.


I've tested this on both Spark Standalone and YARN, with the same result.


Is it impossible to impersonate a Spark job user with
UserGroupInformation.doAs?


PS: I posted a similar question on the Spark user mailing list, but didn't
get the answer I was looking for:


http://apache-spark-user-list.1001560.n3.nabble.com/Is-it-enable-to-use-Multiple-UGIs-in-One-Spark-Context-td39859.html

Re: Does UserGroupInformation.doAs work in Spark executors?

Posted by yaooqinn <ya...@gmail.com>.
Hi Kwangsun,

You may use `--proxy-user` to impersonate.

For example,

bin/spark-shell --proxy-user kent
21/04/12 23:31:34 WARN Utils: Your hostname, hulk.local resolves to a
loopback address: 127.0.0.1; using 192.168.1.14 instead (on interface en0)
21/04/12 23:31:34 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
21/04/12 23:31:34 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
setLogLevel(newLevel).
Spark context Web UI available at http://192.168.1.14:4040
Spark context available as 'sc' (master = local[*], app id =
local-1618241499136).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0-SNAPSHOT
      /_/

Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java
1.8.0_251)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.sparkContext.sparkUser
res0: String = kent

scala>
org.apache.hadoop.security.UserGroupInformation.getCurrentUser.getShortUserName
res1: String = kent

scala>
org.apache.hadoop.security.UserGroupInformation.getLoginUser.getShortUserName
res2: String = kentyao
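
To double-check that the impersonation reaches storage, a quick probe from
the same proxied shell (a sketch assuming the shell is pointed at HDFS; the
path is just a placeholder):

import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val p = new Path("/tmp/proxy-user-check")
fs.create(p).close()
// Should print "kent" (the proxied user), not "kentyao" (the login user).
println(fs.getFileStatus(p).getOwner)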

Bests,

Kent Yao





Re: Does UserGroupInformation.doAs work in Spark executors?

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
If you are using kerberized HDFS, the spark principal (or whoever is running
the cluster) has to be declared as a proxy user.

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Superusers.html
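
For reference, that declaration in core-site.xml looks like the following (a
sketch assuming the superuser/login principal is `spark`; narrow the
wildcards for real deployments):

<property>
  <name>hadoop.proxyuser.spark.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.spark.groups</name>
  <value>*</value>
</property>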

Once that's done, you create the proxy UGI:

val ugi = UserGroupInformation.createProxyUser("joe",
  UserGroupInformation.getLoginUser())

That UGI is then used to create the FS (in Scala, doAs takes a
PrivilegedExceptionAction):

val proxyFS = ugi.doAs(new PrivilegedExceptionAction[FileSystem] {
  override def run(): FileSystem =
    FileSystem.newInstance(new URI("hdfs://nn1/home/user/"), conf)
})


The proxyFS will then do all its IO as the given user, even when done
outside a doAs clause, e.g.

proxyFS.mkdirs(new Path("/home/user/alice/"))

FileSystem.get() also works on a UGI basis, so
FileSystem.get(new URI("hdfs://nn1"), conf) called inside ugi.doAs returns a
different FS instance than the same call made outside the clause.

Once you are done with the FS, close it. If you know you are completely
done with the user across all threads, you can release them all:

FileSystem.closeAllForUGI(ugi)

This closes all filesystems for that user, which is critical in long-lived
processes as otherwise you'll run out of memory/threads.
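
Putting the pieces together, a minimal end-to-end sketch (the namenode URI,
user name, and paths are placeholders):

import java.net.URI
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

val conf = new Configuration()
// The login user must be declared as a proxy user in core-site.xml.
val ugi = UserGroupInformation.createProxyUser("joe",
  UserGroupInformation.getLoginUser())

// A FileSystem instance bound to the proxied user.
val proxyFS = ugi.doAs(new PrivilegedExceptionAction[FileSystem] {
  override def run(): FileSystem =
    FileSystem.newInstance(new URI("hdfs://nn1/"), conf)
})

// All IO on this instance happens as "joe", even outside doAs.
proxyFS.mkdirs(new Path("/home/user/alice/"))

// When completely done with the user, release every FS it opened.
FileSystem.closeAllForUGI(ugi)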
