
Posted to dev@spark.apache.org by Tao Li <tl...@hortonworks.com> on 2016/09/14 17:42:03 UTC

Question about impersonation on Spark executor

Hi,

I am new to Spark and have a quick question about end-user impersonation in the Spark executor process.

Basically, I am running SQL queries through the Spark Thrift Server with doAs set to true to enable end-user impersonation. In my experiment, I was able to start sessions for multiple end users at the same time, and all queries worked as expected. For example, user A can query table 1, which only A can access (according to HDFS permissions). At the same time, user B can query table 2, which only B can access. It looks like the end user's UGI has been propagated to the executor process successfully. I checked the SparkContext code, and it looks like the end-user info is passed to the executor through the “SPARK_USER” environment variable. Correct me if I am wrong.

In my experiment I see only one executor process running for all the queries from multiple users. My question is how a single process can impersonate multiple end users at the same time. I would assume the “SPARK_USER” environment variable in the executor holds either user A or user B, in which case the other user should hit HDFS permission errors. But I did not see any errors for either user.
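To make my assumption concrete, here is a toy Python model of it. This is a sketch only, not Spark's actual code: it assumes the executor takes its identity solely from “SPARK_USER”, falling back to the process's login user, and it treats HDFS permissions as a simple owner-only check. The names `resolve_executor_user`, `can_read`, `TABLE_OWNERS`, and the `"spark"` fallback user are all hypothetical.

```python
from typing import Mapping

# Hypothetical owners mirroring my example: table 1 readable only by A, table 2 only by B.
TABLE_OWNERS = {"table1": "userA", "table2": "userB"}


def resolve_executor_user(env: Mapping[str, str], login_user: str) -> str:
    """Simplified model: SPARK_USER wins if set, else the process login user."""
    return env.get("SPARK_USER") or login_user


def can_read(user: str, table: str) -> bool:
    """Toy stand-in for an owner-only HDFS permission check."""
    return TABLE_OWNERS.get(table) == user


# One executor process has one environment, so every task it runs sees the same value.
executor_env = {"SPARK_USER": "userA"}
effective_user = resolve_executor_user(executor_env, login_user="spark")

for session_user, table in [("userA", "table1"), ("userB", "table2")]:
    allowed = can_read(effective_user, table)
    print(f"{session_user} queries {table}: runs as {effective_user}, allowed={allowed}")
```

Under this model, user B's query against table 2 runs as user A and is denied, which is not what I observed.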

Can someone give some insight into this? Thanks so much.