Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/15 08:46:02 UTC

[jira] [Updated] (SPARK-17516) Current user info is not checked on STS in DML queries

     [ https://issues.apache.org/jira/browse/SPARK-17516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-17516:
---------------------------------
    Priority: Major  (was: Critical)

> Current user info is not checked on STS in DML queries
> ------------------------------------------------------
>
>                 Key: SPARK-17516
>                 URL: https://issues.apache.org/jira/browse/SPARK-17516
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Tao Li
>            Priority: Major
>
> I have captured some issues related to doAs support in STS (Spark Thrift Server), using a non-secure cluster as my test environment. In short, the end-user info is not passed when STS talks to the metastore, so impersonation does not happen on the metastore side.
> STS uses a ClientWrapper instance (wrapped in HiveContext) for each session. However, by design all ClientWrapper instances share the same Hive instance, which is responsible for talking to the metastore. A singleton IsolatedClientLoader instance is initialized when STS starts up, and it contains the cachedHive instance. The cachedHive instance is associated with the “hive” UGI: no session has been set up at that point, so the current user is “hive”. Each session then creates a ClientWrapper instance that is associated with the same cachedHive instance.
> When we run queries after a session is established, the code path that retrieves the Hive instance differs between DML and DDL operations. It looks like the DML-related code has less dependency on the hive-exec module.
> For DML operations (e.g. “select *”), STS calls into ClientWrapper code and talks to the metastore directly through the singleton Hive instance. No code on this path checks the current user. That’s why doAs is not respected, even though the current user has already been switched to the end user in the thread context.
> For DDL operations (e.g. “ALTER table”), STS eventually calls into Hive driver code (e.g. BaseSemanticAnalyzer). From there, Hive.get() is called to get the thread-local Hive instance and refresh it if necessary. If the current user has changed, the Hive instance is refreshed by recreating the metastore connection with the current user info. So even though all thread locals actually reference the singleton Hive instance, calling Hive.get() plays an important role in taking any UGI change into account. That’s why DDL operations respect doAs.
> The fix should be to call Hive.get() for DML operations, as the Hive driver code does for DDL operations.
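
The difference between the two code paths described above can be illustrated with a small toy model. This is only a sketch and not Spark or Hive code: every name in it (MetastoreConnection, CachedClient, direct, get, run_as) is hypothetical, the thread-local variable merely stands in for the UGI in the thread context, and the model is single-threaded for simplicity. It assumes, as the description states, that the cached instance is created while the current user is "hive".

```python
import threading

# Thread-local "current user"; a stand-in for the UGI in the thread context.
_context = threading.local()


def current_user():
    # Before any session sets a user, the process user is assumed to be "hive".
    return getattr(_context, "user", "hive")


class MetastoreConnection:
    """Toy connection that remembers which user opened it."""

    def __init__(self, user):
        self.user = user

    def run(self, statement):
        return f"{statement} executed as {self.user}"


class CachedClient:
    """Toy stand-in for the shared, cached instance created at startup."""

    def __init__(self):
        self._conn = MetastoreConnection(current_user())

    def direct(self):
        # DML-style path: reuse the cached connection, never check the user.
        return self._conn

    def get(self):
        # DDL-style path: recreate the connection if the current user changed.
        user = current_user()
        if self._conn.user != user:
            self._conn = MetastoreConnection(user)
        return self._conn


def run_as(user, fn):
    """Run fn with the thread-local current user switched to `user`."""
    previous = getattr(_context, "user", None)
    _context.user = user
    try:
        return fn()
    finally:
        if previous is None:
            del _context.user
        else:
            _context.user = previous


shared = CachedClient()  # created at "startup", while the current user is "hive"

dml = run_as("alice", lambda: shared.direct().run("SELECT"))
ddl = run_as("alice", lambda: shared.get().run("ALTER TABLE"))

print(dml)  # SELECT executed as hive
print(ddl)  # ALTER TABLE executed as alice
```

In this model the first call ignores the switched user because nothing on that path compares the connection's user with the current one, while the second call performs that comparison and reconnects. Routing the first path through the same check is the toy equivalent of the proposed fix.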


