Posted to reviews@spark.apache.org by jerryshao <gi...@git.apache.org> on 2017/03/17 09:42:26 UTC

[GitHub] spark pull request #17335: [SPARK-19995][Hive][Yarn] Using real user to init...

GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/17335

    [SPARK-19995][Hive][Yarn] Using real user to initialize hive SessionState

    ## What changes were proposed in this pull request?
    
    Using the current user to connect to the Metastore in `HiveClientImpl` causes a "TGT not found" error if the current user has not run kinit.
    
    This can happen when using `--proxy-user`, where only the real user has run kinit. We should therefore connect to the Metastore as the real user rather than the current user to avoid this issue.
    
    ## How was this patch tested?
    
    Verified locally on a secure cluster.
    
    @vanzin @tgravescs @dongjoon-hyun please help to review, thanks a lot.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-19995

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17335.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17335
    
----
commit d31dcb340eba66d6adf6c027675fb9ebb5b18ce2
Author: jerryshao <ss...@hortonworks.com>
Date:   2017-03-17T09:14:56Z

    Using real user to initialize hive SessionState
    
    Change-Id: If423f3fdc709ed3284cafc01efd1fe389f635560

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Thank you, @jerryshao. I'll test this.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Thanks @vanzin, I agree with you. The scenario that @subrotosanyal mentioned is fairly customized, so this problem is probably better handled outside of Spark.
    
    Sure, I will update it.
    





[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    @subrotosanyal 
    
    I was able to write some code that should work for your use case even without the fix for SPARK-15754. I reverted that change and ran the following code a few times in the same JVM:
    
    ```
        PrivilegedExceptionAction<Void> action = () -> {
          dumpTokens("before");
          runSpark();
          dumpTokens("after");
          return null;
        };
        UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab);
        ugi.doAs(action);
    ```
    
    (Where `dumpTokens` prints the tokens in the UGI, and `runSpark` starts a SparkContext and stops it.)
    
    Each iteration starts with no tokens and finishes with an HDFS delegation token, so it seems to have the behavior you want.
    
    With that being said, if reverting the fix for SPARK-15754 fixes the Hive token issue, we should probably do that since there seems to be a way for things to work in the embedded case.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74946/
    Test FAILed.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74740/
    Test PASSed.




[GitHub] spark pull request #17335: [SPARK-19995][YARN] Register tokens to current UG...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17335




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by yaooqinn <gi...@git.apache.org>.
Github user yaooqinn commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    I have tested this with my kerberized hdfs and it works for me. LGTM, thanks.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by yaooqinn <gi...@git.apache.org>.
Github user yaooqinn commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    @subrotosanyal, could you please describe https://github.com/apache/spark/pull/13499 in detail? Thanks.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    **[Test build #74740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74740/testReport)** for PR 17335 at commit [`d31dcb3`](https://github.com/apache/spark/commit/d31dcb340eba66d6adf6c027675fb9ebb5b18ce2).




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    I have broadened this issue a bit. Currently, on the driver side (client mode), the issued delegation tokens are not added to the current UGI, so follow-up HDFS/Metastore/HBase communication still uses the TGT instead of delegation tokens. This is unnecessary and should be avoided, since we already obtain the tokens in yarn#client.
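
To illustrate why registering tokens with the current user matters in the proxy-user case, here is a toy model in plain Java. It is only a sketch: `User`, `selectAuth`, `AuthMethod`, and the service name are hypothetical and are not Hadoop or Spark APIs. It assumes, following the comment above, that a client uses a delegation token for a service when one is registered and otherwise falls back to the TGT.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy model only: none of these types are Hadoop or Spark classes. */
final class TokenSelectionSketch {

    enum AuthMethod { TOKEN, KERBEROS }

    /** Hypothetical stand-in for a user's security context. */
    static final class User {
        final String name;
        final boolean hasTgt;
        final Map<String, String> tokens = new HashMap<>();

        User(String name, boolean hasTgt) {
            this.name = name;
            this.hasTgt = hasTgt;
        }

        /** Registers a delegation token for the given service. */
        void addToken(String service, String token) {
            tokens.put(service, token);
        }
    }

    /** Prefers a registered token; otherwise falls back to the TGT if there is one. */
    static Optional<AuthMethod> selectAuth(User user, String service) {
        if (user.tokens.containsKey(service)) {
            return Optional.of(AuthMethod.TOKEN);
        }
        if (user.hasTgt) {
            return Optional.of(AuthMethod.KERBEROS);
        }
        return Optional.empty(); // no token and no TGT: nothing to authenticate with
    }

    public static void main(String[] args) {
        User proxy = new User("proxy", false);                // proxy user without a TGT
        System.out.println(selectAuth(proxy, "metastore"));   // before registering a token
        proxy.addToken("metastore", "example-token");
        System.out.println(selectAuth(proxy, "metastore"));   // after registering a token
    }
}
```

In this model a user without a TGT can only reach a service once a token has been registered for it, which mirrors the motivation for adding the obtained tokens to the current UGI.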




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    I'm not sure I understand your scenario correctly. In your case, the Spark application is embedded in your own application, and your application keeps running after Spark is stopped. Because the delegation tokens are explicitly expired once the YARN app finishes, your subsequent HDFS operations that honor delegation tokens will fail, so you have to use the TGT rather than delegation tokens. Am I right?
    
    I guess it is related to this JIRA (https://issues.apache.org/jira/browse/YARN-2964). It may already be fixed on the YARN side.
    
    But with your fix, proxy user does not work. To handle your scenario, I think we could deliberately remove all the tokens from the current UGI after the application finishes, so that your subsequent HDFS operations can use the TGT to get new tokens.
    
     





[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    @subrotosanyal, could you please elaborate on this:
    
    > The ResourceManager expires an application's tokens after a certain period of time, which leads to the expiration of the token that is part of the client that submitted the Spark job.
    
    What happens when the RM expires the tokens, and when does this happen?




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74954/
    Test PASSed.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by yaooqinn <gi...@git.apache.org>.
Github user yaooqinn commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    The databases and tables may be created on HDFS by the real user, so the proxy user may lack permissions and hit errors such as:
    ```
    Error: java.lang.RuntimeException: Cannot create staging directory 'hdfs://hz-test01/user/hive/warehouse/hzyaoqin.db/src2/.hive-staging_hive_2017-03-20_22-43-44_189_8479160175818973314-1': Permission denied: user=hzyaoqin, access=WRITE, inode="/user/hive/warehouse/hzyaoqin.db/src2/.hive-staging_hive_2017-03-20_22-43-44_189_8479160175818973314-1":hive:hdfs:drwxr-xr-x
    ```
    
    This means every HDFS operation that requires write (`w`) access has to run inside doAsRealUser. In that case, proxy users might be able to access other proxy users' data. Do you hit that error when using INSERT/CTAS?




[GitHub] spark pull request #17335: [SPARK-19995][Hive][Yarn] Using real user to init...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17335#discussion_r106700053
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
    @@ -353,6 +354,25 @@ class SparkHadoopUtil extends Logging {
         }
         buffer.toString
       }
    +
    +  /**
    +   * Run some code as the real logged in user (which may differ from the current user, for
    +   * example, when using proxying).
    +   */
    +  private[spark] def doAsRealUser[T](fn: => T): T = {
    +    val currentUser = UserGroupInformation.getCurrentUser()
    --- End diff --
    
    Hmmm... I'm not so sure this will work in all cases. Can you test this with both `spark.sql.hive.metastore.jars` and `spark.sql.hive.metastore.version` set?
    
    The problem is that this class is loaded by Spark's main class loader, while `HiveClientImpl` comes from a different class loader. So `UserGroupInformation` might be a different class in certain cases. It's the same reason the `HiveClientImpl` class does its own `loginUserFromKeytab` around L110.
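
The class-loader concern can be illustrated without Hadoop. The following is a minimal sketch, assuming a plain JVM whose compiled classes are readable as resources from a directory or jar class path. `SharedState` and `IsolatingLoader` are hypothetical names used only for illustration; `SharedState` stands in for a class such as `UserGroupInformation` that keeps state in static fields. The sketch defines a second copy of the class through a separate loader and compares it with the copy from the main loader.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Field;

/** Illustration only; SharedState is a hypothetical stand-in, not a Hadoop class. */
final class ClassLoaderIdentityDemo {

    /** Stand-in for a class that keeps process-wide state in a static field. */
    static final class SharedState {
        static String loginUser = "unset";
    }

    /** Defines its own copy of one named class; delegates everything else to the parent. */
    static final class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the class file bytes through the parent loader's resources. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads a second copy of SharedState through an isolating loader. */
    static Class<?> loadIsolatedCopy() {
        try {
            String name = SharedState.class.getName();
            ClassLoader main = ClassLoaderIdentityDemo.class.getClassLoader();
            return new IsolatingLoader(main, name).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the static loginUser field from the given copy of SharedState. */
    static String readLoginUser(Class<?> copy) {
        try {
            Field f = copy.getDeclaredField("loginUser");
            f.setAccessible(true);
            return (String) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SharedState.loginUser = "alice";                     // set state on the main loader's copy
        Class<?> isolated = loadIsolatedCopy();
        System.out.println(isolated == SharedState.class);   // compare the two Class objects
        System.out.println(readLoginUser(isolated));         // state as seen by the isolated copy
    }
}
```

Under these assumptions the two `Class` objects are distinct and the isolated copy keeps its own static field, so a login performed through one copy is not visible through the other. That is the kind of mismatch the review comment warns about when Hive classes are loaded by a separate class loader.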




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    I have no idea about that issue; the description is too vague ("Resource Manager cancels the Delegation Token after 10 minutes of shutting down the spark context."). I'm not sure what scenario that JIRA describes.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Thanks @yaooqinn, that really is an issue. That was my concern when I wrote this fix: since we wrap the whole `SessionState.start` call as the real user, every operation inside `start` is executed as the real user. Ideally we should wrap only the metastore connection code.
    





[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by yaooqinn <gi...@git.apache.org>.
Github user yaooqinn commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    With the credentials provided by HiveCredentialProvider and configured via `hive.metastore.kerberos.principal`, do we need to log in again with `spark.yarn.principal` in order to connect to the metastore?
    
    I guess that `spark.yarn.principal` is used to authenticate with YARN's RM to submit apps, `hive.metastore.kerberos.principal` with the metastore, and `dfs.namenode.kerberos.principal` with the namenode. All of these, and other principals, are used by the Spark driver to connect to different services. Am I right?





[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by yaooqinn <gi...@git.apache.org>.
Github user yaooqinn commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    https://issues.apache.org/jira/browse/SPARK-15754  will this patch cause this problem?




[GitHub] spark issue #17335: [SPARK-19995][YARN] Register tokens to current UGI to av...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    LGTM, merging to master / 2.1.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    @jerryshao the PR description seems to be out of sync with the current code, can you update it?




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    @yaooqinn, I pushed a different fix for this issue. I think the HDFS folder owner should be the right user (the proxy user).




[GitHub] spark issue #17335: [SPARK-19995][YARN] Register tokens to current UGI to av...

Posted by rajeshcode <gi...@git.apache.org>.
Github user rajeshcode commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Will this patch work for spark-sql in `--master local` mode as well?
    
    In our environment, local mode does not support proxy user, whereas YARN mode looks OK. Is there a solution for proxy user support in local mode?




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    **[Test build #74946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74946/testReport)** for PR 17335 at commit [`11a1094`](https://github.com/apache/spark/commit/11a10946a575d6ed0f707ea0735b7a0a0024090d).




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    @yaooqinn, you only need one principal (for example, "foo@EXAMPLE.COM") to authenticate with different services. The Hive and NN configurations mentioned above apply only to those two services; they are not for the user who submits the Spark application.
    
    For the user who launches the Spark application, only `spark.yarn.principal` represents this user.




[GitHub] spark issue #17335: [SPARK-19995][YARN] Register tokens to current UGI to av...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Sorry about that, @vanzin. I just updated the description; please review again. Thanks a lot.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Ping @vanzin , mind reviewing again? Thanks a lot.




[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by subrotosanyal <gi...@git.apache.org>.
Github user subrotosanyal commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    Hi @yaooqinn,
    This is a scenario where Spark is embedded in a client application (spark-client mode).
    In the method `Client#createContainerLaunchContext()`, the credentials (delegation tokens) obtained to run the Spark application are added to the current `UserGroupInformation` (refer to the deleted line in the PR), which shouldn't be the case. `UserGroupInformation` is a static global object; once it is changed at any point in the application, the change is reflected throughout the JVM. Further, the delegation tokens added this way are also passed to the YARN platform (specifically the ResourceManager). The ResourceManager expires an application's tokens after a certain period of time, which leads to the expiration of a token that is also held by the client that submitted the Spark job.
    
    The fix tries to remove the code where the Spark-job-specific credentials are added to the current `UserGroupInformation`.
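
The concern above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `Creds`, `processWide`, `submitSharing` and `submitIsolated` are hypothetical names invented here, not Hadoop or Spark classes, and the real `UserGroupInformation` behaviour is more involved. It contrasts a submission that mutates JVM-wide credentials with one that works on a private copy.

```scala
import scala.collection.mutable

// Hypothetical toy model; none of these types come from Hadoop or Spark.
object SharedUgiModel {
  // Stand-in for a credentials holder (NOT org.apache.hadoop.security.Credentials).
  final class Creds(initial: Set[String] = Set.empty) {
    private val tokens = mutable.Set.empty[String]
    tokens ++= initial
    def add(token: String): Unit = tokens += token
    def all: Set[String] = tokens.toSet
    def copy(): Creds = new Creds(all)
  }

  // Stand-in for the credentials of the JVM-wide current user.
  val processWide = new Creds()

  // Variant 1: the submission adds the job token to the process-wide credentials,
  // so every other component in the same JVM now sees it.
  def submitSharing(jobToken: String): Creds = {
    processWide.add(jobToken)
    processWide
  }

  // Variant 2: the submission adds the job token to a private copy only.
  def submitIsolated(jobToken: String): Creds = {
    val local = processWide.copy()
    local.add(jobToken)
    local
  }
}
```

In this model only `submitSharing` changes what the embedding client sees afterwards; `submitIsolated` leaves `processWide` untouched.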


[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    **[Test build #74954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74954/testReport)** for PR 17335 at commit [`e9b5580`](https://github.com/apache/spark/commit/e9b55800e9ad02ef07f081a5dfb4943ac5d80523).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    **[Test build #74946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74946/testReport)** for PR 17335 at commit [`11a1094`](https://github.com/apache/spark/commit/11a10946a575d6ed0f707ea0735b7a0a0024090d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    So I had to dig up some e-mails to refresh my brain about SPARK-15754. It is not related to YARN-2964 (that one is for things like Oozie, where the same token is used by multiple YARN apps). It's related to YARN cancelling tokens after apps finish (or a group of apps sharing the same token, in case of YARN-2964).
    
    So, in the embedded case, something like this:
    
    ```
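    import org.apache.spark.SparkContext
    
    // Uses the (master, appName) constructor; the app names are placeholders.
    val sc1 = new SparkContext("yarn-client", "app1")
    // do stuff
    sc1.stop()
    
    // wait a bit
    
    // The following will fail because YARN will have cancelled the old delegation tokens,
    // which are still in the current UGI object.
    val sc2 = new SparkContext("yarn-client", "app2")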
    ```
    
    The problem is caused by Spark adding the tokens to the current UGI, and the UGI API has no way to remove them. So when you start the new context, the code will try to use the tokens in the UGI and fail because they've been cancelled.
    
    Allowing Spark to overwrite the current UGI's credentials seems to fix a bunch of issues, and is obviously fine for everybody using `spark-submit`. But I wonder if there's a way to avoid that in these applications that embed Spark without requiring them to manage their own delegation tokens.
    
    Let me dig up some code from my e-mail and see if I can reproduce the original issue and find a workaround...
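
The failure mode described above (tokens can be added to the UGI but not removed, and YARN cancels them when the app finishes) can be reproduced in a toy model without a cluster. This is a sketch, not the actual Hadoop or Spark code path: `TokenIssuer`, `SharedCredentials` and `runApp` are hypothetical names, and keying tokens by service so that a fresh token overwrites a stale one is an assumption of the model.

```scala
import scala.collection.mutable

// Hypothetical, simplified model; none of these types are Hadoop or Spark classes.
object StaleTokenModel {
  final case class Token(service: String, id: Int)

  // Stand-in for the token issuer: hands out tokens and cancels them on request.
  final class TokenIssuer {
    private var nextId = 0
    private val valid = mutable.Set.empty[Int]
    def issue(service: String): Token = {
      nextId += 1
      valid += nextId
      Token(service, nextId)
    }
    def cancel(t: Token): Unit = valid -= t.id
    def isValid(t: Token): Boolean = valid.contains(t.id)
  }

  // Stand-in for process-wide credentials: tokens can be added or overwritten
  // (keyed by service), but there is deliberately no remove method.
  final class SharedCredentials {
    private val tokens = mutable.Map.empty[String, Token]
    def add(t: Token): Unit = tokens.update(t.service, t)
    def get(service: String): Option[Token] = tokens.get(service)
  }

  // One application lifetime: fetch a token if asked to (or if none is stored),
  // use whatever the shared credentials hold, then cancel it on shutdown.
  // Returns whether the token in use was still valid.
  def runApp(issuer: TokenIssuer, creds: SharedCredentials, refresh: Boolean): Boolean = {
    if (refresh || creds.get("hdfs").isEmpty) creds.add(issuer.issue("hdfs"))
    val token = creds.get("hdfs").get
    val ok = issuer.isValid(token)
    issuer.cancel(token) // app finished: the token is cancelled but stays in creds
    ok
  }
}
```

Running `runApp` twice with `refresh = false` mirrors the `sc1`/`sc2` sequence: the second run picks up the cancelled token. With `refresh = true` the fresh token overwrites the stale one, which corresponds to letting Spark overwrite the current credentials.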


[GitHub] spark issue #17335: [SPARK-19995][YARN] Register tokens to current UGI to av...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    @jerryshao the description is still about the initial version of the patch, not the current code.


[GitHub] spark pull request #17335: [SPARK-19995][Hive][Yarn] Using real user to init...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17335#discussion_r106828435
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
    @@ -353,6 +354,25 @@ class SparkHadoopUtil extends Logging {
         }
         buffer.toString
       }
    +
    +  /**
    +   * Run some code as the real logged in user (which may differ from the current user, for
    +   * example, when using proxying).
    +   */
    +  private[spark] def doAsRealUser[T](fn: => T): T = {
    +    val currentUser = UserGroupInformation.getCurrentUser()
    --- End diff --
    
    I see, let me give it a try. But my guess is that this is the only place where the issue can be handled from the Spark side.
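
The diff excerpt above is cut off after its first line, so the following does not reproduce the patch. It is a minimal, self-contained sketch of the idea stated in the doc comment (run code as the real logged-in user when the current user is a proxy), assuming a hypothetical `User` type with an optional `realUser` and a toy `connectMetastore` that needs a Kerberos TGT. The signature of `doAsRealUser` here is also an assumption made for the model.

```scala
// Hypothetical toy model; `User` is NOT Hadoop's UserGroupInformation.
object RealUserModel {
  // `realUser` is set only for proxy users; `hasTgt` marks a kinit'ed user.
  final case class User(name: String, hasTgt: Boolean, realUser: Option[User] = None)

  // The user that actually logged in: the real user behind a proxy, else the user itself.
  def effectiveLoginUser(current: User): User = current.realUser.getOrElse(current)

  // Run `fn` as the real logged-in user rather than the current (possibly proxy) user.
  def doAsRealUser[T](current: User)(fn: User => T): T = fn(effectiveLoginUser(current))

  // Toy metastore connection: succeeds only if the acting user holds a TGT.
  def connectMetastore(user: User): Either[String, String] =
    if (user.hasTgt) Right(s"connected as ${user.name}")
    else Left(s"no TGT for ${user.name}")
}
```

In this model a proxy user without a TGT fails to connect directly, while wrapping the same call in `doAsRealUser` succeeds because the kinit'ed real user is used instead.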


[GitHub] spark pull request #17335: [SPARK-19995][Hive][Yarn] Using real user to init...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17335#discussion_r106831705
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
    @@ -353,6 +354,25 @@ class SparkHadoopUtil extends Logging {
         }
         buffer.toString
       }
    +
    +  /**
    +   * Run some code as the real logged in user (which may differ from the current user, for
    +   * example, when using proxying).
    +   */
    +  private[spark] def doAsRealUser[T](fn: => T): T = {
    +    val currentUser = UserGroupInformation.getCurrentUser()
    --- End diff --
    
    @vanzin I tried the above two configurations. Although I hit some class-not-found issues in our HDP environment, the metastore connection was established correctly, without the GSSAPI "TGT not found" issue. I tested with `spark.sql.hive.metastore.jars=maven` and `spark.sql.hive.metastore.version` set to 1.2.1 and 2.0.1.
    
    ```
    17/03/20 03:35:48 INFO metastore: Trying to connect to metastore with URI thrift://c6402.ambari.apache.org:9083
    17/03/20 03:35:48 INFO metastore: Opened a connection to metastore, current connections: 1
    17/03/20 03:35:48 INFO metastore: Connected to metastore.
    ```


[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    **[Test build #74740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74740/testReport)** for PR 17335 at commit [`d31dcb3`](https://github.com/apache/spark/commit/d31dcb340eba66d6adf6c027675fb9ebb5b18ce2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17335: [SPARK-19995][Hive][Yarn] Using real user to initialize ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17335
  
    **[Test build #74954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74954/testReport)** for PR 17335 at commit [`e9b5580`](https://github.com/apache/spark/commit/e9b55800e9ad02ef07f081a5dfb4943ac5d80523).

