Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2015/02/05 23:27:00 UTC

[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/4405

    [SPARK-5493] [core] Add option to impersonate user.

    Hadoop has a feature that allows users to impersonate other users
    when submitting applications or talking to HDFS, for example. These
    impersonated users are generally referred to as "proxy users".
    
    Services such as Oozie or Hive use this feature to run applications
    as the requesting user.
    
    This change makes SparkSubmit accept a new command line option to
    run the application as a proxy user. It also fixes the plumbing
    of the user name through the UI (and a couple of other places) to
    refer to the correct user running the application, which can be
    different from `sys.props("user.name")` even without proxies (e.g.
    when using Kerberos).
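
As a rough illustration of the new option, the sketch below builds a spark-submit command line and adds `--proxy-user` only when a user to impersonate is supplied. The `--proxy-user` flag and the `yarn-client` master value are the ones used in commands elsewhere in this thread; the helper name `build_submit_cmd` is hypothetical, and the function only prints the command instead of running it.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Spark): assemble a spark-submit
# command line, adding --proxy-user only when a user name is given.
build_submit_cmd() {
  local proxy_user="$1"; shift
  local cmd=(spark-submit --master yarn-client)
  if [[ -n "$proxy_user" ]]; then
    cmd+=(--proxy-user "$proxy_user")
  fi
  # Remaining arguments are the application and its own arguments.
  cmd+=("$@")
  printf '%s\n' "${cmd[*]}"
}

build_submit_cmd cloudera examples/src/main/python/pi.py
# prints: spark-submit --master yarn-client --proxy-user cloudera examples/src/main/python/pi.py
```

Printing instead of executing keeps the sketch runnable on a machine without Spark installed; a real wrapper would run the array with `"${cmd[@]}"`.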

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-5493

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4405
    
----
commit 0540d38fabe08cb63aecd9df953bed5cbe3bfa62
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2015-02-05T21:49:47Z

    [SPARK-5493] [core] Add option to impersonate user.
    
    Hadoop has a feature that allows users to impersonate other users
    when submitting applications or talking to HDFS, for example. These
    impersonated users are generally referred to as "proxy users".
    
    Services such as Oozie or Hive use this feature to run applications
    as the requesting user.
    
    This change makes SparkSubmit accept a new command line option to
    run the application as a proxy user. It also fixes the plumbing
    of the user name through the UI (and a couple of other places) to
    refer to the correct user running the application, which can be
    different from `sys.props("user.name")` even without proxies (e.g.
    when using Kerberos).

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by chesterxgchen <gi...@git.apache.org>.
Github user chesterxgchen commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73636154
  
    @vanzin
    
    Did you test with a secured Hadoop cluster or just a normal cluster? If the Hadoop cluster is secured, I think these assumptions are required. I recently finished our Hadoop Kerberos authentication implementation with Pig, MapReduce, HDFS, Sqoop and Spark (for Spark in YARN cluster mode). I don't think you can access a secure cluster without Kerberos authentication (assumption 1). And if UserGroupInformation uses SIMPLE mode to access a secured Hadoop cluster, you will get an exception at some point (assumption 3).
    
    In our case, we did not use SparkSubmit, but used the YARN Client directly. I don't understand why standalone mode or Mesos mode won't need a job delegation token. Maybe you can elaborate on that a bit more.
    
    If you look at Oozie's implementation, you can see that before the MR job is submitted, the job delegation token is added to the JobClient's credentials. This is regardless of whether YARN is used or not.
    
    Another question relates to the overall approach. This seems fine for calling SparkSubmit from the command line, since the current user can be authenticated with kinit and the proxy user can be impersonated via createProxyUser. The user who manages the Spark job submission is responsible for managing the Kerberos TGT lifetime, renewal, etc. If the ticket has expired, the user can re-run kinit or use a cron job to keep it from expiring. In this case, Spark merely creates the proxy user.
    
    For an application (for example, a program that submits the Spark job directly, not from the command line), this approach doesn't seem to help much. The application can call createProxyUser itself instead of letting Spark do it, since the application already does the Kerberos login (UserGroupInformation.loginUserFromKeytab), renews it (UserGroupInformation.checkTGTAndReloginFromKeytab), handles ticket expiration, adds the job token, etc.
    
    So is the approach only intended for command line use? Does it make sense to push more logic into Spark? Or does this logic not belong in Spark?
    
    thanks
    Chester Chen




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4405




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73790093
  
      [Test build #27233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27233/consoleFull) for   PR 4405 at commit [`05bfc08`](https://github.com/apache/spark/commit/05bfc08192b1af7e56170e4a9a80b3bfc5456459).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73305341
  
    In this case, you are only running SparkSubmit as the proxy user. Should we not have the executor code also run as the proxy user, so any writes from the app to HDFS show the proxy user? Or is that not the intent?




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4405#discussion_r24394065
  
    --- Diff: bin/utils.sh ---
    @@ -35,7 +35,8 @@ function gatherSparkSubmitOpts() {
           --master | --deploy-mode | --class | --name | --jars | --packages | --py-files | --files | \
           --conf | --repositories | --properties-file | --driver-memory | --driver-java-options | \
           --driver-library-path | --driver-class-path | --executor-memory | --driver-cores | \
    -      --total-executor-cores | --executor-cores | --queue | --num-executors | --archives)
    +      --total-executor-cores | --executor-cores | --queue | --num-executors | --archives |
    --- End diff --
    
    does this need a `\` similar to earlier lines?
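
For reference, here is a minimal sketch of a `case` pattern list continued across lines, in the same shape as the statement in the diff. My understanding is that the shell grammar does not allow a bare newline after `|` inside a pattern list, so each continued line ends with `\`. The function name `takes_value` is hypothetical, the option list is shortened, and `--proxy-user` on the last line is an assumption, since the diff excerpt does not show the line that follows.

```shell
#!/usr/bin/env bash

# Hypothetical classifier mirroring the shape of the case statement in
# bin/utils.sh: reports whether an option expects a value.
takes_value() {
  case "$1" in
    --master | --deploy-mode | --class | \
    --queue | --num-executors | --archives | \
    --proxy-user)
      # Each trailing backslash joins the next line into one pattern list.
      echo yes ;;
    *)
      echo no ;;
  esac
}

takes_value --proxy-user
takes_value --verbose
```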




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by hemshankar <gi...@git.apache.org>.
Github user hemshankar commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-171233550
  
    I have a few questions about running in client mode and cluster mode.
    Currently I am using a single-node Cloudera Hadoop cluster (Kerberos enabled).
    
    In client mode I use the following commands:
    
        kinit
        spark-submit --master yarn-client --proxy-user cloudera examples/src/main/python/pi.py 
    
    This works fine. In cluster mode I use the following command (no kinit done and no TGT present in the cache):
    
        spark-submit --principal <myprinc> --keytab <KT location> --master yarn-cluster examples/src/main/python/pi.py 
    
    This also works fine. But when I use the following command in cluster mode (no kinit done and no TGT present in the cache):
    
        spark-submit --principal <myprinc> --keytab <KT location> --master yarn-cluster --proxy-user cloudera examples/src/main/python/pi.py 
    
    it throws the following error:
    
        No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
    
    I guess that in cluster mode spark-submit does not look for a TGT on the client machine; it transfers the keytab file to the cluster and then starts the Spark job. So why does specifying the "--proxy-user" option look for a TGT when submitting in "yarn-cluster" mode? Am I doing something wrong?




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73805185
  
      [Test build #27234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27234/consoleFull) for   PR 4405 at commit [`df82427`](https://github.com/apache/spark/commit/df82427a3bae88957744b5d5aeca06a8d6d36d1f).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4405#discussion_r24451160
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -145,7 +147,7 @@ object SparkSubmit {
           }
         // In all other modes, just run the main class as prepared
         } else {
    -      runMain(childArgs, childClasspath, sysProps, childMainClass)
    +      runMain(childArgs, childClasspath, sysProps, childMainClass, args)
    --- End diff --
    
    Yeah, working on it. I'm trying to figure out why the exceptions are not being printed when I do that (the thing I mentioned to you on IM). Hopefully I'll have an update soon.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73640354
  
    bq. Did you test with a secured Hadoop cluster or just a normal cluster?
    
    Both. In Kerberos mode you have to be logged in before you submit the app, but that was true before this change. If you're not logged in, you can't submit. So this change does not alter any assumptions.
    
    bq. I don't understand why standalone mode or Mesos mode won't need a job delegation token.
    
    Because they don't need it. They don't work with Kerberos, and you don't need delegation tokens without Kerberos.
    
    bq. If you look at Oozie's implementation, you can see that before the MR job is submitted
    
    Not sure how that's related to Spark. Spark gets the needed delegation tokens; there's nothing else to be done.
    
    bq. For an application (for example, a program that submits the Spark job directly, not from the command line), this approach doesn't seem to help much.
    
    Both Oozie and Hive need to fork external processes to run Spark. When those processes fork, they'll be running with the Kerberos credentials of the user running those services, not as the "proxy user". So the forked process needs to know which user to impersonate. `loginUserFromKeytab` is irrelevant here.
    
    bq. So is the approach only intended for command line use? Does it make sense to push more logic into Spark?
    
    Yes, this is only for command line use (or, in other words, running Spark as a separate process). Anything else would be a lot more complicated and probably a much larger project, which is really not needed for the use case at hand (Hive and, eventually, Oozie).
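
To make the "logged in before you submit" requirement concrete, here is a small sketch of a wrapper that refuses to run a command unless a valid Kerberos ticket is present. It assumes MIT Kerberos `klist -s`, which runs silently and exits non-zero when the credentials cache cannot be read or has expired. The function name `require_tgt` and the `TICKET_CHECK` override are hypothetical and exist only so the sketch can be exercised on a machine without Kerberos.

```shell
#!/usr/bin/env bash

# Hypothetical guard (not part of Spark): run the given command only if
# a valid Kerberos ticket is available.
# TICKET_CHECK is an assumed override hook for testing; the default
# `klist -s` exits non-zero when no usable ticket cache is found.
require_tgt() {
  local check="${TICKET_CHECK:-klist -s}"
  # Left unquoted on purpose so "klist -s" splits into command and flag.
  if ! $check; then
    echo "no valid Kerberos ticket; run kinit first" >&2
    return 1
  fi
  "$@"
}
```

A service that forks Spark as a separate process could then call, for example, `require_tgt spark-submit --master yarn-client --proxy-user cloudera examples/src/main/python/pi.py`, so a missing ticket is reported before the submission starts.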




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73143546
  
      [Test build #26861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26861/consoleFull) for   PR 4405 at commit [`0540d38`](https://github.com/apache/spark/commit/0540d38fabe08cb63aecd9df953bed5cbe3bfa62).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73299786
  
      [Test build #26930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26930/consoleFull) for   PR 4405 at commit [`b6c947d`](https://github.com/apache/spark/commit/b6c947df7131b88455380115088ef7bf336a17f3).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73658373
  
    Hey @vanzin I made two comments but overall this looks good.
    
    One question - in cluster mode, does this `doAs` propagate along to the yarn submission client? I'm assuming it does (and that this handles both cases) but I thought I would ask to be sure.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73754222
  
    Hi @pwendell ,
    
    > One question - in cluster mode, does this doAs propagate along to the yarn submission client?
    
    The submission is run as the proxy user. So when you start the "cluster mode" app, it will be started as that proxy user.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73819102
  
    LGTM - thanks @vanzin I'll pull it in!




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73154049
  
      [Test build #26861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26861/consoleFull) for   PR 4405 at commit [`0540d38`](https://github.com/apache/spark/commit/0540d38fabe08cb63aecd9df953bed5cbe3bfa62).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73792703
  
      [Test build #27234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27234/consoleFull) for   PR 4405 at commit [`df82427`](https://github.com/apache/spark/commit/df82427a3bae88957744b5d5aeca06a8d6d36d1f).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by hemshankar <gi...@git.apache.org>.
Github user hemshankar commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-171229397
  
    What do we mean by plumbing of the user name through the UI?




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73803436
  
      [Test build #27233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27233/consoleFull) for   PR 4405 at commit [`05bfc08`](https://github.com/apache/spark/commit/05bfc08192b1af7e56170e4a9a80b3bfc5456459).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73154059
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26861/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73305261
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26930/
    Test FAILed.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73346318
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26979/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4405#discussion_r24448906
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -145,7 +147,7 @@ object SparkSubmit {
           }
         // In all other modes, just run the main class as prepared
         } else {
    -      runMain(childArgs, childClasspath, sysProps, childMainClass)
    +      runMain(childArgs, childClasspath, sysProps, childMainClass, args)
    --- End diff --
    
    @vanzin any thoughts on this one?




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by harishreedharan <gi...@git.apache.org>.
Github user harishreedharan commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73325840
  
    Oh, I didn't know YARN already takes care of running the container as the requesting user.
    
    +1.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73803442
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27233/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73754047
  
    Hi @chesterxgchen,
    
    >  is spark standalone mode that does not support kerberos ?
    
    Correct. Currently, there's no point in talking about kerberos for anything but Yarn.
    
    >  I am thinking about the application that run spark job without forking a process.
    
    That is obviously not covered by this patch. For that application, it would need to do the impersonation itself before creating a SparkContext (or doing whatever it does to launch Spark).
    
    Well, it could potentially call `SparkSubmit` programmatically and pass this argument, but I don't think that's a supported use case.
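
A minimal sketch of what "doing the impersonation itself" could look like for an application that launches Spark in-process. It uses Hadoop's `UserGroupInformation.createProxyUser` and `doAs`; the object and method names (`ProxyUserSketch`, `runAsProxy`) are hypothetical, and the sketch assumes hadoop-common is on the classpath and that the cluster's proxy-user settings allow the logged-in user to impersonate `proxyUser`.

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper, not part of Spark or of this patch.
object ProxyUserSketch {
  // Runs `body` as `proxyUser`, impersonated by the currently logged-in
  // (real) user. With kerberos, the real user must already be logged in.
  def runAsProxy[T](proxyUser: String)(body: => T): T = {
    val realUser = UserGroupInformation.getLoginUser
    val ugi = UserGroupInformation.createProxyUser(proxyUser, realUser)
    ugi.doAs(new PrivilegedExceptionAction[T] {
      override def run(): T = body
    })
  }
}
```

An application would then create its `SparkContext` inside the block, for example `ProxyUserSketch.runAsProxy("alice") { new SparkContext(conf) }`, so that the context is set up under the proxy user's identity.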





[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4405#discussion_r24394308
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
    @@ -145,7 +147,7 @@ object SparkSubmit {
           }
         // In all other modes, just run the main class as prepared
         } else {
    -      runMain(childArgs, childClasspath, sysProps, childMainClass)
    +      runMain(childArgs, childClasspath, sysProps, childMainClass, args)
    --- End diff --
    
    Could we avoid having to pass `SparkSubmitArguments` to `runMain` and instead just deal with the `doAs` block out here? You only need to do it for this lower case. This script actually has a fairly good separation of concerns, in that the interface to `runMain` occurs strictly downstream from the somewhat complicated logic in `SparkSubmitArguments`.
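
A rough sketch of the shape this suggestion points at: `runMain` keeps its four-parameter signature and the call site decides whether to wrap it in a `doAs`. Everything here is a simplified stand-in rather than Spark's actual code: `Args`, `doAs`, `currentUser` and the string return value are hypothetical, and the toy `doAs` only swaps a variable where the real code would use Hadoop's `UserGroupInformation`.

```scala
// Simplified model; names and types are illustrative assumptions.
object SubmitSketch {
  // Stand-in for SparkSubmitArguments; only the field needed here.
  final case class Args(proxyUser: Option[String])

  // Tracks the "effective user" so the effect of doAs is observable.
  private var effectiveUser: String = "service"
  def currentUser: String = effectiveUser

  // Toy doAs: run `body` with `user` as the effective user, then restore.
  def doAs[T](user: String)(body: => T): T = {
    val previous = effectiveUser
    effectiveUser = user
    try body finally effectiveUser = previous
  }

  // runMain keeps its original shape and knows nothing about Args.
  def runMain(childArgs: Seq[String], childClasspath: Seq[String],
      sysProps: Map[String, String], childMainClass: String): String =
    s"$childMainClass run by $currentUser"

  // The call site handles impersonation, upstream of runMain.
  def submit(args: Args, childArgs: Seq[String], childClasspath: Seq[String],
      sysProps: Map[String, String], childMainClass: String): String =
    args.proxyUser match {
      case Some(user) =>
        doAs(user) { runMain(childArgs, childClasspath, sysProps, childMainClass) }
      case None =>
        runMain(childArgs, childClasspath, sysProps, childMainClass)
    }
}
```

The point of the design is that argument parsing stays in one place: only `submit` reads `Args`, so `runMain` can be called and tested without any knowledge of proxy users.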




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73346313
  
      [Test build #26979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26979/consoleFull) for   PR 4405 at commit [`8af06ff`](https://github.com/apache/spark/commit/8af06ff3154807807332c3619ebbff32a6349977).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73342841
  
      [Test build #26979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26979/consoleFull) for   PR 4405 at commit [`8af06ff`](https://github.com/apache/spark/commit/8af06ff3154807807332c3619ebbff32a6349977).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73805192
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27234/
    Test PASSed.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73305254
  
      [Test build #26930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26930/consoleFull) for   PR 4405 at commit [`b6c947d`](https://github.com/apache/spark/commit/b6c947df7131b88455380115088ef7bf336a17f3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `      "public class " + className + extendsText + " implements java.io.Serializable `
      * `  case class RegisterExecutor(`





[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-171383089
  
    @hemshankar Please don't use github to ask questions / point out possible issues. See http://spark.apache.org/community.html.




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73308548
  
    That should already be handled.
    
    - `CoarseGrainedExecutorBackend` does run the executor inside a `doAs` block, as `SPARK_USER`
    - Yarn actually runs the underlying container processes as the requesting user too (unlike standalone)
    - HDFS authorization is done through the delegation tokens generated by the driver (see `Client.scala:obtainTokensForNamenodes`).




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by chesterxgchen <gi...@git.apache.org>.
Github user chesterxgchen commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73642035
  
    Thanks for the detailed reply.
    
    >>bq. I don't understand why the standalone mode or mesos mode won't
    >>need have job delegation token ?
    
    >>Because they don't need it. They don't work with kerberos, and you don't
    >>need delegation tokens without kerberos.
    
    >>bq. If you see in oozie's implementation, you can see that before the MR
    >>job is submitted
    
    >>Not sure how that's related to Spark. Spark gets the needed delegation
    >>tokens, there's nothing else to be done.
    
    You answered the above as though they were separate questions, but it is
    actually one question. I am not sure you understood my original question.
    
    Spark submits an MR job just as other applications, such as Oozie, submit
    an MR job. In a secured cluster, you need to obtain a delegation token
    before submitting. I am just using Oozie as an example; this could be
    Oozie, Hue, Knox or Tajo. I am under the assumption that Spark would need
    delegation tokens as well, even in standalone mode, similar to other
    applications.
    
    I did not get a clear picture from your answer: is it that Spark standalone
    mode does not support kerberos? Or is it that Spark already has the
    delegation token?
    
    
    >>Both Oozie and Hive need to fork external processes to run Spark. When
    >>those processes fork, they'll be running with the kerberos credentials of
    >>the user running those services, not as the "proxy user". So the forked
    >>process needs to know which user to impersonate. loginUserFromKeytab is
    >>irrelevant here.
    
    If you fork the process, of course, that's a different story. I am
    thinking about applications that run Spark jobs without forking a
    process. Our application is like this, and I am sure there are many other
    applications like it, which don't need to run kinit from the command line.
    
    Thanks for clarifying what the PR is intended for.
    
    Chester
    
    On Mon, Feb 9, 2015 at 7:54 PM, Marcelo Vanzin <no...@github.com>
    wrote:
    
    > bq. Did you test with the secured Hadoop Cluster or just normal cluster ?
    >
    > Both. In kerberos mode you have to be logged in before you submit the app,
    > but that's true before this change. If you're not logged in, you can't
    > submit. So this change is not changing any assumptions.
    >
    > bq. I don't understand why the standalone mode or mesos mode won't need
    > have job delegation token ?
    >
    > Because they don't need it. They don't work with kerberos, and you don't
    > need delegation tokens without kerberos.
    >
    > bq. If you see in oozie's implementation, you can see that before the MR
    > job is submitted
    >
    > Not sure how that's related to Spark. Spark gets the needed delegation
    > tokens, there's nothing else to be done.
    >
    > bq. For application ( for example, a programs that submit the spark job
    > directly, not from command line), this seems approach doesn't seem to help
    > much.
    >
    > Both Oozie and Hive need to fork external processes to run Spark. When
    > those processes fork, they'll be running with the kerberos credentials of
    > the user running those services, not as the "proxy user". So the forked
    > process needs to know which user to impersonate. loginUserFromKeytab is
    > irrelevant here.
    >
    > bq. So is the approach is only intended for command line use ? does it
    > make sense to push more logic into spark ?
    >
    > Yes, this is only for command line use (or, in other words, running Spark
    > as a separate process). Anything else would be a lot more complicated and
    > probably a much larger project, that is really not needed for the use case
    > at hand (Hive and, eventually, Oozie).





[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73143745
  
    Some description of the testing I did:
    
    - normal submission without kerberos
    - impersonated submission without kerberos (got expected "unauthorized" error from Yarn since I did not have impersonation configured)
    - normal submission as system user ("oozie" - id = 480) with kerberos (denied by Yarn since id < 1000)
    - impersonated submission as same system user with kerberos, app runs as proxy user (and proxy user shows up in history as owner of the app).
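
For context on the "unauthorized" case above: Hadoop only lets a user impersonate others when the cluster's `core-site.xml` grants it through the `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` properties. A sketch of such a fragment for the "oozie" user from the tests follows; the host and group values are placeholders and were not part of the original test setup.

```xml
<!-- core-site.xml: allow the "oozie" service user to impersonate others.
     The host and group values are placeholders, not taken from this thread. -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>oozie-host.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>analysts</value>
</property>
```

With these set, "oozie" may act as a proxy for members of the listed group when connecting from the listed host; without them, the impersonated submission is rejected.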




[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73558881
  
    Hi @chesterxgchen.
    
    Assumptions 1 and 3 are wrong: the code doesn't assume that. Proxy users work just fine without kerberos as long as the configuration allows it (in fact I just tested it). You may argue that it's useless in non-kerberos mode (where you can just use `UserGroupInformation.createRemoteUser()` and achieve the same), but that's beside the point of this patch.
    
    Errors, just like any other error when submitting the app to Yarn, are reported as exceptions thrown by the Yarn client code. I assume that standalone and mesos have no notion of proxy users, so they wouldn't ever complain about it.
    
    > For spark jobs, should one need to add delegation token to the job's credential ?
    
    Delegation tokens are only needed with kerberos, which is currently only supported by Yarn, and Yarn, as you already noticed, handles them.

