Posted to dev@zeppelin.apache.org by khalidhuseynov <gi...@git.apache.org> on 2017/01/04 10:20:13 UTC

[GitHub] zeppelin pull request #1840: [ZEPPELIN-1730, 1587] WIP add spark impersonati...

GitHub user khalidhuseynov opened a pull request:

    https://github.com/apache/zeppelin/pull/1840

    [ZEPPELIN-1730, 1587] WIP add spark impersonation through --proxy-user option

    ### What is this PR for?
    This PR adds Spark impersonation using the `--proxy-user` option. Note that it also enables Spark impersonation without requiring the logged-in user to exist as a system user with SSH configured.
    
    
    ### What type of PR is it?
    Improvement
    
    ### Todos
    * [x] - add for *nix
    * [ ] - add for windows
    * [x] - testing for standalone
    * [ ] - testing for yarn mode
    
    ### What is the Jira issue?
    Directly solves [ZEPPELIN-1730](https://issues.apache.org/jira/browse/ZEPPELIN-1730), and also solves [ZEPPELIN-1587](https://issues.apache.org/jira/browse/ZEPPELIN-1587) following the discussion in #1566, since using `--proxy-user` with `spark-submit` is the preferable method.
    
    ### How should this be tested?
    1. Switch your Spark interpreter to `per user` and `isolated` mode
    2. Enable the `user impersonation` flag
    3. Run a job using that Spark interpreter
    4. The SparkContext should run as the currently logged-in user, with the system user submitting on that user's behalf
    
    ### Screenshots (if appropriate)
    ![spark_sc_impersonation](https://cloud.githubusercontent.com/assets/1642088/21639292/24240286-d224-11e6-8099-9bc74a06f0c2.gif)
    
    
    
    ### Questions:
    * Do the license files need an update? no
    * Are there breaking changes for older versions? no
    * Does this need documentation? no?
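A minimal sketch of the mechanism this PR describes, assuming a bash launcher script: when impersonation is on, the logged-in user's name is passed to `spark-submit` through `--proxy-user`. The function `build_spark_submit_cmd`, the path `/opt/spark`, and the user `alice` are hypothetical, and this is not the code from the patch; the sketch only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: how a launcher script *might* add --proxy-user.
# The function name and argument layout are illustrative, not Zeppelin's code.

# build_spark_submit_cmd <spark_home> <logged_in_user> [spark-submit args...]
# Prints the command line instead of executing it.
build_spark_submit_cmd() {
  local spark_home="$1"
  local user="$2"
  shift 2
  local -a cmd=("${spark_home}/bin/spark-submit")
  # Impersonate only when a logged-in user is known; the OS user running
  # Zeppelin then submits the application on behalf of that user.
  if [[ -n "$user" ]]; then
    cmd+=(--proxy-user "$user")
  fi
  # Remaining options (and the application jar) follow the proxy-user flag.
  cmd+=("$@")
  printf '%s\n' "${cmd[*]}"
}

build_spark_submit_cmd /opt/spark alice --master yarn --deploy-mode client
# prints: /opt/spark/bin/spark-submit --proxy-user alice --master yarn --deploy-mode client
```

On YARN/HDFS, the OS user that runs Zeppelin generally also has to be authorized as a Hadoop proxy user (the `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` properties in `core-site.xml`) before the cluster accepts the impersonated submission.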


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/khalidhuseynov/incubator-zeppelin feat/spark-proxy-user

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1840.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1840
    
----
commit b68a4a06218977d94621ef2bde7506da45f821f9
Author: Khalid Huseynov <kh...@gmail.com>
Date:   2017-01-04T09:49:24Z

    add --proxy-user option for spark

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the issue:

    https://github.com/apache/zeppelin/pull/1840
  
    Also, a review from @Leemoonsoo on this one would be helpful.



[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the issue:

    https://github.com/apache/zeppelin/pull/1840
  
    I just pushed changes that keep backward compatibility through the `ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER` environment variable, which can disable the use of the `--proxy-user` option. Once [SPARK-19143](https://issues.apache.org/jira/browse/SPARK-19143) is resolved, we may come back to it.
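For the compatibility switch mentioned above, here is a minimal sketch of how a launcher could read `ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER`. The helper name `proxy_user_enabled` is hypothetical, and the assumptions that the value `false` disables the flag while an unset variable leaves it enabled are ours; check the Zeppelin documentation for the release you run.

```shell
#!/usr/bin/env bash
# Sketch, assuming anything other than "false" (including unset) keeps
# --proxy-user enabled. An operator would opt out in conf/zeppelin-env.sh with:
#   export ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER=false

# proxy_user_enabled: exit status 0 when --proxy-user should be passed.
proxy_user_enabled() {
  [[ "${ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER:-true}" != "false" ]]
}

if proxy_user_enabled; then
  echo "spark-submit will receive the proxy-user flag"
else
  echo "proxy-user flag disabled"
fi
```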



[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the issue:

    https://github.com/apache/zeppelin/pull/1840
  
    @zjffdu I agree about bringing security-related features together in the longer term; possibly the `Credentials` menu could be used for that.
    Also, regarding the previously discussed use of `--proxy-user` in yarn-cluster mode: I believe that mode is currently not supported in Zeppelin. As far as I know, only standalone and yarn-client modes are supported by the pure Spark interpreter.
    @Tagar right, if used that way, Kerberos tickets wouldn't be renewed automatically. However, as I said, I think the Spark interpreter doesn't support yarn-cluster mode, so using `ZEPPELIN_IMPERSONATE_CMD` with `kinit` wouldn't be required in that case.
    
    Also, anyone with a yarn-cluster setup with Kerberos is more than welcome to test it :)



[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1840
  
    @khalidhuseynov Have you tried it on a secured cluster? IIRC, `--proxy-user` cannot be used together with `--principal` & `--keytab`, which means that on a secured cluster the user has to run `kinit` instead of using `--principal` & `--keytab`. This might not be what users expect.
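To make the restriction above concrete, a sketch of a pre-flight check a launcher could run before calling `spark-submit`. The helper `check_submit_args` is hypothetical (it is not part of Zeppelin or Spark); it conservatively treats either `--principal` or `--keytab` as a keytab login and rejects it when `--proxy-user` is also present.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of Zeppelin or Spark): reject an
# argument list that mixes --proxy-user with --principal/--keytab, since
# spark-submit does not accept them together.

check_submit_args() {
  local has_proxy=0 has_keytab_login=0 arg
  for arg in "$@"; do
    case "$arg" in
      --proxy-user) has_proxy=1 ;;
      # Either flag alone is treated as a keytab login (conservative).
      --principal|--keytab) has_keytab_login=1 ;;
    esac
  done
  if (( has_proxy && has_keytab_login )); then
    echo "error: --proxy-user cannot be combined with --principal/--keytab; obtain a ticket with kinit first" >&2
    return 1
  fi
  return 0
}

check_submit_args --proxy-user alice --master yarn && echo "ok"
```

On a secured cluster this leaves a `kinit` run beforehand by the Zeppelin service user as the way to authenticate, which is the trade-off raised in this comment.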



[GitHub] zeppelin pull request #1840: [ZEPPELIN-1730, 1587] add spark impersonation t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zeppelin/pull/1840



[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the issue:

    https://github.com/apache/zeppelin/pull/1840
  
    This is ready for review. @prabhjyotsingh, please help review as the original author; also @zjffdu @astroshim @Leemoonsoo as a follow-up from #1566. The CI failure in the first profile is unrelated; it is caused by the RAT problem tracked in [ZEPPELIN-1850](https://issues.apache.org/jira/browse/ZEPPELIN-1850).



[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1840
  
    > user configures export ZEPPELIN_IMPERSONATE_CMD in here with kinit <principal>@<REALM> -k -t <keytab file> and then it's run before spark-submit
    
    One concern is that this requires all interpreters of a given user to share the same keytab/principal. For example, the Spark interpreter may affect the shell interpreter if they use different keytabs/principals for the same user. In the long term, we may need to put security-related settings in one central place rather than in each interpreter setting.




[GitHub] zeppelin pull request #1840: [ZEPPELIN-1730, 1587] add spark impersonation t...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov closed the pull request at:

    https://github.com/apache/zeppelin/pull/1840



[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...

Posted by Tagar <gi...@git.apache.org>.
Github user Tagar commented on the issue:

    https://github.com/apache/zeppelin/pull/1840
  
    Thank you, @khalidhuseynov. Regarding:
    
    > user configures export ZEPPELIN_IMPERSONATE_CMD in here with kinit <principal>@<REALM> -k -t <keytab file> and then it's run before spark-submit
    
    The only problem I see with this option is that Kerberos tickets will not be renewed automatically and will expire at some point.
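One way to soften the expiry problem raised here, sketched under assumptions: the MIT Kerberos client tools are installed, and the launcher (or a periodic job) calls the helper before starting `spark-submit`. The function `ensure_ticket` and the `KLIST_CMD`/`KINIT_CMD` overrides are hypothetical names; the overrides only exist so the logic can be exercised without a KDC.

```shell
#!/usr/bin/env bash
# Sketch only: ensure_ticket and the KLIST_CMD/KINIT_CMD overrides are
# hypothetical. Idea: re-acquire a TGT from the keytab when the credentials
# cache is missing or expired.

KLIST_CMD="${KLIST_CMD:-klist}"
KINIT_CMD="${KINIT_CMD:-kinit}"

# ensure_ticket <principal@REALM> <keytab>
ensure_ticket() {
  local principal="$1" keytab="$2"
  # MIT Kerberos `klist -s` prints nothing and exits 0 only if a
  # credentials cache with unexpired tickets exists.
  if "$KLIST_CMD" -s; then
    return 0
  fi
  # Otherwise obtain a fresh ticket non-interactively from the keytab.
  "$KINIT_CMD" -k -t "$keytab" "$principal"
}
```

This only keeps the ticket cache on the Zeppelin host fresh; it does not refresh credentials inside a Spark application that is already running, which is the credential-refresh question this thread defers to SPARK-19143.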




[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/1840
  
    Merged it into master and branch-0.7



[GitHub] zeppelin pull request #1840: [ZEPPELIN-1730, 1587] add spark impersonation t...

Posted by khalidhuseynov <gi...@git.apache.org>.
GitHub user khalidhuseynov reopened a pull request:

    https://github.com/apache/zeppelin/pull/1840

    [ZEPPELIN-1730, 1587] add spark impersonation through --proxy-user option

    ### What is this PR for?
    This PR adds Spark impersonation using the `--proxy-user` option. Note that it also enables Spark impersonation without requiring the logged-in user to exist as a system user with SSH configured.
    
    
    ### What type of PR is it?
    Improvement
    
    ### Todos
    * [x] - add `--proxy-user`
    * [x] - try on standalone spark 1.6.2
    * [x] - try on yarn-client mode spark 2.0.1
    
    ### What is the Jira issue?
    Directly solves [ZEPPELIN-1730](https://issues.apache.org/jira/browse/ZEPPELIN-1730), and also solves [ZEPPELIN-1587](https://issues.apache.org/jira/browse/ZEPPELIN-1587) following the discussion in #1566, since using `--proxy-user` with `spark-submit` is the preferable method.
    
    ### How should this be tested?
    1. Switch your Spark interpreter to `per user` and `isolated` mode
    2. Enable the `user impersonation` flag
    3. Run a job using that Spark interpreter
    4. The SparkContext should run as the currently logged-in user, with the system user submitting on that user's behalf
    
    ### Screenshots (if appropriate)
    standalone
    ![spark_sc_impersonation](https://cloud.githubusercontent.com/assets/1642088/21639292/24240286-d224-11e6-8099-9bc74a06f0c2.gif)
    
    yarn-client
    <img width="997" alt="screen shot 2017-01-04 at 10 00 13 am" src="https://cloud.githubusercontent.com/assets/1642088/21653117/75410fde-d264-11e6-886f-11d8b5dbd29e.png">
    
    
    ### Questions:
    * Do the license files need an update? no
    * Are there breaking changes for older versions? no
    * Does this need documentation? yes


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/khalidhuseynov/incubator-zeppelin feat/spark-proxy-user

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1840.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1840
    
----
commit 4c3dba9e95ca23fe42055f2a039fbdb423a7f466
Author: Khalid Huseynov <kh...@gmail.com>
Date:   2017-01-04T09:49:24Z

    add --proxy-user option for spark

commit c1239726fe322b6d5281589716ce2006a3944095
Author: Khalid Huseynov <kh...@gmail.com>
Date:   2017-01-04T17:25:27Z

    add note in docs

----



[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...

Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the issue:

    https://github.com/apache/zeppelin/pull/1840
  
    @zjffdu I haven't tried a secured cluster yet, but I checked the Spark documentation, and `spark-submit` indeed doesn't allow using `--principal` & `--keytab` together with `--proxy-user`, because of the security issue of exposing the keytab. Possible solutions could be:
    1. user configures `export ZEPPELIN_IMPERSONATE_CMD` in [here](https://github.com/apache/zeppelin/blob/d1fc86b7b2d2012c0323345166c98cc02886e9f1/conf/zeppelin-env.sh.template#L83) with `kinit <principal>@<REALM> -k -t <keytab file>` and then it's run before `spark-submit`
    2. don't use `--proxy-user` in cluster mode
    3. other suggestions




[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...

Posted by Tagar <gi...@git.apache.org>.
Github user Tagar commented on the issue:

    https://github.com/apache/zeppelin/pull/1840
  
    As far as credential refresh is concerned, please see the new comments in [SPARK-19143](https://issues.apache.org/jira/browse/SPARK-19143).
    Hope this helps.


