Posted to dev@zeppelin.apache.org by khalidhuseynov <gi...@git.apache.org> on 2017/01/04 10:20:13 UTC
[GitHub] zeppelin pull request #1840: [ZEPPELIN-1730, 1587] WIP add spark impersonati...
GitHub user khalidhuseynov opened a pull request:
https://github.com/apache/zeppelin/pull/1840
[ZEPPELIN-1730, 1587] WIP add spark impersonation through --proxy-user option
### What is this PR for?
This adds Spark impersonation using the `--proxy-user` option. Note that it also makes it possible to use Spark impersonation without requiring the logged-in user to exist as a system user with configured SSH.
### What type of PR is it?
Improvement
### Todos
* [x] - add for *nix
* [ ] - add for windows
* [x] - testing for standalone
* [ ] - testing for yarn mode
### What is the Jira issue?
Directly solves [ZEPPELIN-1730](https://issues.apache.org/jira/browse/ZEPPELIN-1730) and also solves [ZEPPELIN-1587](https://issues.apache.org/jira/browse/ZEPPELIN-1587), per the discussion in #1566, since using `--proxy-user` in `spark-submit` is the preferable method.
### How should this be tested?
1. Switch your Spark cluster to `per user` and `isolated` mode
2. Set the `user impersonation` flag
3. Run some job using that Spark interpreter
4. The Spark context should be created with the currently logged-in user's credentials on behalf of the system user
### Screenshots (if appropriate)
![spark_sc_impersonation](https://cloud.githubusercontent.com/assets/1642088/21639292/24240286-d224-11e6-8099-9bc74a06f0c2.gif)
### Questions:
* Do the license files need an update? no
* Are there breaking changes for older versions? no
* Does this need documentation? no?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/khalidhuseynov/incubator-zeppelin feat/spark-proxy-user
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/1840.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1840
----
commit b68a4a06218977d94621ef2bde7506da45f821f9
Author: Khalid Huseynov <kh...@gmail.com>
Date: 2017-01-04T09:49:24Z
add --proxy-user option for spark
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...
Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the issue:
https://github.com/apache/zeppelin/pull/1840
Also, a review from @Leemoonsoo on this one would be helpful.
---
[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...
Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the issue:
https://github.com/apache/zeppelin/pull/1840
I just pushed changes to keep compatibility: the `ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER` env variable can be used to disable the `--proxy-user` option. After [SPARK-19143](https://issues.apache.org/jira/browse/SPARK-19143) is resolved, maybe we can come back to it again.
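For illustration, the compatibility switch could behave roughly like the sketch below. This is not the code from the PR: `build_submit_opts` is a hypothetical helper, and it assumes that `false` is the value that turns the option off (the comment above does not state the value) and that `ZEPPELIN_IMPERSONATE_USER` holds the logged-in user's name.

```shell
# Hypothetical helper: prints the spark-submit options, appending
# --proxy-user only when an impersonated user is set and the switch
# is not "false".
# Assumption: "false" is the disabling value; the thread does not say.
build_submit_opts() {
  opts="${SPARK_SUBMIT_OPTIONS:-}"
  if [ -n "${ZEPPELIN_IMPERSONATE_USER:-}" ] \
     && [ "${ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER:-}" != "false" ]; then
    # Add a separating space only if there are already options.
    opts="${opts:+$opts }--proxy-user ${ZEPPELIN_IMPERSONATE_USER}"
  fi
  printf '%s\n' "$opts"
}
```

With `ZEPPELIN_IMPERSONATE_USER=alice` set and `SPARK_SUBMIT_OPTIONS` unset, the function prints `--proxy-user alice`; with the switch set to `false` it prints the options unchanged.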
---
[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...
Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the issue:
https://github.com/apache/zeppelin/pull/1840
@zjffdu I agree about bringing security-related features together in the longer term; possibly the `Credentials` menu could be used for that.
Also, regarding the previously discussed use of `--proxy-user` with yarn cluster mode, I believe it's currently not supported in Zeppelin. As far as I know, only standalone and yarn-client modes are supported by the pure Spark interpreter.
@Tagar right, if used in that way, Kerberos tickets wouldn't be renewed automatically. However, as I said, I think the Spark interpreter doesn't support yarn cluster mode, so using `ZEPPELIN_IMPERSONATE_CMD` with `kinit` wouldn't be required in that case.
Also, anyone who has a yarn cluster mode setup with Kerberos is more than welcome to test it :)
---
[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...
Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:
https://github.com/apache/zeppelin/pull/1840
@khalidhuseynov Have you tried it in a secured cluster? IIRC, `--proxy-user` cannot work together with `--principal` & `--keytab`, which means that in a secured cluster the user has to run `kinit` instead of using `--principal` & `--keytab`. This might not be what users expect.
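The conflict described above can be caught before calling `spark-submit`. The following is only a sketch of such a pre-flight check; `check_submit_args` is a hypothetical helper and is not part of Zeppelin or Spark.

```shell
# Hypothetical pre-flight check: returns non-zero when the argument list
# contains --proxy-user together with --principal or --keytab.
check_submit_args() {
  has_proxy=0
  has_kerberos=0
  for arg in "$@"; do
    case "$arg" in
      --proxy-user|--proxy-user=*) has_proxy=1 ;;
      --principal|--principal=*|--keytab|--keytab=*) has_kerberos=1 ;;
    esac
  done
  if [ "$has_proxy" -eq 1 ] && [ "$has_kerberos" -eq 1 ]; then
    echo "--proxy-user cannot be combined with --principal/--keytab" >&2
    return 1
  fi
  return 0
}
```

For example, `check_submit_args --proxy-user alice --principal p@EXAMPLE.COM` returns a non-zero status, while either option on its own passes.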
---
[GitHub] zeppelin pull request #1840: [ZEPPELIN-1730, 1587] add spark impersonation t...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/zeppelin/pull/1840
---
[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...
Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the issue:
https://github.com/apache/zeppelin/pull/1840
This is ready for review. @prabhjyotsingh please help review as the original author; also @zjffdu @astroshim @Leemoonsoo as a follow-up from #1566. The CI failure in the first profile is irrelevant and is due to the RAT problem tracked under [ZEPPELIN-1850](https://issues.apache.org/jira/browse/ZEPPELIN-1850).
---
[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...
Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:
https://github.com/apache/zeppelin/pull/1840
```
user configures export ZEPPELIN_IMPERSONATE_CMD in here with kinit <principal>@<REALM> -k -t <keytab file> and then it's run before spark-submit
```
One concern is that this requires all the interpreters of one user to share the same keytab/principal, e.g. the Spark interpreter may affect the shell interpreter if they use different keytabs/principals for the same user. In the long term, we may need to put security-related settings in one central place rather than in each interpreter setting.
---
[GitHub] zeppelin pull request #1840: [ZEPPELIN-1730, 1587] add spark impersonation t...
Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov closed the pull request at:
https://github.com/apache/zeppelin/pull/1840
---
[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...
Posted by Tagar <gi...@git.apache.org>.
Github user Tagar commented on the issue:
https://github.com/apache/zeppelin/pull/1840
Thank you @khalidhuseynov.
On
> user configures export ZEPPELIN_IMPERSONATE_CMD in here with kinit <principal>@<REALM> -k -t <keytab file> and then it's run before spark-submit
The only problem I see with this option is that Kerberos tickets will not be renewed automatically, and will expire at some point.
---
[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...
Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:
https://github.com/apache/zeppelin/pull/1840
Merged it into master and branch-0.7
---
[GitHub] zeppelin pull request #1840: [ZEPPELIN-1730, 1587] add spark impersonation t...
Posted by khalidhuseynov <gi...@git.apache.org>.
GitHub user khalidhuseynov reopened a pull request:
https://github.com/apache/zeppelin/pull/1840
[ZEPPELIN-1730, 1587] add spark impersonation through --proxy-user option
### What is this PR for?
This adds Spark impersonation using the `--proxy-user` option. Note that it also makes it possible to use Spark impersonation without requiring the logged-in user to exist as a system user with configured SSH.
### What type of PR is it?
Improvement
### Todos
* [x] - add `--proxy-user`
* [x] - try on standalone spark 1.6.2
* [x] - try on yarn-client mode spark 2.0.1
### What is the Jira issue?
Directly solves [ZEPPELIN-1730](https://issues.apache.org/jira/browse/ZEPPELIN-1730) and also solves [ZEPPELIN-1587](https://issues.apache.org/jira/browse/ZEPPELIN-1587), per the discussion in #1566, since using `--proxy-user` in `spark-submit` is the preferable method.
### How should this be tested?
1. Switch your Spark cluster to `per user` and `isolated` mode
2. Set the `user impersonation` flag
3. Run some job using that Spark interpreter
4. The Spark context should be created with the currently logged-in user's credentials on behalf of the system user
### Screenshots (if appropriate)
standalone
![spark_sc_impersonation](https://cloud.githubusercontent.com/assets/1642088/21639292/24240286-d224-11e6-8099-9bc74a06f0c2.gif)
yarn-client
<img width="997" alt="screen shot 2017-01-04 at 10 00 13 am" src="https://cloud.githubusercontent.com/assets/1642088/21653117/75410fde-d264-11e6-886f-11d8b5dbd29e.png">
### Questions:
* Do the license files need an update? no
* Are there breaking changes for older versions? no
* Does this need documentation? yes
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/khalidhuseynov/incubator-zeppelin feat/spark-proxy-user
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/1840.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1840
----
commit 4c3dba9e95ca23fe42055f2a039fbdb423a7f466
Author: Khalid Huseynov <kh...@gmail.com>
Date: 2017-01-04T09:49:24Z
add --proxy-user option for spark
commit c1239726fe322b6d5281589716ce2006a3944095
Author: Khalid Huseynov <kh...@gmail.com>
Date: 2017-01-04T17:25:27Z
add note in docs
----
---
[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...
Posted by khalidhuseynov <gi...@git.apache.org>.
Github user khalidhuseynov commented on the issue:
https://github.com/apache/zeppelin/pull/1840
@zjffdu I haven't tried secured cluster mode yet, but I checked the Spark documentation, and it indeed doesn't allow using `--principal` & `--keytab` for spark-submit alongside `--proxy-user`, because of the security issue of exposing the keytab. Possible solutions could then be:
1. The user configures `export ZEPPELIN_IMPERSONATE_CMD` [here](https://github.com/apache/zeppelin/blob/d1fc86b7b2d2012c0323345166c98cc02886e9f1/conf/zeppelin-env.sh.template#L83) with `kinit <principal>@<REALM> -k -t <keytab file>`, which is then run before `spark-submit`
2. Don't use `--proxy-user` in cluster mode
3. Other suggestions
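The first option in the list above (run `kinit` with a keytab, then `spark-submit`) can be sketched as a small wrapper. How Zeppelin actually combines `ZEPPELIN_IMPERSONATE_CMD` with the interpreter command is not shown in this thread, so everything here is an assumption for illustration: `run_with_ticket`, `KRB_PRINCIPAL`, `KRB_KEYTAB` and `DRY_RUN` are hypothetical names, not Zeppelin settings. The sketch uses the conventional `kinit -k -t <keytab> <principal>` argument order.

```shell
# Hypothetical wrapper: obtains a Kerberos ticket from a keytab, then runs
# the given command (for example spark-submit with --proxy-user).
# KRB_PRINCIPAL and KRB_KEYTAB are illustrative names, not Zeppelin settings.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_with_ticket() {
  if [ -z "${KRB_PRINCIPAL:-}" ] || [ -z "${KRB_KEYTAB:-}" ]; then
    echo "KRB_PRINCIPAL and KRB_KEYTAB must be set" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "kinit -k -t ${KRB_KEYTAB} ${KRB_PRINCIPAL}"
    printf '%s\n' "$*"
    return 0
  fi
  # Stop here if no ticket could be obtained.
  kinit -k -t "${KRB_KEYTAB}" "${KRB_PRINCIPAL}" || return 1
  "$@"
}
```

With `DRY_RUN=1`, calling `run_with_ticket spark-submit --proxy-user alice app.jar` only prints the `kinit` line followed by the `spark-submit` line, which makes the ordering easy to inspect without a Kerberos setup.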
---
[GitHub] zeppelin issue #1840: [ZEPPELIN-1730, 1587] add spark impersonation through ...
Posted by Tagar <gi...@git.apache.org>.
Github user Tagar commented on the issue:
https://github.com/apache/zeppelin/pull/1840
As far as credentials refresh is concerned, please see the new comments in [SPARK-19143](https://issues.apache.org/jira/browse/SPARK-19143).
Hope this helps.
---