You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by zjffdu <gi...@git.apache.org> on 2016/09/21 07:52:46 UTC

[GitHub] zeppelin pull request #1446: ZEPPELIN-1263. Should specify zeppelin's spark ...

GitHub user zjffdu opened a pull request:

    https://github.com/apache/zeppelin/pull/1446

    ZEPPELIN-1263. Should specify zeppelin's spark configuration through --conf arguments of spark-submit

    ### What is this PR for?
    
    For now we spark configuration at runtime rather than pass them through `--conf`, it would cause several issues.
    * Some configuration has to be set through --conf, otherwise we need to duplicate code in SparkSubmit.scala (spark.yarn.keytab, spark.yarn.principal)
    * Some configuration would conflict with spark-defaults.conf. If you specify spark.master as yarn-client in spark-defaults.conf but specify spark.master as local in zeppelin side, you will see the spark interpreter fail to start due to this inconsistency. 
    * As ZEPPELIN-1460 described, it is hard to figure what is the effective configuration. 
    * We can not use yarn-cluster mode although it is not supported now, but I think it is necessary to do that as zeppelin needs to support multiple users.
    
    So this PR would pass all the spark related configuration to spark-submit through `--conf`, so that it is easy to know and guarantee that configuration on zeppelin interpreter setting take precedence over spark-defaults.conf.  And it is also good for maintenance that upstream change (any change about configuration in spark) would not affect us. 
    
    
    ### What type of PR is it?
    [Improvement]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-1263
    
    ### How should this be tested?
    Tested spark 1.6 spark 2.0 on both yarn-client mode and embedded mode. 
    
    ### Screenshots (if appropriate)
    ![image](https://cloud.githubusercontent.com/assets/164491/18702212/3e7b54d0-8013-11e6-95f7-502b3cf89d67.png)
    
    ### Questions:
    * Does the licenses files need update? No
    * Is there breaking changes for older versions? No
    * Does this needs documentation? No
    
    \u2026

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zjffdu/zeppelin ZEPPELIN-1263

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1446.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1446
    
----
commit e36dfc1eb7cf06df4acb717a9701bf36f7a0afd5
Author: Jeff Zhang <zj...@apache.org>
Date:   2016-08-03T04:50:04Z

    ZEPPELIN-1263. Should specify zeppelin's spark configuration through --conf arguments of spark-submit

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1446: ZEPPELIN-1263. Should specify zeppelin's spark configu...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1446
  
    I will merge it if no more comments. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1446: ZEPPELIN-1263. Should specify zeppelin's spark configu...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1446
  
    I haven't tried that, but I guess it won't work. Because embedded mode is special case for zeppelin, and zeppelin didn't use spark-submit to launch embedded mode, so I guess it won't work in embedded mode. The workaround is to still set the properties in java code like what we did for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1446: ZEPPELIN-1263. Should specify zeppelin's spark configu...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1446
  
    @jongyoul I won't do it in the existing interpreter refactoring PR. I will do it in other PRs, I believe there would still be several PRs for refactoring. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1446: ZEPPELIN-1263. Should specify zeppelin's spark configu...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/1446
  
    Sounds good


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1446: ZEPPELIN-1263. Should specify zeppelin's spark configu...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/1446
  
    I didn't dig into it but have simple question. Does it support embedded mode?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin pull request #1446: ZEPPELIN-1263. Should specify zeppelin's spark ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zeppelin/pull/1446


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1446: ZEPPELIN-1263. Should specify zeppelin's spark configu...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1446
  
    @Leemoonsoo @jongyoul @felixcheung Please help review. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1446: ZEPPELIN-1263. Should specify zeppelin's spark configu...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/zeppelin/pull/1446
  
    I think we still support Spark 1.2, 1.3 officially so removing this might be an issue.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1446: ZEPPELIN-1263. Should specify zeppelin's spark configu...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/1446
  
    You refactoring PR is already huge. I think creating new PR would be better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1446: ZEPPELIN-1263. Should specify zeppelin's spark configu...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the issue:

    https://github.com/apache/zeppelin/pull/1446
  
    I agree on the direction of this PR. But on the implementation I wish we try keep `class RemoteInterpreter` interpreter independent as much as possible.
    
    This PR adds spark specific code to `RemoteInterpreter.getEnvFromInterpreterProperty()`.
    And we used to have spark specific code in `bin/interpreter.sh`.
    
    In the future, it'll be better if we provide generic facility in both `bin/interpreter.sh` and `class RemoteInterpreter` and move all spark specific code into spark interpreter module.
    
    What do you think @zjffdu? This doesn't need to be addressed in this PR but if you think it's right direction, i wish we at least create a issue for it, so we can keep track on it.
    
    Other than that, LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1446: ZEPPELIN-1263. Should specify zeppelin's spark configu...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1446
  
    @jongyoul @Leemoonsoo @felixcheung Would you mind to take a look at this PR ? This issue blocks [ZEPPELIN-2715](https://issues.apache.org/jira/browse/ZEPPELIN-2715), [ZEPPELIN-2720](https://issues.apache.org/jira/browse/ZEPPELIN-2720) which I think is very critical. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1446: ZEPPELIN-1263. Should specify zeppelin's spark configu...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1446
  
    Thanks @Leemoonsoo , Agree to move them to spark interpreter instead of `RemoteInterpreter`, it is actually on my plan of interpreter refactoring.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---