You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zeppelin.apache.org by zjffdu <gi...@git.apache.org> on 2017/02/03 01:07:27 UTC

[GitHub] zeppelin pull request #1970: ZEPPELIN-2045. Pass interpreter properties with...

GitHub user zjffdu opened a pull request:

    https://github.com/apache/zeppelin/pull/1970

    ZEPPELIN-2045. Pass interpreter properties with "spark." as prefix to SparkConf

    ### What is this PR for?
    Minor change to only pass interpreter properties with "spark." as prefix to SparkConf. Other properties is used by zeppelin interpreter process, so don't need to be passed to SparkConf.
    
    
    ### What type of PR is it?
    [Bug Fix]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-2045
    
    ### How should this be tested?
    Tested manually, this is the log after this PR
    ```
     INFO [2017-02-03 09:05:33,664] ({pool-2-thread-2} SparkInterpreter.java[createSparkContext_1]:384) - ------ Create new SparkContext yarn-client -------
    DEBUG [2017-02-03 09:05:33,668] ({pool-2-thread-2} SparkInterpreter.java[createSparkContext_1]:467) - SparkConf: key = [spark.cores.max], value = [2]
    DEBUG [2017-02-03 09:05:33,668] ({pool-2-thread-2} SparkInterpreter.java[createSparkContext_1]:467) - SparkConf: key = [spark.app.name], value = [Zeppelin]
    ```
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Does the licenses files need update? No
    * Is there breaking changes for older versions? No
    * Does this needs documentation? No


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zjffdu/zeppelin ZEPPELIN-2045

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1970.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1970
    
----
commit 3a146d3e0022f70576f34634910f3bd39c911ac3
Author: Jeff Zhang <zj...@apache.org>
Date:   2017-02-03T01:01:09Z

    ZEPPELIN-2045. Pass interpreter properties with "spark." as prefix to SparkConf

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    @felixcheung I think this is a bug, but didn't cause potential issues yet for now. In the previous version it would pass all the properties with non-empty value to SparkConf which means it won't miss any properties with `spark.` as prefix as long as the value is non-empty. @Leemoonsoo Please help confirm ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by costin <gi...@git.apache.org>.
Github user costin commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    I'm not sure why my name is mentioned on this thread or what exactly is the issue or question that I should be answering.
    Can somebody please open up a relevant thread / issue over at es-hadoop for a follow up?
    
    Thanks,



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    This doesn't conform with spark. Spark don't allow property that doesn't start with `spark.` [1]
    If user want to use any properties in driver side, they should use properties of Interpreter rather than SparkConf. 
    
    [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L132



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    Thanks @orenpai for clarify. I won't revert this PR as even I revert this PR, non-spark configuration still won't be propagated to executor, this is due to spark not zeppelin.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin pull request #1970: ZEPPELIN-2045. Pass interpreter properties with...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zeppelin/pull/1970


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    If the non spark.* values are passed to executor and being used there, i think it need to be reverted and be included in 0.7.2. @zjffdu What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    Merge to master if no further discussions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    @Leemoonsoo @felixcheung Please help review. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by orenpai <gi...@git.apache.org>.
Github user orenpai commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    As far as I can tell from the code, yes - it does some verification of this value on the driver, and then passes it after processing to the executors.
    @costin might be able to provide more details.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    Actually spark **won't** pass non spark.* properties to executor. This is why I confuse how does ElasticsEarch use it. 
    See https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala#L372 
    
    @orenpai @costin Do you have more infos ?
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    @zjffdu Can you handle this issue?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by orenpai <gi...@git.apache.org>.
Github user orenpai commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    To summarize, es-hadoop is aware that from command line, non spark.* parameters get ignored so they created a workaround to allow setting the parameters through spark.es.* parameters. 
    Either way, as I suspected, the parameters themselves are serialized separately to the executors and not through the spark configuration.
    I must say that I still think that blocking any non-spark configuration settings is overprotective relative to what you can perform in code, but I am no longer blocked by this issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by orenpai <gi...@git.apache.org>.
Github user orenpai commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    @zjffdu @Leemoonsoo I believe this is a mistake, and poses a real problem for me and other Zeppelin users.
    While it's true that the properties file is being skimmed to remove any non- spark.* properties, SparkConf.set allows it directly via code:
    https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkConf.scala#L84
    There are many libraries like es-hadoop for ElasticSearch integration that depend on the SparkConf having non spark.* properties, such as "es.nodes".
    Since Zeppelin does not allow a user to recreate the sc, this important mechanism of directly adding non spark properties via code is being blocked. I can't find any workaround either.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    I didn't find a history for it but your code snippet tells us no spark properties doesn't affect anymore in current spark. LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by orenpai <gi...@git.apache.org>.
Github user orenpai commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    Posted https://discuss.elastic.co/t/using-es-hadoop-with-zeppelin-and-the-use-of-es-nodes/88392


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    change is good but I'm not completely sure of the behavior change, which is completely opposite to before?
    or more precisely, I'm not sure why it was not passing `spark.*` before?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    @orenpai IIUC, spark won't propagate non spark.* properties to executor. How does ElasticsEarch use these configuration. Does it only use it in driver side ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1970: ZEPPELIN-2045. Pass interpreter properties with "spark...

Posted by jongyoul <gi...@git.apache.org>.
Github user jongyoul commented on the issue:

    https://github.com/apache/zeppelin/pull/1970
  
    That's intended behavior because Zeppelin wants to pass some other properties not starting "spark.". I didn't remembers exactly, but some properties should be passed with different prefix "spark." e.g. "hbase." which can has a value or don't have any value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---