You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@toree.apache.org by ribamar-santarosa <gi...@git.apache.org> on 2017/09/12 13:11:24 UTC

[GitHub] incubator-toree pull request #141: fix kernel generation for Spark Yarn // T...

GitHub user ribamar-santarosa opened a pull request:

    https://github.com/apache/incubator-toree/pull/141

    fix kernel generation for Spark Yarn // TOREE-97

    It looks like the TOREE-97 issue -- support for Spark Yarn was closed without definitive solution (or something went wrong on the way). Toree does support it, but it won't work if a user don't add manually in their kernel.json definition, the env vars for `HADOOP_CONF_DIR`. Without that env var, Spark doesn't know what to do with the option `--master=yarn` (set in `__TOREE_SPARK_OPTS__`). It would be desirable to have it by default, and this patch provides this functionality. 
    
    Probably this is not the nicest way to solve the problem, because it  just hard codes more vars into the JSON file -- ideally it would be nice to have an interface to add or remove env vars from those files, however, `HADOOP_CONF_DIR`  and `SPARK_CONF_DIR`  look basic to be exported. Even for an Spark Standalone deployment, `HADOOP_CONF_DIR` won't hurt.  So, here it goes our 2 cents to improve a bit the situation.
    
    I cloned the TOREE-97 into TOREE-438 to sign this issue. 
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Bright-Computing/incubator-toree master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-toree/pull/141.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #141
    
----
commit 688e96e75e306bc9172fdc49bb143f8cf67ceb11
Author: Ribamar Santarosa <ri...@brightcomputing.com>
Date:   2017-09-11T12:35:08Z

    fix kernel generation for Spark Yarn // TOREE-97

----


---

[GitHub] incubator-toree pull request #141: fix kernel generation for Spark Yarn // T...

Posted by ribamar-santarosa <gi...@git.apache.org>.
Github user ribamar-santarosa commented on a diff in the pull request:

    https://github.com/apache/incubator-toree/pull/141#discussion_r138678195
  
    --- Diff: etc/pip_install/toree/toreeapp.py ---
    @@ -57,6 +59,12 @@ class ToreeInstall(InstallKernelSpec):
         spark_home = Unicode(os.getenv(SPARK_HOME, '/usr/local/spark'), config=True,
             help='''Specify where the spark files can be found.'''
         )
    +    hadoop_conf_dir = Unicode(os.getenv(HADOOP_CONF_DIR, '/usr/local/hadoop'), config=True,
    +        help='''Specify where the hadoop config files can be found.'''
    +    )
    +    spark_conf_dir = Unicode(os.getenv(SPARK_CONF_DIR, '/usr/local/spark'), config=True,
    +        help='''Specify where the spark config files can be found.'''
    --- End diff --
    
    sure, it has to be the place where `spark-env.sh` is found.  


---

[GitHub] incubator-toree pull request #141: fix kernel generation for Spark Yarn // T...

Posted by lresende <gi...@git.apache.org>.
Github user lresende commented on a diff in the pull request:

    https://github.com/apache/incubator-toree/pull/141#discussion_r138676589
  
    --- Diff: etc/pip_install/toree/toreeapp.py ---
    @@ -57,6 +59,12 @@ class ToreeInstall(InstallKernelSpec):
         spark_home = Unicode(os.getenv(SPARK_HOME, '/usr/local/spark'), config=True,
             help='''Specify where the spark files can be found.'''
         )
    +    hadoop_conf_dir = Unicode(os.getenv(HADOOP_CONF_DIR, '/usr/local/hadoop'), config=True,
    +        help='''Specify where the hadoop config files can be found.'''
    +    )
    +    spark_conf_dir = Unicode(os.getenv(SPARK_CONF_DIR, '/usr/local/spark'), config=True,
    +        help='''Specify where the spark config files can be found.'''
    --- End diff --
    
    Should the default value actually be /usr/local/spark/conf ?


---

[GitHub] incubator-toree pull request #141: fix kernel generation for Spark Yarn // T...

Posted by ribamar-santarosa <gi...@git.apache.org>.
Github user ribamar-santarosa commented on a diff in the pull request:

    https://github.com/apache/incubator-toree/pull/141#discussion_r138678529
  
    --- Diff: etc/pip_install/toree/toreeapp.py ---
    @@ -57,6 +59,12 @@ class ToreeInstall(InstallKernelSpec):
         spark_home = Unicode(os.getenv(SPARK_HOME, '/usr/local/spark'), config=True,
    --- End diff --
    
    If they're set to be empty, it's just like they're not set -- so it's true that probably they're better empty. 


---

[GitHub] incubator-toree pull request #141: fix kernel generation for Spark Yarn // T...

Posted by lresende <gi...@git.apache.org>.
Github user lresende commented on a diff in the pull request:

    https://github.com/apache/incubator-toree/pull/141#discussion_r138677023
  
    --- Diff: etc/pip_install/toree/toreeapp.py ---
    @@ -57,6 +59,12 @@ class ToreeInstall(InstallKernelSpec):
         spark_home = Unicode(os.getenv(SPARK_HOME, '/usr/local/spark'), config=True,
    --- End diff --
    
    In general, do we actually want to use default values here? I am assuming we don't really have a standard default place to deploy Spark/Hadoop and maybe it would be better to use the env variables if they are available, otherwise, ignore or in case of required ones throw an error?


---