Posted to reviews@spark.apache.org by iven <gi...@git.apache.org> on 2014/08/15 11:55:21 UTC

[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

GitHub user iven opened a pull request:

    https://github.com/apache/spark/pull/1969

    Use user defined $SPARK_HOME in spark-submit if possible

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iven/spark spark-home

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1969.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1969
    
----
commit 8dc9f7f16d414ce2fd285243afe8fb87c33e9a8d
Author: Xu Lijian <xu...@qiyi.com>
Date:   2014-08-07T08:46:08Z

    Use user defined $SPARK_HOME in spark-submit if possible

----




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-53633009
  
    @iven I'm a little confused here. Are you referring to a use case like this:
    
    1. Spark is installed in directory `A` on the driver node, but in directory `B` on all Mesos slave nodes.
    2. You export `SPARK_HOME` as `B` on the driver side.
    3. You start `spark-shell` without specifying `spark.executor.uri`, and then expect Mesos to find the Spark installation in `B` on the executor side.
    
    Is that right?




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-53764382
  
    Hi @iven, `spark-shell` actually goes through `spark-submit`. As @liancheng mentioned, you can set `spark.home` to control the executor-side Spark location. This is not super intuitive, however, and there is an open PR (#2166) that adds a more specific way to do this.
    
    At least with the existing code, the user should not set `SPARK_HOME`, because many places downstream depend on it. A better solution is to set an application-specific config. Would you mind closing this PR?
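    
    For illustration, a minimal sketch of setting the executor-side location programmatically through `SparkConf` (the master URL, app name, and `/opt/spark` path are hypothetical):
    
    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    
    // Hypothetical layout: Spark lives under /opt/spark on the Mesos slaves.
    val conf = new SparkConf()
      .setMaster("mesos://master:5050") // placeholder Mesos master URL
      .setAppName("MyApp")
      .setSparkHome("/opt/spark")       // equivalent to setting spark.home
    val sc = new SparkContext(conf)
    ```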




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-53689928
  
    Actually you can just set `spark.home` in `spark-defaults.conf` for this use case.
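    
    A one-line sketch of that entry (the `/opt/spark` path is a hypothetical executor-side install location):
    
    ```
    # conf/spark-defaults.conf on the driver node
    spark.home    /opt/spark
    ```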




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by iven <gi...@git.apache.org>.
Github user iven commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-52388869
  
    @andrewor14 OK. I'll update the patch once we confirm this PR is necessary.




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by iven <gi...@git.apache.org>.
Github user iven commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-53665867
  
    @liancheng Yes, although I'm using `spark-submit`, not `spark-shell`.




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-52348177
  
    There is an updated JIRA for the same issue: [SPARK-2290](https://issues.apache.org/jira/browse/SPARK-2290). We established that, for standalone mode, we don't need to ship the driver's Spark home to the executors, which may not use the same Spark home. Instead, we should just use the `Worker`'s current working directory. However, I am not familiar enough with Mesos to comment on the need to ship `SPARK_HOME` there.
    
    @iven There are many other places where we export `SPARK_HOME` in addition to these two. From a quick grep, I found the following:
    ```
    bin/pyspark:export SPARK_HOME="$FWDIR"
    bin/run-example:export SPARK_HOME="$FWDIR"
    bin/spark-class:export SPARK_HOME="$FWDIR"
    bin/spark-submit:export SPARK_HOME="$(cd `dirname $0`/..; pwd)"
    sbin/spark-config.sh:export SPARK_HOME=${SPARK_PREFIX}
    ```
    We would need to do the same in all of these places for your originally intended behavior to take effect. In the long run, however, we should just clean up our usages of `SPARK_HOME`, since in many places we don't actually need to export it (or even use the variable at all).




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by iven <gi...@git.apache.org>.
Github user iven commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-52388625
  
    @JoshRosen I'm using Spark 1.0.1 with Mesos. If I don't specify `SPARK_HOME` on the driver, the Mesos executors are LOST with this error:
    
    ```
    sh: /root/spark_master/sbin/spark-executor: No such file or directory
    ```
    
    Here `/root/spark_master` is the driver's `SPARK_HOME`.
    
    I think this is caused by the `createExecutorInfo` method in `MesosSchedulerBackend.scala`: when `spark.executor.uri` is not specified, it uses the `SPARK_HOME` from the `SparkContext`.
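    
    A paraphrased sketch of that fallback (not the verbatim Spark 1.0.x source; the name `executorCommand` is illustrative):
    
    ```scala
    import java.io.File
    import org.apache.spark.{SparkContext, SparkException}
    
    // Sketch of the choice made in MesosSchedulerBackend.createExecutorInfo:
    // build the command Mesos runs to launch an executor on a slave.
    def executorCommand(sc: SparkContext): String = {
      val uri = sc.getConf.get("spark.executor.uri", null)
      if (uri == null) {
        // No executor URI: assume the driver's Spark home also exists on every
        // slave, which yields "sh: .../sbin/spark-executor: No such file or
        // directory" when the layouts differ.
        val sparkHome = sc.getConf.getOption("spark.home")
          .orElse(sys.env.get("SPARK_HOME"))
          .getOrElse(throw new SparkException("Spark home is not set"))
        new File(sparkHome, "sbin/spark-executor").getCanonicalPath
      } else {
        // With spark.executor.uri set, Mesos fetches and unpacks Spark itself,
        // so the driver's local layout no longer matters.
        val basename = uri.split('/').last.split('.').head
        "cd %s*; ./sbin/spark-executor".format(basename)
      }
    }
    ```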




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-52291226
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by CodingCat <gi...@git.apache.org>.
Github user CodingCat commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-52309962
  
    I once submitted a similar patch, but the latest solution (merged?) is that we don't send the local `SPARK_HOME` to the remote end at all... @andrewor14?




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-52348600
  
    In PySpark, it looks like we use `SPARK_HOME` only on the driver, where it's used to find the path to `spark-submit` and to locate test support files.




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-52342115
  
    There was a bunch of prior discussion about this in an old pull request for [SPARK-1110](http://issues.apache.org/jira/browse/SPARK-1110) (I'd link to it, but it's from the now-deleted `incubator-spark` GitHub repo).
    
    I think we decided that it didn't make sense for workers to inherit `SPARK_HOME` from the driver; there were some later patches that removed this dependency, if I recall correctly.
    
    @iven Was this pull request motivated by an issue that you saw when deploying Spark?  Which version were you using, and on what platform?




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by iven <gi...@git.apache.org>.
Github user iven commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-53830594
  
    @liancheng @andrewor14 Thanks, it works! I'm closing this.




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by iven <gi...@git.apache.org>.
Github user iven closed the pull request at:

    https://github.com/apache/spark/pull/1969




[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/1969#issuecomment-53359507
  
    @liancheng

