You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by brennonyork <gi...@git.apache.org> on 2014/12/02 20:38:33 UTC

[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

GitHub user brennonyork opened a pull request:

    https://github.com/apache/spark/pull/3559

    [SPARK-4616][Core] - SPARK_CONF_DIR is not effective in spark-submit

    By default the `SPARK_CONF_DIR` is not capable of being set from the `spark-submit` script, but a spark properties file is. After diving through the code it turned out that the `SPARK_CONF_DIR` is actually a cyclic reference as it is referenced within the `load-spark-env.sh` script and can also then be reset within the loaded `spark-env.sh` file. What's worse is that, if the `spark-env.sh` defined a `SPARK_CONF_DIR` it wouldn't be picked up and used to grab the default `spark-defaults.conf` if no `--properties-file` flag was present. As such, it seemed best to provide a `--environment-file` flag that can be used to present an arbitrary bash file to the system which would preload any necessary environment configuration options (including `SPARK_CONF_DIR`). This then solves the original problem that the `SPARK_CONF_DIR` wasn't effective within `spark-submit`, but also removes the cyclic dependency on where and when the `SPARK_CONF_DIR` is loaded.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/brennonyork/spark SPARK-4616

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3559
    
----
commit d857bf713a431d02eb74f0852a0d3264d599e583
Author: Brennon York <br...@capitalone.com>
Date:   2014-12-01T20:12:43Z

    fixes SPARK-4616 by adding an environment file that can be passed in through the command line options

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by brennonyork <gi...@git.apache.org>.
Github user brennonyork commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-68165117
  
    @andrewor14 @JoshRosen wondering what should be done with this issue, thoughts on my comments above??


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by brennonyork <gi...@git.apache.org>.
Github user brennonyork commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-66163851
  
    @JoshRosen Is there anything else needed for this patch to be pushed in? Any feedback / review would be great as well!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-65726931
  
    Jenkins, this is ok to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-65727081
  
      [Test build #24152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24152/consoleFull) for   PR 3559 at commit [`d857bf7`](https://github.com/apache/spark/commit/d857bf713a431d02eb74f0852a0d3264d599e583).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-65291188
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-65734572
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24152/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-69126210
  
    That sounds good. Feel free to make a brief mention in the docs where appropriate.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by brennonyork <gi...@git.apache.org>.
Github user brennonyork commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-66694879
  
    @andrewor14 definitely understand that reasoning. I guess my only question would be how would we a) answer the bug then and still b) support the `--properties-file` option? Even if we remove the `--environment-file` CLI option I'm still confused / not seeing a solution to setting the `SPARK_CONF_DIR`. If we allowed `SPARK_CONF_DIR` to be set from the CLI then we would still run into a merge problem with the the `SPARK_CONF_DIR/spark-defaults.conf` and whatever was set from the `--properties-file`. I agree though that, regardless, we should minimize merging of configuration files as that would likely cause more confusion than help for the end user.
    
    What about this solution... We completely remove the `--properties-file` CLI option in place of the `--environment-file` option? I thinking that, at the end of the day, ensuring proper environment settings (i.e. `SPARK_CONF_DIR`, etc.) are more important than a single properties file, especially when the `SPARK_CONF_DIR` would point to the configuration file. I'm just throwing that out, but I'm wondering if any other solution would work without necessitating file merging. That being said, doesn't Spark already merge defaults into the given `--properties-file` if it was provided? Now I'm wondering if that actually **would** be more confusing if we did the same thing with the spark environment file.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/3559


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-66529795
  
    Hey @brennonyork I'm actually not sure if we should introduce an additional command line option to specify the environment file. I think it makes the semantics very confusing because we can't be sure which environment file is being used: is it the one specified through `--environment-file`? Is it the one in `$SPARK_CONF_DIR/spark-env.sh`? Is it both? If it's both how to we merge the values, or which file overwrites the other?
    
    I think because of this circular dependency relationship between `SPARK_CONF_DIR` and `spark-env.sh` we should just add a note in `spark-env.sh.template` warning the user not to set `SPARK_CONF_DIR` there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-65734559
  
      [Test build #24152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24152/consoleFull) for   PR 3559 at commit [`d857bf7`](https://github.com/apache/spark/commit/d857bf713a431d02eb74f0852a0d3264d599e583).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-69112310
  
    @brennonyork we can't remove `--properties-file` because we need to maintain backwards compatibility. We currently don't attempt to merge the properties at all, but instead allow `--properties-file` to take precedence. Note that the `--environment-file` case is different in that it's not as simple as having the command line option overriding an equivalent environment variable.
    
    I think the better approach is to simply warn the user in `spark-env.sh` not to set `SPARK_CONF_DIR` there since this is somewhat minor of an issue in my opinion. Given that there is not a clean way to fix this I recommend that we close this issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Posted by brennonyork <gi...@git.apache.org>.
Github user brennonyork commented on the pull request:

    https://github.com/apache/spark/pull/3559#issuecomment-69116379
  
    Roger that. What do you think about, for this PR, to put merely put a blurb into the `spark-env.sh.template` and / or change the docs to reflect this issue? Just don't want the issue to be closed without documenting it anywhere for future users.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org