Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2014/09/22 21:04:42 UTC

[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/2492

    [SPARK-3634] [PySpark] User's module should take precedence over system modules

    Python modules added through addPyFile should take precedence over system modules.
    
    This patch puts the paths of user-added modules at the front of sys.path (just after '').
    
    BTW: it's a bit dangerous that a user can upload a new module that modifies the default behavior of the system. Currently, it's hard to find the correct position to insert the user's modules.
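
As a rough illustration of the ordering described in this pull request (not the patch itself), the sketch below inserts user-supplied paths right after the leading '' entry of a path list. The helper name `prepend_user_paths` and the sample paths are hypothetical, and the function works on a copy rather than on the real sys.path.

```python
def prepend_user_paths(search_path, user_paths):
    """Return a new list with user_paths placed just after the leading '' entry.

    Hypothetical helper for illustration; it does not touch sys.path.
    """
    # Keep '' (the current directory) first if it is present.
    start = 1 if search_path and search_path[0] == "" else 0
    # Skip paths that are already on the search path.
    new_entries = [p for p in user_paths if p not in search_path]
    return search_path[:start] + new_entries + search_path[start:]


example = prepend_user_paths(
    ["", "/usr/lib/python2.7", "/usr/lib/python2.7/site-packages"],
    ["/tmp/spark-files/deps.zip"],
)
print(example)
```

With this ordering, a module found in the user's archive is resolved before any copy under the system directories.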

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark path

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2492.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2492
    
----
commit c16c392c6b41b0922e9fc95dc54125ba926bdb13
Author: Davies Liu <da...@gmail.com>
Date:   2014-09-22T18:58:37Z

    put addPyFile in front of sys.path

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56426523
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20661/consoleFull) for   PR 2492 at commit [`6b0002f`](https://github.com/apache/spark/commit/6b0002f3abb58944a3ffac2dc9f880b9e1845443).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56463455
  
    > Maybe my JIRA was misleadingly named; my motivation here is allowing users to specify versions of packages that take precedence over other versions of that same package that might be installed on the system, not in overriding modules included in Python's standard library (although the ability to do that is a side-effect of this change).
    
    Understood, this side effect is a bit dangerous. Third-party packages could appear in sys.path in any order, such as:
    
    ```python
    >>> import sys
    >>> sys.path
    ['', '//anaconda/lib/python2.7/site-packages/DPark-0.1-py2.7.egg', '//anaconda/lib/python2.7/site-packages/protobuf-2.5.0-py2.7.egg', '//anaconda/lib/python2.7/site-packages/msgpack_python-0.4.2-py2.7-macosx-10.5-x86_64.egg', '//anaconda/lib/python2.7/site-packages/setuptools-3.6-py2.7.egg', '/Users/daviesliu/work/spark/python/lib', '/Users/daviesliu/work/spark/python/lib/py4j-0.8.2.1-src.zip', '/Users/daviesliu/work/spark/python', '//anaconda/lib/python27.zip', '//anaconda/lib/python2.7', '//anaconda/lib/python2.7/plat-darwin', '//anaconda/lib/python2.7/plat-mac', '//anaconda/lib/python2.7/plat-mac/lib-scriptpackages', '//anaconda/lib/python2.7/lib-tk', '//anaconda/lib/python2.7/lib-old', '//anaconda/lib/python2.7/lib-dynload', '//anaconda/lib/python2.7/site-packages', '//anaconda/lib/python2.7/site-packages/PIL', '//anaconda/lib/python2.7/site-packages/runipy-0.1.0-py2.7.egg']
    ```
    It's not easy to find a position that is before the third-party packages but after the standard modules.
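
To make the ordering problem concrete, the sketch below labels a shortened copy of the sys.path listing in this comment. The classification rule (anything under `site-packages` counts as third-party) and the helper name `classify` are assumptions made for illustration only.

```python
def classify(entry):
    """Label a sys.path entry; the site-packages rule is a simplifying assumption."""
    if entry == "":
        return "cwd"
    return "third-party" if "site-packages" in entry else "other"


# Shortened from the sys.path listing quoted in the comment.
sample = [
    "",
    "//anaconda/lib/python2.7/site-packages/DPark-0.1-py2.7.egg",
    "/Users/daviesliu/work/spark/python",
    "//anaconda/lib/python2.7",
    "//anaconda/lib/python2.7/lib-dynload",
    "//anaconda/lib/python2.7/site-packages",
]

labels = [classify(p) for p in sample]
third_party = [i for i, label in enumerate(labels) if label == "third-party"]
stdlib_index = sample.index("//anaconda/lib/python2.7")

# Third-party entries sit on both sides of the standard library directory,
# so no single index is "after the standard modules but before third-party".
print(third_party, stdlib_index)
```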




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2492#discussion_r17993171
  
    --- Diff: python/pyspark/context.py ---
    @@ -183,10 +183,9 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                self._python_includes.append(filename)
    -                sys.path.append(path)
    -                if dirname not in sys.path:
    -                    sys.path.append(dirname)
    +                if filename.lower().endswith("zip") or filename.lower().endswith("egg"):
    --- End diff --
    
    `root_dir` is already added to sys.path; see line 174.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2492




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56437938
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20664/consoleFull) for   PR 2492 at commit [`f7ff4da`](https://github.com/apache/spark/commit/f7ff4da4c484a61a03489cc444deb2486610dfd4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2492#discussion_r17993070
  
    --- Diff: python/pyspark/context.py ---
    @@ -183,10 +183,9 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                self._python_includes.append(filename)
    -                sys.path.append(path)
    -                if dirname not in sys.path:
    -                    sys.path.append(dirname)
    +                if filename.lower().endswith("zip") or filename.lower().endswith("egg"):
    --- End diff --
    
    Do we explicitly add `root_dir` to `sys.path`?  I don't think we can always assume that the Python driver / worker are executed from inside of `root_dir`.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56445673
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20664/consoleFull) for   PR 2492 at commit [`f7ff4da`](https://github.com/apache/spark/commit/f7ff4da4c484a61a03489cc444deb2486610dfd4).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56445164
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/140/consoleFull) for   PR 2492 at commit [`6b0002f`](https://github.com/apache/spark/commit/6b0002f3abb58944a3ffac2dc9f880b9e1845443).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2492#discussion_r17991203
  
    --- Diff: python/pyspark/context.py ---
    @@ -183,10 +183,9 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                self._python_includes.append(filename)
    -                sys.path.append(path)
    -                if dirname not in sys.path:
    -                    sys.path.append(dirname)
    +                if filename.lower().endswith("zip") or filename.lower().endswith("egg"):
    --- End diff --
    
    I think that `spark.submit.pyFiles` is allowed to contain `.py` files, too:
    
    ```
      --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                                  on the PYTHONPATH for Python apps.
    ```
    
    Will this new filtering by `.zip` and `.egg` prevent this from working?




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56426550
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20661/consoleFull) for   PR 2492 at commit [`6b0002f`](https://github.com/apache/spark/commit/6b0002f3abb58944a3ffac2dc9f880b9e1845443).
     * This patch **fails** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56454753
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56440200
  
    > BTW: it's a bit dangerous that a user can upload a new module that modifies the default behavior of the system. Currently, it's hard to find the correct position to insert the user's modules.
    
    Maybe my JIRA was misleadingly named; my motivation here is allowing users to specify versions of packages that take precedence over other versions of that same package that might be installed on the system, not in overriding modules included in Python's standard library (although the ability to do that is a side-effect of this change).
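
The motivation in this comment can be sketched with two directories that each hold a different version of the same module. This is a Python 3 illustration, not code from the PR: the module name `mylib_demo`, the helper names, and the version strings are all hypothetical. It uses `importlib.machinery.PathFinder.find_spec` to ask which file would be loaded for a given search order, without importing anything.

```python
import importlib
import importlib.machinery
import os
import tempfile


def resolve(name, search_path):
    """Return the file that would be loaded for `name` given search_path, or None."""
    importlib.invalidate_caches()
    spec = importlib.machinery.PathFinder.find_spec(name, search_path)
    return None if spec is None else spec.origin


def write_module(directory, version):
    """Create a hypothetical mylib_demo.py that records which copy it is."""
    path = os.path.join(directory, "mylib_demo.py")
    with open(path, "w") as f:
        f.write("VERSION = %r\n" % version)
    return path


with tempfile.TemporaryDirectory() as system_dir, tempfile.TemporaryDirectory() as user_dir:
    system_copy = write_module(system_dir, "1.0")
    user_copy = write_module(user_dir, "2.0")

    # User directory listed last: the system copy is found first.
    appended = resolve("mylib_demo", [system_dir, user_dir])
    # User directory listed first: the user's copy is found first.
    prepended = resolve("mylib_demo", [user_dir, system_dir])

    user_wins_when_appended = os.path.samefile(appended, user_copy)
    user_wins_when_prepended = os.path.samefile(prepended, user_copy)

print(user_wins_when_appended, user_wins_when_prepended)
```

Only the search order differs between the two calls, which is why moving user paths from the end of sys.path to the front changes which version is picked up.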




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56454754
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20668/




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56477603
  
    > Understood, this side effect is a bit dangerous. Third-party packages could appear in sys.path in any order
    
    Are you worried about a user adding a Python module whose name conflicts with a built-in module, thereby shadowing it?  I think this is a general Python problem that can occur even without `sys.path` manipulation, which is why it's bad to have top-level modules that have the same name as built-in ones (and also why relative imports can be bad): http://www.evanjones.ca/python-name-clashes.html
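
One way a user could guard against the name clashes mentioned here is to check file names before shipping them. The sketch below is an assumption-laden Python 3 illustration, not something proposed in this thread: `clashes_with_stdlib` is a hypothetical helper, and `sys.stdlib_module_names` is only available on Python 3.10 and later, so older interpreters fall back to the smaller `sys.builtin_module_names`.

```python
import sys

# sys.stdlib_module_names exists on Python 3.10+; older interpreters fall back
# to the smaller set of modules compiled into the interpreter.
KNOWN_NAMES = frozenset(
    getattr(sys, "stdlib_module_names", sys.builtin_module_names)
)


def clashes_with_stdlib(filename):
    """Return True if a top-level file shares its name with a known standard module."""
    module_name = filename.rsplit(".", 1)[0]
    return module_name in KNOWN_NAMES


print(clashes_with_stdlib("sys.py"), clashes_with_stdlib("my_job_helpers.py"))
```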




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56424317
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20660/consoleFull) for   PR 2492 at commit [`c16c392`](https://github.com/apache/spark/commit/c16c392c6b41b0922e9fc95dc54125ba926bdb13).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56443599
  
    **[Tests timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20660/consoleFull)** after     a configured wait of `120m`.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56598619
  
    I think it's fine to move on, and remove the comment about risk in PR's description.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56454746
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20668/consoleFull) for   PR 2492 at commit [`4a2af78`](https://github.com/apache/spark/commit/4a2af7803f955c3f85d0814fc5ee297a4198a8b9).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56446680
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/141/consoleFull) for   PR 2492 at commit [`f7ff4da`](https://github.com/apache/spark/commit/f7ff4da4c484a61a03489cc444deb2486610dfd4).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56437319
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/140/consoleFull) for   PR 2492 at commit [`6b0002f`](https://github.com/apache/spark/commit/6b0002f3abb58944a3ffac2dc9f880b9e1845443).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56454517
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/141/consoleFull) for   PR 2492 at commit [`f7ff4da`](https://github.com/apache/spark/commit/f7ff4da4c484a61a03489cc444deb2486610dfd4).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging `





[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by mattf <gi...@git.apache.org>.
Github user mattf commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56659390
  
    this is a nice addition. re danger, i'll add that the user is only able to endanger herself.
    
    +1 lgtm




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2492#discussion_r17993158
  
    --- Diff: python/pyspark/context.py ---
    @@ -183,10 +183,9 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                self._python_includes.append(filename)
    -                sys.path.append(path)
    -                if dirname not in sys.path:
    -                    sys.path.append(dirname)
    +                if filename.lower().endswith("zip") or filename.lower().endswith("egg"):
    --- End diff --
    
    Aha, I see that we _do_ add `root_dir` to the path in `worker.py`.




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2492#discussion_r17993217
  
    --- Diff: python/pyspark/context.py ---
    @@ -183,10 +183,9 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                self._python_includes.append(filename)
    -                sys.path.append(path)
    -                if dirname not in sys.path:
    -                    sys.path.append(dirname)
    +                if filename.lower().endswith("zip") or filename.lower().endswith("egg"):
    --- End diff --
    
    Ah, great.  In that case, this PR looks good to me, so I'm going to merge it.  Thanks!




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2492#discussion_r17992971
  
    --- Diff: python/pyspark/context.py ---
    @@ -183,10 +183,9 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
             for path in self._conf.get("spark.submit.pyFiles", "").split(","):
                 if path != "":
                     (dirname, filename) = os.path.split(path)
    -                self._python_includes.append(filename)
    -                sys.path.append(path)
    -                if dirname not in sys.path:
    -                    sys.path.append(dirname)
    +                if filename.lower().endswith("zip") or filename.lower().endswith("egg"):
    --- End diff --
    
    The `.py` files will be put in `root_dir` and can be imported by name, so they should not be put in sys.path.
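
The distinction drawn in this reply can be sketched as a small helper that separates archives from plain modules. This is only an illustration under stated assumptions: `split_py_files` and the sample paths are hypothetical, and the sketch compares the file extension, whereas the diff line shown in this message checks `endswith("zip")` and `endswith("egg")` without the dot.

```python
import os


def split_py_files(conf_value):
    """Split a comma-separated pyFiles value into (archives, plain_modules).

    Hypothetical helper: archives (.zip/.egg) need their own sys.path entry,
    while plain .py files are assumed to be importable from root_dir.
    """
    archives, plain_modules = [], []
    for path in conf_value.split(","):
        if path == "":
            continue  # An empty setting yields an empty entry; ignore it.
        filename = os.path.basename(path)
        extension = os.path.splitext(filename)[1].lower()
        if extension in (".zip", ".egg"):
            archives.append(filename)
        else:
            plain_modules.append(filename)
    return archives, plain_modules


archives, plain_modules = split_py_files("/jobs/deps.zip,/jobs/util.py,,/jobs/lib.EGG")
print(archives, plain_modules)
```

Under this reading, only the names in `archives` would need to be joined with `root_dir` and placed on sys.path; the plain modules rely on `root_dir` itself already being there.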




[GitHub] spark pull request: [SPARK-3634] [PySpark] User's module should ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2492#issuecomment-56446849
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20668/consoleFull) for   PR 2492 at commit [`4a2af78`](https://github.com/apache/spark/commit/4a2af7803f955c3f85d0814fc5ee297a4198a8b9).
     * This patch merges cleanly.

