Posted to reviews@spark.apache.org by eastlondoner <gi...@git.apache.org> on 2016/07/21 15:16:51 UTC

[GitHub] spark pull request #14303: python import pyspark fails

GitHub user eastlondoner opened a pull request:

    https://github.com/apache/spark/pull/14303

    python import pyspark fails

    ## What changes were proposed in this pull request?
    
    Fix importing pyspark in python
    
    ## How was this patch tested?
    
    manually tested
    
    BEFORE patch error was:
    
    import pyspark
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pyspark/__init__.py", line 44, in <module>
        from pyspark.context import SparkContext
      File "pyspark/context.py", line 28, in <module>
        from pyspark import accumulators
    ImportError: cannot import name accumulators

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eastlondoner/spark fix-pyspark-import

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14303.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14303
    
----
commit 2b218ee9ae1299a1acff8779dc0a399d01c7ab14
Author: eastlondoner <ea...@users.noreply.github.com>
Date:   2016-07-21T15:15:43Z

    python import pyspark fails 
    
    import pyspark
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pyspark/__init__.py", line 44, in <module>
        from pyspark.context import SparkContext
      File "pyspark/context.py", line 28, in <module>
        from pyspark import accumulators
    ImportError: cannot import name accumulators

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14303: python import pyspark fails

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14303
  
    Can one of the admins verify this patch?




[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/14303
  
    I don't know if this is the intended way to use PySpark (normally you would use it through `spark-submit` or `pyspark`). However, if you do want to use PySpark this way, https://github.com/sparklingpandas/sparklingpandas/blob/master/sparklingpandas/utils.py shows one way to set up the imports before doing a plain `import pyspark` - although it should be more or less equivalent to what you've posted as a repro.
    That being said, I tried your repro steps with Python 2.7.11 & 3.4.3 on 2ae7b88a07140e012b6c60db3c4a2a8ca360c684 and it doesn't reproduce:
    >>> import sys
    >>> import os
    >>> sys.path = [os.environ['SPARK_HOME'] + "/python/lib/py4j-0.10.1-src.zip"] + sys.path 
    >>> sys.path = [os.environ['SPARK_HOME'] + "/python/"] + sys.path
    >>> sys.path
    ['/home/holden/repos/spark/python/', '/home/holden/repos/spark/python/lib/py4j-0.10.1-src.zip', '', '/home/holden/miniconda2/lib/python27.zip', '/home/holden/miniconda2/lib/python2.7', '/home/holden/miniconda2/lib/python2.7/plat-linux2', '/home/holden/miniconda2/lib/python2.7/lib-tk', '/home/holden/miniconda2/lib/python2.7/lib-old', '/home/holden/miniconda2/lib/python2.7/lib-dynload', '/home/holden/.local/lib/python2.7/site-packages', '/home/holden/miniconda2/lib/python2.7/site-packages', '/home/holden/miniconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg']
    >>> import pyspark
    
    #delayedflightjuststartedtoboard




[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/14303
  
    Sorry for the super brief answer - got to run to a flight - but if we could get a repro that would be great :)




[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails

Posted by eastlondoner <gi...@git.apache.org>.
Github user eastlondoner commented on the issue:

    https://github.com/apache/spark/pull/14303
  
    @BryanCutler got it! 
    I had previously had a failed import; I didn't realise that Python 'remembers' failed imports.
    Thanks for the help!
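
    (For anyone hitting the same symptom: a failed `import pyspark` leaves partially initialized modules cached in `sys.modules`, and evicting them lets a corrected retry start clean. A minimal sketch, not from the original thread - it only touches the interpreter's module cache:)

    ```python
    import sys

    # A failed "import pyspark" (e.g. run before py4j was on sys.path) can leave
    # partially initialized entries behind in sys.modules.  Evict them so that a
    # retried import re-executes the package's __init__.py from scratch.
    stale = [name for name in list(sys.modules)
             if name == "pyspark" or name.startswith("pyspark.")]
    for name in stale:
        del sys.modules[name]

    # After eviction, no pyspark modules remain cached.
    assert not any(n == "pyspark" or n.startswith("pyspark.") for n in sys.modules)
    ```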




[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails

Posted by eastlondoner <gi...@git.apache.org>.
Github user eastlondoner commented on the issue:

    https://github.com/apache/spark/pull/14303
  
    Note the stack trace: it occurs inside pyspark's __init__.py, so the pyspark module is not actually initialized yet at the point where `accumulators` is imported from it.
    It *may* be that re-arranging the import order in __init__.py would make this go away.
    





[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/14303
  
    I _think_ it is different in that the side effects are slightly different - but it shouldn't actually impact this reported problem. I'm a little fuzzy on the specifics of Python's import rules across Python versions, though, so I could be misremembering something.
    
    I think it would be useful to know how this problem came to be - are you running your Python job through `spark-submit`, or are you trying to import pyspark in another way? And do you have a repro we could see?





[GitHub] spark pull request #14303: [SPARK-16665] python import pyspark fails

Posted by eastlondoner <gi...@git.apache.org>.
Github user eastlondoner closed the pull request at:

    https://github.com/apache/spark/pull/14303




[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/14303
  
    It seems fine with me too.
    
    ```bash
    $ python
    Python 2.7.10 (default, Oct 23 2015, 19:19:21)
    [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pyspark
    >>> filter(lambda x: x == "accumulators", dir(pyspark))
    ['accumulators']
    >>> pyspark.accumulators
    <module 'pyspark.accumulators' from '.../spark/python/pyspark/accumulators.pyc'>
    ```




[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails

Posted by eastlondoner <gi...@git.apache.org>.
Github user eastlondoner commented on the issue:

    https://github.com/apache/spark/pull/14303
  
    Quick repro:
    
    ```
    cd python
    python
    ```
    
    ```python
    import sys
    import os
    sys.path = [os.environ['SPARK_HOME'] + "/python/lib/py4j-0.10.1-src.zip"] + sys.path
    sys.path = [os.environ['SPARK_HOME'] + "/python/"] + sys.path
    import pyspark
    ```




[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14303
  
    Is that actually different? CC @holdenk 




[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/14303
  
    @eastlondoner, the import error you show indicates that `accumulators` had already been (partially) imported.  I was able to run the steps in your reproduction without error, but I did accidentally get the error one time when I forgot to add py4j to the sys path: the `context` module was partially loaded along with `accumulators`, so when I called `import pyspark` again, I got the same import error as you.
    
    Can you verify that there were no previous import attempts when you receive this error?
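
    (This caching behaviour can be demonstrated without Spark at all: Python records a module in `sys.modules` as soon as its code starts executing, so a module that failed partway through stays cached in its incomplete state, and later `from ... import ...` statements see the incomplete module instead of re-running it. A standalone sketch - the module name `pyspark_demo` is made up for illustration:)

    ```python
    import sys
    import types

    # Simulate a package whose first import failed partway through __init__.py:
    # Python registers a module in sys.modules as soon as its code starts
    # executing, so the incomplete module object stays cached after the failure.
    partial = types.ModuleType("pyspark_demo")  # hypothetical stand-in for pyspark
    sys.modules["pyspark_demo"] = partial       # "accumulators" was never bound

    # A later "from pyspark_demo import accumulators" consults the cached entry
    # instead of re-running the package, mirroring the reported
    # "ImportError: cannot import name accumulators".
    caught = False
    try:
        from pyspark_demo import accumulators
    except ImportError:
        caught = True
    assert caught

    # Evicting the stale entry is what lets a fresh import start over.
    del sys.modules["pyspark_demo"]
    ```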

