Posted to reviews@spark.apache.org by eastlondoner <gi...@git.apache.org> on 2016/07/21 15:16:51 UTC
[GitHub] spark pull request #14303: python import pyspark fails
GitHub user eastlondoner opened a pull request:
https://github.com/apache/spark/pull/14303
python import pyspark fails
## What changes were proposed in this pull request?
Fix importing pyspark in python
## How was this patch tested?
manually tested
BEFORE patch error was:
>>> import pyspark
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyspark/__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "pyspark/context.py", line 28, in <module>
    from pyspark import accumulators
ImportError: cannot import name accumulators
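For context, "cannot import name" is what Python raises when the module being imported from is already present in sys.modules but only partially initialized - which is what happens after a failed first import or mid-way through a circular import. A minimal sketch of the mechanism, using a hypothetical fake_pyspark module rather than the real package:

```python
import sys
import types

# Register an empty module under a hypothetical name, simulating a
# package whose __init__.py failed before it could define 'accumulators'.
partial = types.ModuleType("fake_pyspark")
sys.modules["fake_pyspark"] = partial

try:
    # Same shape as `from pyspark import accumulators` in context.py.
    from fake_pyspark import accumulators
except ImportError as exc:
    # The module exists in sys.modules but has no such attribute,
    # so Python raises "cannot import name ...".
    print(exc)
finally:
    del sys.modules["fake_pyspark"]
```

The exact wording of the message differs between Python 2 and 3, but the failure mode is the same.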
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/eastlondoner/spark fix-pyspark-import
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14303.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14303
----
commit 2b218ee9ae1299a1acff8779dc0a399d01c7ab14
Author: eastlondoner <ea...@users.noreply.github.com>
Date: 2016-07-21T15:15:43Z
python import pyspark fails
>>> import pyspark
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyspark/__init__.py", line 44, in <module>
    from pyspark.context import SparkContext
  File "pyspark/context.py", line 28, in <module>
    from pyspark import accumulators
ImportError: cannot import name accumulators
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #14303: python import pyspark fails
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/14303
Can one of the admins verify this patch?
---
[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails
Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:
https://github.com/apache/spark/pull/14303
I don't know if this is the intended way to use PySpark (normally you would use it through `spark-submit` or `pyspark`). However, if you do want to use PySpark this way, https://github.com/sparklingpandas/sparklingpandas/blob/master/sparklingpandas/utils.py shows one way to set up the imports so that you can then just do `import pyspark` - although it should be more or less equivalent to what you've posted for the repro.
That being said I tried your repro steps with Python 2.7.11 & 3.4.3 on 2ae7b88a07140e012b6c60db3c4a2a8ca360c684 and it doesn't repro:
>>> import sys
>>> import os
>>> sys.path = [os.environ['SPARK_HOME'] + "/python/lib/py4j-0.10.1-src.zip"] + sys.path
>>> sys.path = [os.environ['SPARK_HOME'] + "/python/"] + sys.path
>>> sys.path
['/home/holden/repos/spark/python/', '/home/holden/repos/spark/python/lib/py4j-0.10.1-src.zip', '', '/home/holden/miniconda2/lib/python27.zip', '/home/holden/miniconda2/lib/python2.7', '/home/holden/miniconda2/lib/python2.7/plat-linux2', '/home/holden/miniconda2/lib/python2.7/lib-tk', '/home/holden/miniconda2/lib/python2.7/lib-old', '/home/holden/miniconda2/lib/python2.7/lib-dynload', '/home/holden/.local/lib/python2.7/site-packages', '/home/holden/miniconda2/lib/python2.7/site-packages', '/home/holden/miniconda2/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg']
>>> import pyspark
#delayedflightjuststartedtoboard
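The manual sys.path steps above can be wrapped in a small helper. This is only a sketch - the function name is my own, and it globs for the py4j zip rather than hard-coding `py4j-0.10.1-src.zip`, since the version suffix varies across Spark releases:

```python
import glob
import os
import sys

def add_pyspark_to_path(spark_home=None):
    """Prepend SPARK_HOME's python sources and the bundled py4j zip to
    sys.path, mirroring the manual steps in the repro above."""
    spark_home = spark_home or os.environ["SPARK_HOME"]
    python_dir = os.path.join(spark_home, "python")
    # The py4j zip name depends on the Spark release, so discover it.
    py4j_zips = glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    sys.path = py4j_zips + [python_dir] + sys.path
```

After calling this once, a plain `import pyspark` should resolve against the checkout's sources.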
---
[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails
Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:
https://github.com/apache/spark/pull/14303
Sorry for the super brief answer - got to run to a flight - but if we could get a repro that would be great :)
---
[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails
Posted by eastlondoner <gi...@git.apache.org>.
Github user eastlondoner commented on the issue:
https://github.com/apache/spark/pull/14303
@BryanCutler got it!
I had previously had a failed import; I didn't realise that Python 'remembered' failed imports.
Thanks for the help!
---
[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails
Posted by eastlondoner <gi...@git.apache.org>.
Github user eastlondoner commented on the issue:
https://github.com/apache/spark/pull/14303
Note the stack trace: it occurs inside pyspark's __init__.py. So the pyspark module is not actually initialized at the point where accumulators is being imported from it.
It *may* be that re-arranging the import order in __init__.py would make this go away.
---
[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails
Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:
https://github.com/apache/spark/pull/14303
I _think_ it is different in that the side effects are slightly different - but it shouldn't actually impact this reported problem. I'm a little fuzzy on the specific Python import rules between Python versions, though, so I could be misremembering something.
I think it would be useful to know how this problem came to be (are you running your Python job through spark-submit, or are you trying to import pyspark in another way?) - and do you have a repro we could see?
---
[GitHub] spark pull request #14303: [SPARK-16665] python import pyspark fails
Posted by eastlondoner <gi...@git.apache.org>.
Github user eastlondoner closed the pull request at:
https://github.com/apache/spark/pull/14303
---
[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14303
It seems fine with me too.
```bash
$ python
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>> filter(lambda x: x == "accumulators", dir(pyspark))
['accumulators']
>>> pyspark.accumulators
<module 'pyspark.accumulators' from '.../spark/python/pyspark/accumulators.pyc'>
```
---
[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails
Posted by eastlondoner <gi...@git.apache.org>.
Github user eastlondoner commented on the issue:
https://github.com/apache/spark/pull/14303
Quick repro:
cd python
python
>>> import os
>>> import sys
>>> sys.path = [os.environ['SPARK_HOME'] + "/python/lib/py4j-0.10.1-src.zip"] + sys.path
>>> sys.path = [os.environ['SPARK_HOME'] + "/python/"] + sys.path
>>> import pyspark
---
[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/14303
Is that actually different? CC @holdenk
---
[GitHub] spark issue #14303: [SPARK-16665] python import pyspark fails
Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/14303
@eastlondoner, the import error you show indicates that `accumulators` has already been imported. I was able to run the steps in your reproduction without error, but I did accidentally get the error once when I forgot to add py4j to the sys.path: the `context` module was left partially loaded along with `accumulators`, so when I ran `import pyspark` again I got the same import error as you.
Can you verify that there were no previous import attempts when you receive this error?
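One way to recover in the same interpreter session, rather than restarting it, is to evict the partially initialized modules before retrying. This cleanup snippet is my own sketch, not anything pyspark provides:

```python
import sys

# Drop pyspark and all of its submodules from the import cache, so the
# next `import pyspark` re-runs pyspark/__init__.py from scratch instead
# of reusing a partially initialized module left by a failed attempt.
stale = [name for name in list(sys.modules)
         if name == "pyspark" or name.startswith("pyspark.")]
for name in stale:
    del sys.modules[name]
```

Restarting the interpreter after fixing sys.path is still the simplest fix; this is only useful when that is inconvenient.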
---