Posted to user@spark.apache.org by Andy Davidson <An...@SantaCruzIntegration.com> on 2018/04/05 00:36:43 UTC
how to set up pyspark eclipse, pyDev, virtualenv? syntaxError: yield from walk(
I am having a heck of a time setting up my development environment. I used
pip to install pyspark. I also downloaded spark from apache.
My eclipse pyDev interpreter is configured as a python3 virtualenv.
I have a simple unit test that loads a small dataframe. df.show() generates
the following error:
2018-04-04 17:13:56 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException:
Error from python worker:
  Traceback (most recent call last):
    File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site.py", line 67, in <module>
      import os
    File "/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/os.py", line 409
      yield from walk(new_path, topdown, onerror, followlinks)
                 ^
  SyntaxError: invalid syntax
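[Editorial note: `yield from` is Python 3-only syntax, so an error like this usually means a Python 2 interpreter was launched as the Spark worker and choked while parsing the virtualenv's python3.6 standard library. One possible fix (an assumption about the cause, not something confirmed in this thread) is to point both the driver and the workers at the same python3 before starting Spark or the tests:]

```shell
# Hypothetical fix: make Spark launch the same python3 for both the
# driver and the python workers. The path resolution is an assumption;
# point it at the virtualenv's interpreter if that is the one you want.
export PYSPARK_PYTHON="$(command -v python3)"
export PYSPARK_DRIVER_PYTHON="$PYSPARK_PYTHON"
echo "spark workers will use: $PYSPARK_PYTHON"
```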
My unittest class is derived from:
class PySparkTestCase(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        conf = SparkConf().setMaster("local[2]") \
                          .setAppName(cls.__name__) #\
                          # .set("spark.authenticate.secret", "111111")
        cls.sparkContext = SparkContext(conf=conf)
        sc_values[cls.__name__] = cls.sparkContext
        cls.sqlContext = SQLContext(cls.sparkContext)
        print("aedwip:", SparkContext)

    @classmethod
    def tearDownClass(cls):
        print("....calling stop tearDownClass, the content of sc_values=",
              sc_values)
        sc_values.clear()
        cls.sparkContext.stop()
This looks similar to the class PySparkTestCase in
https://github.com/apache/spark/blob/master/python/pyspark/tests.py
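[Editorial note: for readers who want to see the fixture pattern in isolation, here is a minimal, self-contained sketch of the same setUpClass/tearDownClass shape, with a plain object standing in for the SparkContext so it runs without pyspark. `sc_values` is assumed to be a module-level dict, as it appears to be in the original.]

```python
import unittest

# Module-level registry of per-test-class contexts (assumption: this is
# what sc_values is in the original code).
sc_values = {}

class PySparkTestCaseSketch(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # In the real test case this would be SparkContext(conf=conf).
        cls.sparkContext = object()
        sc_values[cls.__name__] = cls.sparkContext

    @classmethod
    def tearDownClass(cls):
        sc_values.clear()
        # A real SparkContext would also need cls.sparkContext.stop().

    def test_context_is_registered(self):
        # The class fixture ran once before any test; the context is shared.
        self.assertIs(self.sparkContext, sc_values[type(self).__name__])

suite = unittest.TestLoader().loadTestsFromTestCase(PySparkTestCaseSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```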
Any suggestions would be greatly appreciated.
Andy
My downloaded version is spark-2.3.0-bin-hadoop2.7
My virtual env version is:
(spark-2.3.0) $ pip show pySpark
Name: pyspark
Version: 2.3.0
Summary: Apache Spark Python API
Home-page: https://github.com/apache/spark/tree/master/python
Author: Spark Developers
Author-email: dev@spark.apache.org
License: http://www.apache.org/licenses/LICENSE-2.0
Location:
/Users/a/workSpace/pythonEnv/spark-2.3.0/lib/python3.6/site-packages
Requires: py4j
(spark-2.3.0) $
(spark-2.3.0) $ python --version
Python 3.6.1
(spark-2.3.0) $
Re: how to set up pyspark eclipse, pyDev, virtualenv? syntaxError:
yield from walk(
Posted by Andy Davidson <An...@SantaCruzIntegration.com>.
Hi Hyukjin
Thanks for the links.
At this point I have more or less got eclipse, pyDev, spark, and my unit
tests working. I can run a simple unit test from the cmd line or from
within eclipse. The test creates a data frame from a text file and calls
df.show().
The last challenge is that pyspark.sql.functions defines some functions at
run time. Examples are lit() and col(). This causes problems with my IDE:
https://issues.apache.org/jira/browse/SPARK-23878?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16427812#comment-16427812
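[Editorial note: the IDE flags lit() and col() because pyspark.sql.functions builds them with a small factory and injects the names into the module's globals at import time, so static analysis never sees a `def`. A toy illustration of that pattern, simplified and not pyspark's actual code:]

```python
# Simplified sketch of run-time function definition, the pattern that
# confuses static analyzers (an illustration, not pyspark's real source).
def _create_function(name):
    def _(col):
        return "{}({})".format(name, col)
    _.__name__ = name
    return _

# Names are injected into the module namespace in a loop, so an IDE
# scanning the source finds no 'def lit' or 'def col' to resolve.
for _name in ["lit", "col", "upper"]:
    globals()[_name] = _create_function(_name)
```

A common workaround is to import the module itself (e.g. `from pyspark.sql import functions as F`) and write `F.col(...)`, which most IDEs resolve without complaint; in PyDev, adding the module to the forced builtins is another option.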
Andy
P.s. I originally started my project using jupyter notebooks. The code base
got too big to manage using notebooks. I am in the process of refactoring
common code into python modules using a standard python IDE. In the IDE I
need to be able to import all the spark functions and to write and run unit
tests.
I chose eclipse because I have a lot of spark code written in java. It's
easier for me to have one IDE for all my java and python code.
From: Hyukjin Kwon <gu...@gmail.com>
Date: Thursday, April 5, 2018 at 6:09 PM
To: Andrew Davidson <An...@SantaCruzIntegration.com>
Cc: "user @spark" <us...@spark.apache.org>
Subject: Re: how to set up pyspark eclipse, pyDev, virtualenv? syntaxError:
yield from walk(
> FYI, there is a PR and JIRA for virtualEnv support in PySpark
>
> https://issues.apache.org/jira/browse/SPARK-13587
> https://github.com/apache/spark/pull/13599
Re: how to set up pyspark eclipse, pyDev, virtualenv? syntaxError:
yield from walk(
Posted by Hyukjin Kwon <gu...@gmail.com>.
FYI, there is a PR and JIRA for virtualEnv support in PySpark
https://issues.apache.org/jira/browse/SPARK-13587
https://github.com/apache/spark/pull/13599
Re: how to set up pyspark eclipse, pyDev, virtualenv? syntaxError:
yield from walk(
Posted by Andy Davidson <An...@SantaCruzIntegration.com>.
FYI
http://www.learn4master.com/algorithms/pyspark-unit-test-set-up-sparkcontext