Posted to dev@spark.apache.org by Robert C Senkbeil <rc...@us.ibm.com> on 2014/10/05 19:16:25 UTC

Jython importing pyspark?


Hi there,

I wanted to ask whether anyone has successfully used Jython with the
pyspark library. I wasn't sure whether C extension support is needed by
pyspark itself or is just a benefit of running on CPython.

There was a claim (
http://apache-spark-developers-list.1001551.n3.nabble.com/PySpark-Driver-from-Jython-td7142.html#a7269
) that using Jython would be better, if you didn't need C extension
support, because the cost of serialization is lower. However, I have not
been able to import pyspark into a Jython session. For reference, I'm using
Jython 2.7b3 and Spark 1.1.0.

Jython 2.7b3 (default:e81256215fb0, Aug 4 2014, 02:39:51)
[Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_51
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyspark import SparkContext, SparkConf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyspark/__init__.py", line 63, in <module>
  File "pyspark/context.py", line 25, in <module>
  File "pyspark/accumulators.py", line 94, in <module>
  File "pyspark/serializers.py", line 341, in <module>
  File "pyspark/serializers.py", line 328, in _hijack_namedtuple
RuntimeError: maximum recursion depth exceeded (Java StackOverflowError)

Is there something I'm missing here? Did Jython ever work with pyspark?
The same error happens regardless of whether I load the Python files
directly or first compile them to Java class files with Jython.
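
For anyone following along, my understanding is that _hijack_namedtuple
replaces collections.namedtuple with a wrapper that makes the generated
classes picklable, so instances can be shipped to workers that never
imported the defining module. Below is a rough, hypothetical sketch of that
pattern -- not the actual pyspark source -- just to show where this kind of
patching can recurse:

# Rough, hypothetical sketch of the kind of namedtuple "hijacking" that
# pyspark/serializers.py appears to do; NOT the actual pyspark code.
import collections
import pickle

_old_namedtuple = collections.namedtuple  # keep a handle on the original


def _rebuild(name, fields, values):
    # Recreate the class on the receiving side, then instantiate it.
    cls = _old_namedtuple(name, fields)
    return cls(*values)


def _hack_namedtuple(cls):
    # Give the generated class a __reduce__ so pickle ships a rebuild recipe
    # instead of a reference to a class the worker may not have.
    name, fields = cls.__name__, cls._fields
    cls.__reduce__ = lambda self: (_rebuild, (name, fields, tuple(self)))
    return cls


def _hijacked_namedtuple(*args, **kwargs):
    # Calling _old_namedtuple here is what keeps this from recursing;
    # calling collections.namedtuple (the already-patched public name)
    # instead would overflow the stack.
    return _hack_namedtuple(_old_namedtuple(*args, **kwargs))


collections.namedtuple = _hijacked_namedtuple

# Classes created after the patch round-trip through pickle by value.
Point = collections.namedtuple("Point", ["x", "y"])
print(pickle.loads(pickle.dumps(Point(1, 2))))  # -> Point(x=1, y=2)

If a wrapper like that ever ends up calling the already-patched public name,
or the patch gets applied on top of itself, you get exactly the unbounded
recursion in the traceback above. Whether that is what Jython's differing
collections module triggers is speculation on my part.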

I know that previous documentation (0.9.1) indicated, "PySpark requires
Python 2.6 or higher. PySpark applications are executed using a standard
CPython interpreter in order to support Python modules that use C
extensions. We have not tested PySpark with Python 3 or with alternative
Python interpreters, such as PyPy or Jython."

The documentation for later versions now reads, "Spark 1.1.0 works with Python 2.6 or
higher (but not Python 3). It uses the standard CPython interpreter, so C
libraries like NumPy can be used."

I'm assuming this means that attempts to use other interpreters failed. If
so, are there any plans to support something like Jython in the future?

Signed,
Chip Senkbeil

Re: Jython importing pyspark?

Posted by Matei Zaharia <ma...@gmail.com>.
PySpark doesn't attempt to support Jython at present. IMO, while it might be a bit faster, it would lose a lot of the benefits of Python, namely the very strong data processing libraries (NumPy, SciPy, Pandas, etc.). So I'm not sure it's worth supporting unless someone demonstrates a really major performance benefit.
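
To make that concrete, the worker-side code people write in PySpark
routinely calls straight into C-extension libraries. A small, hypothetical
sketch (assuming NumPy is installed on both the driver and the workers):

# Hypothetical sketch: worker-side code calling into a C-extension library
# (NumPy), which is the part Jython cannot provide.
import numpy as np
from pyspark import SparkContext

sc = SparkContext("local[2]", "numpy-on-workers")

# NumPy arrays pickle fine, and the lambda runs np.linalg.norm inside each
# CPython worker process.
vectors = sc.parallelize([np.arange(5.0), np.arange(5.0, 10.0)])
norms = vectors.map(lambda v: float(np.linalg.norm(v))).collect()
print(norms)

sc.stop()

None of that would load under Jython, since it can't import CPython C
extensions.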

There was actually a recent patch to add PyPy support (https://github.com/apache/spark/pull/2144), which is worth a try if you want Python applications to run faster. It might actually be faster overall than Jython.
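
If you want to give that a try, a minimal smoke test might look like the
sketch below, assuming PyPy is installed and selected via the
PYSPARK_PYTHON environment variable PySpark uses to choose its Python
executable; the launch command and the "pypy" name are placeholders, not
something tested here:

# Hypothetical sketch of a PyPy smoke test. Launch with something like:
#   PYSPARK_PYTHON=pypy bin/spark-submit pypy_smoke_test.py
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("pypy-smoke-test").setMaster("local[2]")
sc = SparkContext(conf=conf)

# A CPU-bound map is where an alternative interpreter's JIT could show a win.
total = sc.parallelize(range(1000000)).map(lambda x: x * x).sum()
print(total)

sc.stop()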

Matei

