Posted to dev@spark.apache.org by Tom Graves <tg...@yahoo.com.INVALID> on 2017/08/14 20:55:08 UTC

spark pypy support?

Anyone know if pypy works with spark? Saw a jira that it was supported back in Spark 1.2, but I'm getting an error when trying it and I'm not sure if it's something with my pypy version or just something spark doesn't support.

AttributeError: 'builtin-code' object has no attribute 'co_filename'
Traceback (most recent call last):
  File "<builtin>/app_main.py", line 75, in run_toplevel
  File "/homes/tgraves/mbe.py", line 40, in <module>
    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in reduce
    vals = self.mapPartitions(func).collect()
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 808, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2440, in _jrdd
    self._jrdd_deserializer, profiler)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2373, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2359, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/serializers.py", line 460, in dumps
    return cloudpickle.dumps(obj, 2)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 703, in dumps
    cp.dump(obj)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 160, in dump

Thanks,
Tom

Re: spark pypy support?

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
Just curious, is this using the portable version of pypy or the standard version (ubuntu?)?
Tom

On Monday, August 14, 2017, 5:27:11 PM CDT, Holden Karau <ho...@pigscanfly.ca> wrote:

Ah interesting, looking at our latest docs we imply that it should work with PyPy 2.3+ -- we might want to update that to 2.5+ since we aren't testing with 2.3 anymore?
On Mon, Aug 14, 2017 at 3:09 PM, Tom Graves <tg...@yahoo.com.invalid> wrote:

I tried 5.7 and 2.5.1, so it's probably something in my setup.  I'll investigate that more; I wanted to make sure it was still supported because I didn't see anything about it since the original jira that added it.
Thanks,
Tom

On Monday, August 14, 2017, 4:29:01 PM CDT, shane knapp <sk...@berkeley.edu> wrote:

actually, we *have* locked on a particular pypy version for the
jenkins workers:  2.5.1

this applies to both the 2.7 and 3.5 conda environments.

(py3k)-bash-4.1$ pypy --version
Python 2.7.9 (9c4588d731b7fe0b08669bd732c2b676cb0a8233, Apr 09 2015, 02:17:39)
[PyPy 2.5.1 with GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]

On Mon, Aug 14, 2017 at 2:24 PM, Holden Karau <ho...@pigscanfly.ca> wrote:
> As Dong says yes we do test with PyPy in our CI env; but we expect a "newer"
> version of PyPy (although I don't think we ever bothered to write down what
> the exact version requirements are for the PyPy support unlike regular
> Python).
>
> On Mon, Aug 14, 2017 at 2:06 PM, Dong Joon Hyun <dh...@hortonworks.com>
> wrote:
>>
>> Hi, Tom.
>>
>>
>>
>> What version of PyPy do you use?
>>
>>
>>
>> In the Jenkins environment, `pypy` always passes like Python 2.7 and
>> Python 3.4.
>>
>>
>>
>>
>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull
>>
>>
>>
>> ========================================================================
>>
>> Running PySpark tests
>>
>> ========================================================================
>>
>> Running PySpark tests. Output is in
>> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/python/unit-tests.log
>>
>> Will test against the following Python executables: ['python2.7',
>> 'python3.4', 'pypy']
>>
>> Will test the following Python modules: ['pyspark-core', 'pyspark-ml',
>> 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
>>
>> Starting test(python2.7): pyspark.mllib.tests
>>
>> Starting test(pypy): pyspark.sql.tests
>>
>> Starting test(pypy): pyspark.tests
>>
>> Starting test(pypy): pyspark.streaming.tests
>>
>> Finished test(pypy): pyspark.tests (181s)
>>
>> …
>>
>>
>>
>> Tests passed in 1130 seconds
>>
>>
>>
>>
>>
>> Bests,
>>
>> Dongjoon.
>>
>>
>>
>>
>>
>> From: Tom Graves <tg...@yahoo.com.INVALID>
>> Date: Monday, August 14, 2017 at 1:55 PM
>> To: "dev@spark.apache.org" <de...@spark.apache.org>
>> Subject: spark pypy support?
>>
>>
>>
>> Anyone know if pypy works with spark. Saw a jira that it was supported
>> back in Spark 1.2 but getting an error when trying and not sure if its
>> something with my pypy version of just something spark doesn't support.
>>
>>
>>
>>
>>
>> AttributeError: 'builtin-code' object has no attribute 'co_filename'
>> Traceback (most recent call last):
>>  File "<builtin>/app_main.py", line 75, in run_toplevel
>>  File "/homes/tgraves/mbe.py", line 40, in <module>
>>    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
>>  File "/home/gs/spark/latest/python/ lib/pyspark.zip/pyspark/rdd. py", line
>> 834, in reduce
>>    vals = self.mapPartitions(func). collect()
>>  File "/home/gs/spark/latest/python/ lib/pyspark.zip/pyspark/rdd. py", line
>> 808, in collect
>>    port = self.ctx._jvm.PythonRDD. collectAndServe(self._jrdd. rdd())
>>  File "/home/gs/spark/latest/python/ lib/pyspark.zip/pyspark/rdd. py", line
>> 2440, in _jrdd
>>    self._jrdd_deserializer, profiler)
>>  File "/home/gs/spark/latest/python/ lib/pyspark.zip/pyspark/rdd. py", line
>> 2373, in _wrap_function
>>    pickled_command, broadcast_vars, env, includes =
>> _prepare_for_python_RDD(sc, command)
>>  File "/home/gs/spark/latest/python/ lib/pyspark.zip/pyspark/rdd. py", line
>> 2359, in _prepare_for_python_RDD
>>    pickled_command = ser.dumps(command)
>>  File
>> "/home/gs/spark/latest/python/ lib/pyspark.zip/pyspark/ serializers.py", line
>> 460, in dumps
>>    return cloudpickle.dumps(obj, 2)
>>  File
>> "/home/gs/spark/latest/python/ lib/pyspark.zip/pyspark/ cloudpickle.py", line
>> 703, in dumps
>>    cp.dump(obj)
>>  File
>> "/home/gs/spark/latest/python/ lib/pyspark.zip/pyspark/ cloudpickle.py", line
>> 160, in dump
>>
>>
>>
>> Thanks,
>>
>> Tom
>
>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org




-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: spark pypy support?

Posted by Holden Karau <ho...@pigscanfly.ca>.
Ah interesting, looking at our latest docs we imply that it should work
with PyPy 2.3+ -- we might want to update that to 2.5+ since we aren't
testing with 2.3 anymore?

On Mon, Aug 14, 2017 at 3:09 PM, Tom Graves <tg...@yahoo.com.invalid>
wrote:

> I tried 5.7 and 2.5.1 so its probably something in my setup.  I'll
> investigate that more, wanted to make sure it was still supported because I
> didn't see anything about it since the original jira that added it.
>
> Thanks,
> Tom
>
>
> On Monday, August 14, 2017, 4:29:01 PM CDT, shane knapp <
> sknapp@berkeley.edu> wrote:
>
>
> actually, we *have* locked on a particular pypy versions for the
> jenkins workers:  2.5.1
>
> this applies to both the 2.7 and 3.5 conda environments.
>
> (py3k)-bash-4.1$ pypy --version
> Python 2.7.9 (9c4588d731b7fe0b08669bd732c2b676cb0a8233, Apr 09 2015,
> 02:17:39)
> [PyPy 2.5.1 with GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]
>
> On Mon, Aug 14, 2017 at 2:24 PM, Holden Karau <ho...@pigscanfly.ca>
> wrote:
> > As Dong says yes we do test with PyPy in our CI env; but we expect a
> "newer"
> > version of PyPy (although I don't think we ever bothered to write down
> what
> > the exact version requirements are for the PyPy support unlike regular
> > Python).
> >
> > On Mon, Aug 14, 2017 at 2:06 PM, Dong Joon Hyun <dh...@hortonworks.com>
> > wrote:
> >>
> >> Hi, Tom.
> >>
> >>
> >>
> >> What version of PyPy do you use?
> >>
> >>
> >>
> >> In the Jenkins environment, `pypy` always passes like Python 2.7 and
> >> Python 3.4.
> >>
> >>
> >>
> >>
> >> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20
> (Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull
> >>
> >>
> >>
> >> ============================================================
> ============
> >>
> >> Running PySpark tests
> >>
> >> ============================================================
> ============
> >>
> >> Running PySpark tests. Output is in
> >> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/
> python/unit-tests.log
> >>
> >> Will test against the following Python executables: ['python2.7',
> >> 'python3.4', 'pypy']
> >>
> >> Will test the following Python modules: ['pyspark-core', 'pyspark-ml',
> >> 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
> >>
> >> Starting test(python2.7): pyspark.mllib.tests
> >>
> >> Starting test(pypy): pyspark.sql.tests
> >>
> >> Starting test(pypy): pyspark.tests
> >>
> >> Starting test(pypy): pyspark.streaming.tests
> >>
> >> Finished test(pypy): pyspark.tests (181s)
> >>
> >> …
> >>
> >>
> >>
> >> Tests passed in 1130 seconds
> >>
> >>
> >>
> >>
> >>
> >> Bests,
> >>
> >> Dongjoon.
> >>
> >>
> >>
> >>
> >>
> >> From: Tom Graves <tg...@yahoo.com.INVALID>
> >> Date: Monday, August 14, 2017 at 1:55 PM
> >> To: "dev@spark.apache.org" <de...@spark.apache.org>
> >> Subject: spark pypy support?
> >>
> >>
> >>
> >> Anyone know if pypy works with spark. Saw a jira that it was supported
> >> back in Spark 1.2 but getting an error when trying and not sure if its
> >> something with my pypy version of just something spark doesn't support.
> >>
> >>
> >>
> >>
> >>
> >> AttributeError: 'builtin-code' object has no attribute 'co_filename'
> >> Traceback (most recent call last):
> >>  File "<builtin>/app_main.py", line 75, in run_toplevel
> >>  File "/homes/tgraves/mbe.py", line 40, in <module>
> >>    count = sc.parallelize(range(1, n + 1),
> partitions).map(f).reduce(add)
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py",
> line
> >> 834, in reduce
> >>    vals = self.mapPartitions(func).collect()
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py",
> line
> >> 808, in collect
> >>    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py",
> line
> >> 2440, in _jrdd
> >>    self._jrdd_deserializer, profiler)
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py",
> line
> >> 2373, in _wrap_function
> >>    pickled_command, broadcast_vars, env, includes =
> >> _prepare_for_python_RDD(sc, command)
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py",
> line
> >> 2359, in _prepare_for_python_RDD
> >>    pickled_command = ser.dumps(command)
> >>  File
> >> "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/serializers.py",
> line
> >> 460, in dumps
> >>    return cloudpickle.dumps(obj, 2)
> >>  File
> >> "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py",
> line
> >> 703, in dumps
> >>    cp.dump(obj)
> >>  File
> >> "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py",
> line
> >> 160, in dump
> >>
> >>
> >>
> >> Thanks,
> >>
> >> Tom
> >
> >
> >
> >
> > --
> > Cell : 425-233-8271
> > Twitter: https://twitter.com/holdenkarau
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: spark pypy support?

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
I tried 5.7 and 2.5.1, so it's probably something in my setup.  I'll investigate that more; I wanted to make sure it was still supported because I didn't see anything about it since the original jira that added it.
Thanks,
Tom

On Monday, August 14, 2017, 4:29:01 PM CDT, shane knapp <sk...@berkeley.edu> wrote:

actually, we *have* locked on a particular pypy versions for the
jenkins workers:  2.5.1

this applies to both the 2.7 and 3.5 conda environments.

(py3k)-bash-4.1$ pypy --version
Python 2.7.9 (9c4588d731b7fe0b08669bd732c2b676cb0a8233, Apr 09 2015, 02:17:39)
[PyPy 2.5.1 with GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]

On Mon, Aug 14, 2017 at 2:24 PM, Holden Karau <ho...@pigscanfly.ca> wrote:
> As Dong says yes we do test with PyPy in our CI env; but we expect a "newer"
> version of PyPy (although I don't think we ever bothered to write down what
> the exact version requirements are for the PyPy support unlike regular
> Python).
>
> On Mon, Aug 14, 2017 at 2:06 PM, Dong Joon Hyun <dh...@hortonworks.com>
> wrote:
>>
>> Hi, Tom.
>>
>>
>>
>> What version of PyPy do you use?
>>
>>
>>
>> In the Jenkins environment, `pypy` always passes like Python 2.7 and
>> Python 3.4.
>>
>>
>>
>>
>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull
>>
>>
>>
>> ========================================================================
>>
>> Running PySpark tests
>>
>> ========================================================================
>>
>> Running PySpark tests. Output is in
>> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/python/unit-tests.log
>>
>> Will test against the following Python executables: ['python2.7',
>> 'python3.4', 'pypy']
>>
>> Will test the following Python modules: ['pyspark-core', 'pyspark-ml',
>> 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
>>
>> Starting test(python2.7): pyspark.mllib.tests
>>
>> Starting test(pypy): pyspark.sql.tests
>>
>> Starting test(pypy): pyspark.tests
>>
>> Starting test(pypy): pyspark.streaming.tests
>>
>> Finished test(pypy): pyspark.tests (181s)
>>
>> …
>>
>>
>>
>> Tests passed in 1130 seconds
>>
>>
>>
>>
>>
>> Bests,
>>
>> Dongjoon.
>>
>>
>>
>>
>>
>> From: Tom Graves <tg...@yahoo.com.INVALID>
>> Date: Monday, August 14, 2017 at 1:55 PM
>> To: "dev@spark.apache.org" <de...@spark.apache.org>
>> Subject: spark pypy support?
>>
>>
>>
>> Anyone know if pypy works with spark. Saw a jira that it was supported
>> back in Spark 1.2 but getting an error when trying and not sure if its
>> something with my pypy version of just something spark doesn't support.
>>
>>
>>
>>
>>
>> AttributeError: 'builtin-code' object has no attribute 'co_filename'
>> Traceback (most recent call last):
>>  File "<builtin>/app_main.py", line 75, in run_toplevel
>>  File "/homes/tgraves/mbe.py", line 40, in <module>
>>    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
>>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line
>> 834, in reduce
>>    vals = self.mapPartitions(func).collect()
>>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line
>> 808, in collect
>>    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line
>> 2440, in _jrdd
>>    self._jrdd_deserializer, profiler)
>>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line
>> 2373, in _wrap_function
>>    pickled_command, broadcast_vars, env, includes =
>> _prepare_for_python_RDD(sc, command)
>>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line
>> 2359, in _prepare_for_python_RDD
>>    pickled_command = ser.dumps(command)
>>  File
>> "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/serializers.py", line
>> 460, in dumps
>>    return cloudpickle.dumps(obj, 2)
>>  File
>> "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
>> 703, in dumps
>>    cp.dump(obj)
>>  File
>> "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
>> 160, in dump
>>
>>
>>
>> Thanks,
>>
>> Tom
>
>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

Re: spark pypy support?

Posted by shane knapp <sk...@berkeley.edu>.
actually, we *have* locked on a particular pypy version for the
jenkins workers:  2.5.1

this applies to both the 2.7 and 3.5 conda environments.

(py3k)-bash-4.1$ pypy --version
Python 2.7.9 (9c4588d731b7fe0b08669bd732c2b676cb0a8233, Apr 09 2015, 02:17:39)
[PyPy 2.5.1 with GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]
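
For reference, a minimal way to confirm which interpreter a Spark job actually runs on (assuming PYSPARK_PYTHON / PYSPARK_DRIVER_PYTHON are set to the pypy under test; a sketch only, not verified on these workers):

import sys

from pyspark import SparkContext

# Report the interpreter version on the driver and on one executor, to
# confirm both ends run the intended pypy build.
sc = SparkContext(appName="interpreter-check")
print("driver  : %s" % sys.version)
print("executor: %s" % sc.parallelize([0], 1).map(lambda _: sys.version).first())
sc.stop()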

On Mon, Aug 14, 2017 at 2:24 PM, Holden Karau <ho...@pigscanfly.ca> wrote:
> As Dong says yes we do test with PyPy in our CI env; but we expect a "newer"
> version of PyPy (although I don't think we ever bothered to write down what
> the exact version requirements are for the PyPy support unlike regular
> Python).
>
> On Mon, Aug 14, 2017 at 2:06 PM, Dong Joon Hyun <dh...@hortonworks.com>
> wrote:
>>
>> Hi, Tom.
>>
>>
>>
>> What version of PyPy do you use?
>>
>>
>>
>> In the Jenkins environment, `pypy` always passes like Python 2.7 and
>> Python 3.4.
>>
>>
>>
>>
>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull
>>
>>
>>
>> ========================================================================
>>
>> Running PySpark tests
>>
>> ========================================================================
>>
>> Running PySpark tests. Output is in
>> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/python/unit-tests.log
>>
>> Will test against the following Python executables: ['python2.7',
>> 'python3.4', 'pypy']
>>
>> Will test the following Python modules: ['pyspark-core', 'pyspark-ml',
>> 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
>>
>> Starting test(python2.7): pyspark.mllib.tests
>>
>> Starting test(pypy): pyspark.sql.tests
>>
>> Starting test(pypy): pyspark.tests
>>
>> Starting test(pypy): pyspark.streaming.tests
>>
>> Finished test(pypy): pyspark.tests (181s)
>>
>> …
>>
>>
>>
>> Tests passed in 1130 seconds
>>
>>
>>
>>
>>
>> Bests,
>>
>> Dongjoon.
>>
>>
>>
>>
>>
>> From: Tom Graves <tg...@yahoo.com.INVALID>
>> Date: Monday, August 14, 2017 at 1:55 PM
>> To: "dev@spark.apache.org" <de...@spark.apache.org>
>> Subject: spark pypy support?
>>
>>
>>
>> Anyone know if pypy works with spark. Saw a jira that it was supported
>> back in Spark 1.2 but getting an error when trying and not sure if its
>> something with my pypy version of just something spark doesn't support.
>>
>>
>>
>>
>>
>> AttributeError: 'builtin-code' object has no attribute 'co_filename'
>> Traceback (most recent call last):
>>   File "<builtin>/app_main.py", line 75, in run_toplevel
>>   File "/homes/tgraves/mbe.py", line 40, in <module>
>>     count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
>>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line
>> 834, in reduce
>>     vals = self.mapPartitions(func).collect()
>>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line
>> 808, in collect
>>     port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line
>> 2440, in _jrdd
>>     self._jrdd_deserializer, profiler)
>>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line
>> 2373, in _wrap_function
>>     pickled_command, broadcast_vars, env, includes =
>> _prepare_for_python_RDD(sc, command)
>>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line
>> 2359, in _prepare_for_python_RDD
>>     pickled_command = ser.dumps(command)
>>   File
>> "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/serializers.py", line
>> 460, in dumps
>>     return cloudpickle.dumps(obj, 2)
>>   File
>> "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
>> 703, in dumps
>>     cp.dump(obj)
>>   File
>> "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line
>> 160, in dump
>>
>>
>>
>> Thanks,
>>
>> Tom
>
>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: spark pypy support?

Posted by Holden Karau <ho...@pigscanfly.ca>.
As Dong says, yes, we do test with PyPy in our CI env, but we expect a
"newer" version of PyPy (although I don't think we ever bothered to write
down what the exact version requirements are for the PyPy support, unlike
regular Python).

On Mon, Aug 14, 2017 at 2:06 PM, Dong Joon Hyun <dh...@hortonworks.com>
wrote:

> Hi, Tom.
>
>
>
> What version of PyPy do you use?
>
>
>
> In the Jenkins environment, `pypy` always passes like Python 2.7 and
> Python 3.4.
>
>
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%
> 20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull
>
>
>
> ========================================================================
>
> Running PySpark tests
>
> ========================================================================
>
> Running PySpark tests. Output is in /home/jenkins/workspace/spark-
> master-test-sbt-hadoop-2.7/python/unit-tests.log
>
> Will test against the following Python executables: ['python2.7',
> 'python3.4', 'pypy']
>
> Will test the following Python modules: ['pyspark-core', 'pyspark-ml',
> 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
>
> Starting test(python2.7): pyspark.mllib.tests
>
> Starting test(pypy): pyspark.sql.tests
>
> Starting test(pypy): pyspark.tests
>
> Starting test(pypy): pyspark.streaming.tests
>
> Finished test(pypy): pyspark.tests (181s)
>
> …
>
>
>
> Tests passed in 1130 seconds
>
>
>
>
>
> Bests,
>
> Dongjoon.
>
>
>
>
>
> *From: *Tom Graves <tg...@yahoo.com.INVALID>
> *Date: *Monday, August 14, 2017 at 1:55 PM
> *To: *"dev@spark.apache.org" <de...@spark.apache.org>
> *Subject: *spark pypy support?
>
>
>
> Anyone know if pypy works with spark. Saw a jira that it was supported
> back in Spark 1.2 but getting an error when trying and not sure if its
> something with my pypy version of just something spark doesn't support.
>
>
>
>
>
> AttributeError: 'builtin-code' object has no attribute 'co_filename'
> Traceback (most recent call last):
>   File "<builtin>/app_main.py", line 75, in run_toplevel
>   File "/homes/tgraves/mbe.py", line 40, in <module>
>     count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py",
> line 834, in reduce
>     vals = self.mapPartitions(func).collect()
>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py",
> line 808, in collect
>     port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py",
> line 2440, in _jrdd
>     self._jrdd_deserializer, profiler)
>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py",
> line 2373, in _wrap_function
>     pickled_command, broadcast_vars, env, includes =
> _prepare_for_python_RDD(sc, command)
>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py",
> line 2359, in _prepare_for_python_RDD
>     pickled_command = ser.dumps(command)
>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/serializers.py",
> line 460, in dumps
>     return cloudpickle.dumps(obj, 2)
>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py",
> line 703, in dumps
>     cp.dump(obj)
>   File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py",
> line 160, in dump
>
>
>
> Thanks,
>
> Tom
>



-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau

Re: spark pypy support?

Posted by Dong Joon Hyun <dh...@hortonworks.com>.
Hi, Tom.

What version of PyPy do you use?

In the Jenkins environment, `pypy` always passes, just like Python 2.7 and Python 3.4.

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull

========================================================================
Running PySpark tests
========================================================================
Running PySpark tests. Output is in /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/python/unit-tests.log
Will test against the following Python executables: ['python2.7', 'python3.4', 'pypy']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Starting test(python2.7): pyspark.mllib.tests
Starting test(pypy): pyspark.sql.tests
Starting test(pypy): pyspark.tests
Starting test(pypy): pyspark.streaming.tests
Finished test(pypy): pyspark.tests (181s)
…

Tests passed in 1130 seconds
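
For a local check against a single interpreter, the same runner can be pointed at pypy directly; a sketch, assuming the --python-executables and --modules options of python/run-tests:

./python/run-tests --python-executables=pypy --modules=pyspark-core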


Bests,
Dongjoon.


From: Tom Graves <tg...@yahoo.com.INVALID>
Date: Monday, August 14, 2017 at 1:55 PM
To: "dev@spark.apache.org" <de...@spark.apache.org>
Subject: spark pypy support?

Anyone know if pypy works with spark. Saw a jira that it was supported back in Spark 1.2 but getting an error when trying and not sure if its something with my pypy version of just something spark doesn't support.


AttributeError: 'builtin-code' object has no attribute 'co_filename'
Traceback (most recent call last):
  File "<builtin>/app_main.py", line 75, in run_toplevel
  File "/homes/tgraves/mbe.py", line 40, in <module>
    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in reduce
    vals = self.mapPartitions(func).collect()
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 808, in collect
    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2440, in _jrdd
    self._jrdd_deserializer, profiler)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2373, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2359, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/serializers.py", line 460, in dumps
    return cloudpickle.dumps(obj, 2)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 703, in dumps
    cp.dump(obj)
  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 160, in dump

Thanks,
Tom