Posted to dev@spark.apache.org by Chang Ya-Hsuan <su...@gmail.com> on 2015/11/05 08:56:30 UTC

pyspark with pypy does not work for spark-1.5.1

Hi all,

I am trying to run pyspark with pypy. It works with spark-1.3.1 but fails
with spark-1.4.1 and spark-1.5.1.

my pypy version:

$ /usr/bin/pypy --version
Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
[PyPy 2.2.1 with GCC 4.8.4]

works with spark-1.3.1

$ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
[PyPy 2.2.1 with GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a loopback
address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface eth0)
15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
      /_/

Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
SparkContext available as sc, HiveContext available as sqlContext.
And now for something completely different: ``Armin: "Prolog is a mess.",
CF:
"No, it's very cool!", Armin: "Isn't this what I said?"''
>>>

error message for 1.5.1

$ PYSPARK_PYTHON=/usr/bin/pypy ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
[PyPy 2.2.1 with GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "app_main.py", line 614, in run_it
  File
"/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py",
line 30, in <module>
    import pyspark
  File
"/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py",
line 41, in <module>
    from pyspark.context import SparkContext
  File
"/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py",
line 26, in <module>
    from pyspark import accumulators
  File
"/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py",
line 98, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File
"/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
line 400, in <module>
    _hijack_namedtuple()
  File
"/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
line 378, in _hijack_namedtuple
    _old_namedtuple = _copy_func(collections.namedtuple)
  File
"/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
line 376, in _copy_func
    f.__defaults__, f.__closure__)
AttributeError: 'function' object has no attribute '__closure__'
And now for something completely different: ``the traces don't lie''

Is this a known issue? Any suggestions for resolving it? Or how can I help to
fix this problem?
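
In case it helps, one idea I am playing with (an untested sketch, and I am not
sure pyspark/serializers.py is the right place for it) is to make _copy_func
fall back to the old func_* attribute names when a dunder alias is missing:

    import types

    def _copy_func(f):
        # Rough sketch: PyPy 2.2.1 apparently exposes only the old Python 2
        # names (func_code, func_closure, ...) on this function object, not
        # all of the __code__/__defaults__/__closure__ aliases.
        # (Python 2 only; a real fix would also keep the Python 3 path.)
        return types.FunctionType(
            getattr(f, '__code__', f.func_code),
            getattr(f, '__globals__', f.func_globals),
            f.__name__,
            getattr(f, '__defaults__', f.func_defaults),
            getattr(f, '__closure__', f.func_closure))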

Thanks.

Re: pyspark with pypy does not work for spark-1.5.1

Posted by Davies Liu <da...@databricks.com>.
We already test against CPython 2.6, CPython 3.4, and PyPy 2.5; the run takes
more than 30 minutes (without parallelization). I think that should be enough.

PyPy 2.2 is too old, and we do not have enough resources to support it.
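
That said, if we want a clearer failure than an AttributeError deep inside
serializers.py, a guard along these lines (a rough sketch; the placement and
wording are just an idea, not something we have implemented) could reject
unsupported PyPy releases up front:

    import platform
    import sys

    # Hypothetical guard: refuse to start on PyPy older than 2.3, the oldest
    # version the docs mention as supported.
    if platform.python_implementation() == "PyPy" and sys.pypy_version_info[:2] < (2, 3):
        raise RuntimeError("PySpark requires PyPy 2.3 or newer, found "
                           + ".".join(str(x) for x in sys.pypy_version_info[:3]))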

On Fri, Nov 6, 2015 at 2:27 AM, Chang Ya-Hsuan <su...@gmail.com> wrote:
> Hi I run ./python/ru-tests to test following modules of spark-1.5.1:
>
> [pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql',
> 'pyspark-streaming]
>
> against to following pypy versions:
>
> pypy-2.2.1  pypy-2.3  pypy-2.3.1  pypy-2.4.0  pypy-2.5.0  pypy-2.5.1
> pypy-2.6.0  pypy-2.6.1  pypy-4.0.0
>
> except pypy-2.2.1, all others pass the test.
>
> the error message of pypy-2.2.1 is:
>
> Traceback (most recent call last):
>   File "app_main.py", line 72, in run_toplevel
>   File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/runpy.py",
> line 151, in _run_module_as_main
>     mod_name, loader, code, fname = _get_module_details(mod_name)
>   File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/runpy.py",
> line 101, in _get_module_details
>     loader = get_loader(mod_name)
>   File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py",
> line 465, in get_loader
>     return find_loader(fullname)
>   File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py",
> line 475, in find_loader
>     for importer in iter_importers(fullname):
>   File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py",
> line 431, in iter_importers
>     __import__(pkg)
>   File "pyspark/__init__.py", line 41, in <module>
>     from pyspark.context import SparkContext
>   File "pyspark/context.py", line 26, in <module>
>     from pyspark import accumulators
>   File "pyspark/accumulators.py", line 98, in <module>
>     from pyspark.serializers import read_int, PickleSerializer
>   File "pyspark/serializers.py", line 400, in <module>
>     _hijack_namedtuple()
>   File "pyspark/serializers.py", line 378, in _hijack_namedtuple
>     _old_namedtuple = _copy_func(collections.namedtuple)
>   File "pyspark/serializers.py", line 376, in _copy_func
>     f.__defaults__, f.__closure__)
> AttributeError: 'function' object has no attribute '__closure__'
>
> p.s. would you want to test different pypy versions on your Jenkins? maybe I
> could help
>
> On Fri, Nov 6, 2015 at 2:23 AM, Josh Rosen <jo...@databricks.com> wrote:
>>
>> You could try running PySpark's own unit tests. Try ./python/run-tests
>> --help for instructions.
>>
>> On Thu, Nov 5, 2015 at 12:31 AM Chang Ya-Hsuan <su...@gmail.com> wrote:
>>>
>>> I've test on following pypy version against to spark-1.5.1
>>>
>>>   pypy-2.2.1
>>>   pypy-2.3
>>>   pypy-2.3.1
>>>   pypy-2.4.0
>>>   pypy-2.5.0
>>>   pypy-2.5.1
>>>   pypy-2.6.0
>>>   pypy-2.6.1
>>>
>>> I run
>>>
>>>     $ PYSPARK_PYTHON=/path/to/pypy-xx.xx/bin/pypy
>>> /path/to/spark-1.5.1/bin/pyspark
>>>
>>> and only pypy-2.2.1 failed.
>>>
>>> Any suggestion to run advanced test?
>>>
>>> On Thu, Nov 5, 2015 at 4:14 PM, Chang Ya-Hsuan <su...@gmail.com>
>>> wrote:
>>>>
>>>> Thanks for your quickly reply.
>>>>
>>>> I will test several pypy versions and report the result later.
>>>>
>>>> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen <ro...@gmail.com> wrote:
>>>>>
>>>>> I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's
>>>>> docs say that we only support PyPy 2.3+. Could you try using a newer PyPy
>>>>> version to see if that works?
>>>>>
>>>>> I just checked and it looks like our Jenkins tests are running against
>>>>> PyPy 2.5.1, so that version is known to work. I'm not sure what the actual
>>>>> minimum supported PyPy version is. Would you be interested in helping to
>>>>> investigate so that we can update the documentation or produce a fix to
>>>>> restore compatibility with earlier PyPy builds?
>>>>>
>>>>> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <su...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am trying to run pyspark with pypy, and it is work when using
>>>>>> spark-1.3.1 but failed when using spark-1.4.1 and spark-1.5.1
>>>>>>
>>>>>> my pypy version:
>>>>>>
>>>>>> $ /usr/bin/pypy --version
>>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>>> [PyPy 2.2.1 with GCC 4.8.4]
>>>>>>
>>>>>> works with spark-1.3.1
>>>>>>
>>>>>> $ PYSPARK_PYTHON=/usr/bin/pypy
>>>>>> ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
>>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> 15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a
>>>>>> loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface
>>>>>> eth0)
>>>>>> 15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind
>>>>>> to another address
>>>>>> 15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop
>>>>>> library for your platform... using builtin-java classes where applicable
>>>>>> Welcome to
>>>>>>       ____              __
>>>>>>      / __/__  ___ _____/ /__
>>>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
>>>>>>       /_/
>>>>>>
>>>>>> Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
>>>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>>>> And now for something completely different: ``Armin: "Prolog is a
>>>>>> mess.", CF:
>>>>>> "No, it's very cool!", Armin: "Isn't this what I said?"''
>>>>>> >>>
>>>>>>
>>>>>> error message for 1.5.1
>>>>>>
>>>>>> $ PYSPARK_PYTHON=/usr/bin/pypy
>>>>>> ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
>>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>>> Traceback (most recent call last):
>>>>>>   File "app_main.py", line 72, in run_toplevel
>>>>>>   File "app_main.py", line 614, in run_it
>>>>>>   File
>>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py", line
>>>>>> 30, in <module>
>>>>>>     import pyspark
>>>>>>   File
>>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py",
>>>>>> line 41, in <module>
>>>>>>     from pyspark.context import SparkContext
>>>>>>   File
>>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py",
>>>>>> line 26, in <module>
>>>>>>     from pyspark import accumulators
>>>>>>   File
>>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py",
>>>>>> line 98, in <module>
>>>>>>     from pyspark.serializers import read_int, PickleSerializer
>>>>>>   File
>>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>>>>> line 400, in <module>
>>>>>>     _hijack_namedtuple()
>>>>>>   File
>>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>>>>> line 378, in _hijack_namedtuple
>>>>>>     _old_namedtuple = _copy_func(collections.namedtuple)
>>>>>>   File
>>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>>>>> line 376, in _copy_func
>>>>>>     f.__defaults__, f.__closure__)
>>>>>> AttributeError: 'function' object has no attribute '__closure__'
>>>>>> And now for something completely different: ``the traces don't lie''
>>>>>>
>>>>>> is this a known issue? any suggestion to resolve it? or how can I help
>>>>>> to fix this problem?
>>>>>>
>>>>>> Thanks.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -- 張雅軒
>>>
>>>
>>>
>>>
>>> --
>>> -- 張雅軒
>
>
>
>
> --
> -- 張雅軒


Re: pyspark with pypy does not work for spark-1.5.1

Posted by Chang Ya-Hsuan <su...@gmail.com>.
Hi, I ran ./python/run-tests to test the following modules of spark-1.5.1:

['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql',
'pyspark-streaming']

against the following pypy versions:

pypy-2.2.1  pypy-2.3  pypy-2.3.1  pypy-2.4.0  pypy-2.5.0  pypy-2.5.1
 pypy-2.6.0  pypy-2.6.1  pypy-4.0.0

Except for pypy-2.2.1, all the others pass the tests.

The error message for pypy-2.2.1 is:

Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/runpy.py",
line 151, in _run_module_as_main
    mod_name, loader, code, fname = _get_module_details(mod_name)
  File "/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/runpy.py",
line 101, in _get_module_details
    loader = get_loader(mod_name)
  File
"/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line
465, in get_loader
    return find_loader(fullname)
  File
"/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line
475, in find_loader
    for importer in iter_importers(fullname):
  File
"/home/yahsuan/.pyenv/versions/pypy-2.2.1/lib-python/2.7/pkgutil.py", line
431, in iter_importers
    __import__(pkg)
  File "pyspark/__init__.py", line 41, in <module>
    from pyspark.context import SparkContext
  File "pyspark/context.py", line 26, in <module>
    from pyspark import accumulators
  File "pyspark/accumulators.py", line 98, in <module>
    from pyspark.serializers import read_int, PickleSerializer
  File "pyspark/serializers.py", line 400, in <module>
    _hijack_namedtuple()
  File "pyspark/serializers.py", line 378, in _hijack_namedtuple
    _old_namedtuple = _copy_func(collections.namedtuple)
  File "pyspark/serializers.py", line 376, in _copy_func
    f.__defaults__, f.__closure__)
AttributeError: 'function' object has no attribute '__closure__'
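
For what it's worth, the failure seems reproducible outside Spark with a
one-liner; judging from the traceback above, this should print False on
pypy-2.2.1 and True on the versions that pass:

    $ pypy -c "import collections; print(hasattr(collections.namedtuple, '__closure__'))"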

P.S. Would you want to test different pypy versions on your Jenkins? Maybe
I could help.

On Fri, Nov 6, 2015 at 2:23 AM, Josh Rosen <jo...@databricks.com> wrote:

> You could try running PySpark's own unit tests. Try ./python/run-tests
> --help for instructions.
>
> On Thu, Nov 5, 2015 at 12:31 AM Chang Ya-Hsuan <su...@gmail.com> wrote:
>
>> I've test on following pypy version against to spark-1.5.1
>>
>>   pypy-2.2.1
>>   pypy-2.3
>>   pypy-2.3.1
>>   pypy-2.4.0
>>   pypy-2.5.0
>>   pypy-2.5.1
>>   pypy-2.6.0
>>   pypy-2.6.1
>>
>> I run
>>
>>     $ PYSPARK_PYTHON=/path/to/pypy-xx.xx/bin/pypy
>> /path/to/spark-1.5.1/bin/pyspark
>>
>> and only pypy-2.2.1 failed.
>>
>> Any suggestion to run advanced test?
>>
>> On Thu, Nov 5, 2015 at 4:14 PM, Chang Ya-Hsuan <su...@gmail.com>
>> wrote:
>>
>>> Thanks for your quickly reply.
>>>
>>> I will test several pypy versions and report the result later.
>>>
>>> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen <ro...@gmail.com> wrote:
>>>
>>>> I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's
>>>> docs say that we only support PyPy 2.3+. Could you try using a newer PyPy
>>>> version to see if that works?
>>>>
>>>> I just checked and it looks like our Jenkins tests are running against
>>>> PyPy 2.5.1, so that version is known to work. I'm not sure what the actual
>>>> minimum supported PyPy version is. Would you be interested in helping to
>>>> investigate so that we can update the documentation or produce a fix to
>>>> restore compatibility with earlier PyPy builds?
>>>>
>>>> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <su...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am trying to run pyspark with pypy, and it is work when using
>>>>> spark-1.3.1 but failed when using spark-1.4.1 and spark-1.5.1
>>>>>
>>>>> my pypy version:
>>>>>
>>>>> $ /usr/bin/pypy --version
>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>> [PyPy 2.2.1 with GCC 4.8.4]
>>>>>
>>>>> works with spark-1.3.1
>>>>>
>>>>> $ PYSPARK_PYTHON=/usr/bin/pypy
>>>>> ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> 15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a
>>>>> loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface
>>>>> eth0)
>>>>> 15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind
>>>>> to another address
>>>>> 15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop
>>>>> library for your platform... using builtin-java classes where applicable
>>>>> Welcome to
>>>>>       ____              __
>>>>>      / __/__  ___ _____/ /__
>>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
>>>>>       /_/
>>>>>
>>>>> Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
>>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>>> And now for something completely different: ``Armin: "Prolog is a
>>>>> mess.", CF:
>>>>> "No, it's very cool!", Armin: "Isn't this what I said?"''
>>>>> >>>
>>>>>
>>>>> error message for 1.5.1
>>>>>
>>>>> $ PYSPARK_PYTHON=/usr/bin/pypy
>>>>> ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
>>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>>> Traceback (most recent call last):
>>>>>   File "app_main.py", line 72, in run_toplevel
>>>>>   File "app_main.py", line 614, in run_it
>>>>>   File
>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py",
>>>>> line 30, in <module>
>>>>>     import pyspark
>>>>>   File
>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py",
>>>>> line 41, in <module>
>>>>>     from pyspark.context import SparkContext
>>>>>   File
>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py",
>>>>> line 26, in <module>
>>>>>     from pyspark import accumulators
>>>>>   File
>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py",
>>>>> line 98, in <module>
>>>>>     from pyspark.serializers import read_int, PickleSerializer
>>>>>   File
>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>>>> line 400, in <module>
>>>>>     _hijack_namedtuple()
>>>>>   File
>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>>>> line 378, in _hijack_namedtuple
>>>>>     _old_namedtuple = _copy_func(collections.namedtuple)
>>>>>   File
>>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>>>> line 376, in _copy_func
>>>>>     f.__defaults__, f.__closure__)
>>>>> AttributeError: 'function' object has no attribute '__closure__'
>>>>> And now for something completely different: ``the traces don't lie''
>>>>>
>>>>> is this a known issue? any suggestion to resolve it? or how can I help
>>>>> to fix this problem?
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> -- 張雅軒
>>>
>>
>>
>>
>> --
>> -- 張雅軒
>>
>


-- 
-- 張雅軒

Re: pyspark with pypy does not work for spark-1.5.1

Posted by Josh Rosen <jo...@databricks.com>.
You could try running PySpark's own unit tests. Try ./python/run-tests
--help for instructions.
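
For example (flag names from memory, so please double-check them against the
--help output), you should be able to point the runner at a specific
interpreter and a subset of modules:

    $ ./python/run-tests --python-executables=/path/to/pypy/bin/pypy \
        --modules=pyspark-core,pyspark-sql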

On Thu, Nov 5, 2015 at 12:31 AM Chang Ya-Hsuan <su...@gmail.com> wrote:

> I've test on following pypy version against to spark-1.5.1
>
>   pypy-2.2.1
>   pypy-2.3
>   pypy-2.3.1
>   pypy-2.4.0
>   pypy-2.5.0
>   pypy-2.5.1
>   pypy-2.6.0
>   pypy-2.6.1
>
> I run
>
>     $ PYSPARK_PYTHON=/path/to/pypy-xx.xx/bin/pypy
> /path/to/spark-1.5.1/bin/pyspark
>
> and only pypy-2.2.1 failed.
>
> Any suggestion to run advanced test?
>
> On Thu, Nov 5, 2015 at 4:14 PM, Chang Ya-Hsuan <su...@gmail.com> wrote:
>
>> Thanks for your quickly reply.
>>
>> I will test several pypy versions and report the result later.
>>
>> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen <ro...@gmail.com> wrote:
>>
>>> I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's
>>> docs say that we only support PyPy 2.3+. Could you try using a newer PyPy
>>> version to see if that works?
>>>
>>> I just checked and it looks like our Jenkins tests are running against
>>> PyPy 2.5.1, so that version is known to work. I'm not sure what the actual
>>> minimum supported PyPy version is. Would you be interested in helping to
>>> investigate so that we can update the documentation or produce a fix to
>>> restore compatibility with earlier PyPy builds?
>>>
>>> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <su...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am trying to run pyspark with pypy, and it is work when using
>>>> spark-1.3.1 but failed when using spark-1.4.1 and spark-1.5.1
>>>>
>>>> my pypy version:
>>>>
>>>> $ /usr/bin/pypy --version
>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>> [PyPy 2.2.1 with GCC 4.8.4]
>>>>
>>>> works with spark-1.3.1
>>>>
>>>> $ PYSPARK_PYTHON=/usr/bin/pypy
>>>> ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> 15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a
>>>> loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface
>>>> eth0)
>>>> 15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
>>>> another address
>>>> 15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>> Welcome to
>>>>       ____              __
>>>>      / __/__  ___ _____/ /__
>>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
>>>>       /_/
>>>>
>>>> Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
>>>> SparkContext available as sc, HiveContext available as sqlContext.
>>>> And now for something completely different: ``Armin: "Prolog is a
>>>> mess.", CF:
>>>> "No, it's very cool!", Armin: "Isn't this what I said?"''
>>>> >>>
>>>>
>>>> error message for 1.5.1
>>>>
>>>> $ PYSPARK_PYTHON=/usr/bin/pypy
>>>> ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
>>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>>> Type "help", "copyright", "credits" or "license" for more information.
>>>> Traceback (most recent call last):
>>>>   File "app_main.py", line 72, in run_toplevel
>>>>   File "app_main.py", line 614, in run_it
>>>>   File
>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py",
>>>> line 30, in <module>
>>>>     import pyspark
>>>>   File
>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py",
>>>> line 41, in <module>
>>>>     from pyspark.context import SparkContext
>>>>   File
>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py",
>>>> line 26, in <module>
>>>>     from pyspark import accumulators
>>>>   File
>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py",
>>>> line 98, in <module>
>>>>     from pyspark.serializers import read_int, PickleSerializer
>>>>   File
>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>>> line 400, in <module>
>>>>     _hijack_namedtuple()
>>>>   File
>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>>> line 378, in _hijack_namedtuple
>>>>     _old_namedtuple = _copy_func(collections.namedtuple)
>>>>   File
>>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>>> line 376, in _copy_func
>>>>     f.__defaults__, f.__closure__)
>>>> AttributeError: 'function' object has no attribute '__closure__'
>>>> And now for something completely different: ``the traces don't lie''
>>>>
>>>> is this a known issue? any suggestion to resolve it? or how can I help
>>>> to fix this problem?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>
>>
>> --
>> -- 張雅軒
>>
>
>
>
> --
> -- 張雅軒
>

Re: pyspark with pypy does not work for spark-1.5.1

Posted by Chang Ya-Hsuan <su...@gmail.com>.
I've tested the following pypy versions against spark-1.5.1:

  pypy-2.2.1
  pypy-2.3
  pypy-2.3.1
  pypy-2.4.0
  pypy-2.5.0
  pypy-2.5.1
  pypy-2.6.0
  pypy-2.6.1

I ran

    $ PYSPARK_PYTHON=/path/to/pypy-xx.xx/bin/pypy
/path/to/spark-1.5.1/bin/pyspark

and only pypy-2.2.1 failed.

Any suggestions on how to run more thorough tests?

On Thu, Nov 5, 2015 at 4:14 PM, Chang Ya-Hsuan <su...@gmail.com> wrote:

> Thanks for your quickly reply.
>
> I will test several pypy versions and report the result later.
>
> On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen <ro...@gmail.com> wrote:
>
>> I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's
>> docs say that we only support PyPy 2.3+. Could you try using a newer PyPy
>> version to see if that works?
>>
>> I just checked and it looks like our Jenkins tests are running against
>> PyPy 2.5.1, so that version is known to work. I'm not sure what the actual
>> minimum supported PyPy version is. Would you be interested in helping to
>> investigate so that we can update the documentation or produce a fix to
>> restore compatibility with earlier PyPy builds?
>>
>> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <su...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I am trying to run pyspark with pypy, and it is work when using
>>> spark-1.3.1 but failed when using spark-1.4.1 and spark-1.5.1
>>>
>>> my pypy version:
>>>
>>> $ /usr/bin/pypy --version
>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>> [PyPy 2.2.1 with GCC 4.8.4]
>>>
>>> works with spark-1.3.1
>>>
>>> $ PYSPARK_PYTHON=/usr/bin/pypy
>>> ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> 15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a
>>> loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface
>>> eth0)
>>> 15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
>>> another address
>>> 15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop
>>> library for your platform... using builtin-java classes where applicable
>>> Welcome to
>>>       ____              __
>>>      / __/__  ___ _____/ /__
>>>     _\ \/ _ \/ _ `/ __/  '_/
>>>    /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
>>>       /_/
>>>
>>> Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
>>> SparkContext available as sc, HiveContext available as sqlContext.
>>> And now for something completely different: ``Armin: "Prolog is a
>>> mess.", CF:
>>> "No, it's very cool!", Armin: "Isn't this what I said?"''
>>> >>>
>>>
>>> error message for 1.5.1
>>>
>>> $ PYSPARK_PYTHON=/usr/bin/pypy
>>> ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
>>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> Traceback (most recent call last):
>>>   File "app_main.py", line 72, in run_toplevel
>>>   File "app_main.py", line 614, in run_it
>>>   File
>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py",
>>> line 30, in <module>
>>>     import pyspark
>>>   File
>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py",
>>> line 41, in <module>
>>>     from pyspark.context import SparkContext
>>>   File
>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py",
>>> line 26, in <module>
>>>     from pyspark import accumulators
>>>   File
>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py",
>>> line 98, in <module>
>>>     from pyspark.serializers import read_int, PickleSerializer
>>>   File
>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>> line 400, in <module>
>>>     _hijack_namedtuple()
>>>   File
>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>> line 378, in _hijack_namedtuple
>>>     _old_namedtuple = _copy_func(collections.namedtuple)
>>>   File
>>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>>> line 376, in _copy_func
>>>     f.__defaults__, f.__closure__)
>>> AttributeError: 'function' object has no attribute '__closure__'
>>> And now for something completely different: ``the traces don't lie''
>>>
>>> is this a known issue? any suggestion to resolve it? or how can I help
>>> to fix this problem?
>>>
>>> Thanks.
>>>
>>
>>
>
>
> --
> -- 張雅軒
>



-- 
-- 張雅軒

Re: pyspark with pypy does not work for spark-1.5.1

Posted by Chang Ya-Hsuan <su...@gmail.com>.
Thanks for your quick reply.

I will test several pypy versions and report the result later.

On Thu, Nov 5, 2015 at 4:06 PM, Josh Rosen <ro...@gmail.com> wrote:

> I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's
> docs say that we only support PyPy 2.3+. Could you try using a newer PyPy
> version to see if that works?
>
> I just checked and it looks like our Jenkins tests are running against
> PyPy 2.5.1, so that version is known to work. I'm not sure what the actual
> minimum supported PyPy version is. Would you be interested in helping to
> investigate so that we can update the documentation or produce a fix to
> restore compatibility with earlier PyPy builds?
>
> On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <su...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> I am trying to run pyspark with pypy, and it is work when using
>> spark-1.3.1 but failed when using spark-1.4.1 and spark-1.5.1
>>
>> my pypy version:
>>
>> $ /usr/bin/pypy --version
>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>> [PyPy 2.2.1 with GCC 4.8.4]
>>
>> works with spark-1.3.1
>>
>> $ PYSPARK_PYTHON=/usr/bin/pypy
>> ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> 15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a
>> loopback address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface
>> eth0)
>> 15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
>> another address
>> 15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/  '_/
>>    /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
>>       /_/
>>
>> Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
>> SparkContext available as sc, HiveContext available as sqlContext.
>> And now for something completely different: ``Armin: "Prolog is a mess.",
>> CF:
>> "No, it's very cool!", Armin: "Isn't this what I said?"''
>> >>>
>>
>> error message for 1.5.1
>>
>> $ PYSPARK_PYTHON=/usr/bin/pypy
>> ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
>> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
>> [PyPy 2.2.1 with GCC 4.8.4] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> Traceback (most recent call last):
>>   File "app_main.py", line 72, in run_toplevel
>>   File "app_main.py", line 614, in run_it
>>   File
>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py",
>> line 30, in <module>
>>     import pyspark
>>   File
>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py",
>> line 41, in <module>
>>     from pyspark.context import SparkContext
>>   File
>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py",
>> line 26, in <module>
>>     from pyspark import accumulators
>>   File
>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py",
>> line 98, in <module>
>>     from pyspark.serializers import read_int, PickleSerializer
>>   File
>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>> line 400, in <module>
>>     _hijack_namedtuple()
>>   File
>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>> line 378, in _hijack_namedtuple
>>     _old_namedtuple = _copy_func(collections.namedtuple)
>>   File
>> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
>> line 376, in _copy_func
>>     f.__defaults__, f.__closure__)
>> AttributeError: 'function' object has no attribute '__closure__'
>> And now for something completely different: ``the traces don't lie''
>>
>> is this a known issue? any suggestion to resolve it? or how can I help to
>> fix this problem?
>>
>> Thanks.
>>
>
>


-- 
-- 張雅軒

Re: pyspark with pypy does not work for spark-1.5.1

Posted by Josh Rosen <ro...@gmail.com>.
I noticed that you're using PyPy 2.2.1, but it looks like Spark 1.5.1's
docs say that we only support PyPy 2.3+. Could you try using a newer PyPy
version to see if that works?

I just checked and it looks like our Jenkins tests are running against PyPy
2.5.1, so that version is known to work. I'm not sure what the actual
minimum supported PyPy version is. Would you be interested in helping to
investigate so that we can update the documentation or produce a fix to
restore compatibility with earlier PyPy builds?

On Wed, Nov 4, 2015 at 11:56 PM, Chang Ya-Hsuan <su...@gmail.com> wrote:

> Hi all,
>
> I am trying to run pyspark with pypy, and it is work when using
> spark-1.3.1 but failed when using spark-1.4.1 and spark-1.5.1
>
> my pypy version:
>
> $ /usr/bin/pypy --version
> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
> [PyPy 2.2.1 with GCC 4.8.4]
>
> works with spark-1.3.1
>
> $ PYSPARK_PYTHON=/usr/bin/pypy
> ~/Tool/spark-1.3.1-bin-hadoop2.6/bin/pyspark
> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
> [PyPy 2.2.1 with GCC 4.8.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> 15/11/05 15:50:30 WARN Utils: Your hostname, xxxxxx resolves to a loopback
> address: 127.0.1.1; using xxx.xxx.xxx.xxx instead (on interface eth0)
> 15/11/05 15:50:30 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
> another address
> 15/11/05 15:50:31 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 1.3.1
>       /_/
>
> Using Python version 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015)
> SparkContext available as sc, HiveContext available as sqlContext.
> And now for something completely different: ``Armin: "Prolog is a mess.",
> CF:
> "No, it's very cool!", Armin: "Isn't this what I said?"''
> >>>
>
> error message for 1.5.1
>
> $ PYSPARK_PYTHON=/usr/bin/pypy
> ~/Tool/spark-1.5.1-bin-hadoop2.6/bin/pyspark
> Python 2.7.3 (2.2.1+dfsg-1ubuntu0.3, Sep 30 2015, 15:18:40)
> [PyPy 2.2.1 with GCC 4.8.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Traceback (most recent call last):
>   File "app_main.py", line 72, in run_toplevel
>   File "app_main.py", line 614, in run_it
>   File
> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/shell.py",
> line 30, in <module>
>     import pyspark
>   File
> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/__init__.py",
> line 41, in <module>
>     from pyspark.context import SparkContext
>   File
> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/context.py",
> line 26, in <module>
>     from pyspark import accumulators
>   File
> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/accumulators.py",
> line 98, in <module>
>     from pyspark.serializers import read_int, PickleSerializer
>   File
> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
> line 400, in <module>
>     _hijack_namedtuple()
>   File
> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
> line 378, in _hijack_namedtuple
>     _old_namedtuple = _copy_func(collections.namedtuple)
>   File
> "/home/yahsuan/Tool/spark-1.5.1-bin-hadoop2.6/python/pyspark/serializers.py",
> line 376, in _copy_func
>     f.__defaults__, f.__closure__)
> AttributeError: 'function' object has no attribute '__closure__'
> And now for something completely different: ``the traces don't lie''
>
> is this a known issue? any suggestion to resolve it? or how can I help to
> fix this problem?
>
> Thanks.
>