Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/12/29 02:40:58 UTC

[jira] [Created] (SPARK-19019) PySpark does not work with Python 3.6.0

Hyukjin Kwon created SPARK-19019:
------------------------------------

             Summary: PySpark does not work with Python 3.6.0
                 Key: SPARK-19019
                 URL: https://issues.apache.org/jira/browse/SPARK-19019
             Project: Spark
          Issue Type: Bug
          Components: PySpark
            Reporter: Hyukjin Kwon
            Priority: Critical


Currently, PySpark does not work with Python 3.6.0.

Running {{./bin/pyspark}} simply throws the error below:

{code}
Traceback (most recent call last):
  File ".../spark/python/pyspark/shell.py", line 30, in <module>
    import pyspark
  File ".../spark/python/pyspark/__init__.py", line 46, in <module>
    from pyspark.context import SparkContext
  File ".../spark/python/pyspark/context.py", line 36, in <module>
    from pyspark.java_gateway import launch_gateway
  File ".../spark/python/pyspark/java_gateway.py", line 31, in <module>
    from py4j.java_gateway import java_import, JavaGateway, GatewayClient
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
  File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 18, in <module>
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", line 62, in <module>
    import pkgutil
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", line 22, in <module>
    ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
  File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple
    cls = _old_namedtuple(*args, **kwargs)
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
{code}

The problem is in https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394 as the error says, and the cause seems to be that the optional arguments of {{namedtuple}} became keyword-only in Python 3.6.0 (see https://bugs.python.org/issue25628).
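
For reference, a quick way to check the new signature on a Python 3.6.0 interpreter:

{code}
>>> import inspect
>>> import collections
>>> inspect.signature(collections.namedtuple)
<Signature (typename, field_names, *, verbose=False, rename=False, module=None)>
{code}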

We currently copy this function via {{types.FunctionType}}, which does not carry over the default values of keyword-only arguments (that is, {{namedtuple.__kwdefaults__}}), and this seems to leave those arguments unbound in the copied function.


This ends up as below:

{code}
import types
import collections

def _copy_func(f):
    # Copies the code, globals, name, positional defaults and closure,
    # but not __kwdefaults__ (the keyword-only defaults).
    return types.FunctionType(f.__code__, f.__globals__, f.__name__,
        f.__defaults__, f.__closure__)

_old_namedtuple = _copy_func(collections.namedtuple)
{code}
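
The copied function indeed loses the keyword-only defaults; checking on a Python 3.6.0 interpreter shows the difference:

{code}
>>> collections.namedtuple.__kwdefaults__
{'verbose': False, 'rename': False, 'module': None}
>>> print(_old_namedtuple.__kwdefaults__)
None
{code}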


If we then call it as below:

{code}
>>> _old_namedtuple("a", "b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
{code}


It throws an exception as above because {{__kwdefaults__}} for the keyword-only arguments is unset in the copied function. So, if we give explicit values for these,

{code}
>>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None)
<class '__main__.a'>
{code}

It works fine.

It seems we should now properly set these defaults on the hijacked function.
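
One possible way, as a minimal sketch (not necessarily the final patch), is to carry {{__kwdefaults__}} over when copying the function:

{code}
import types
import collections

def _copy_func(f):
    # Copy the function object as before ...
    fn = types.FunctionType(f.__code__, f.__globals__, f.__name__,
        f.__defaults__, f.__closure__)
    # ... and also carry over the keyword-only defaults,
    # which types.FunctionType does not set.
    fn.__kwdefaults__ = f.__kwdefaults__
    return fn

_old_namedtuple = _copy_func(collections.namedtuple)
print(_old_namedtuple("a", "b"))  # now works: <class '__main__.a'>
{code}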


