Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/05/02 10:50:04 UTC

[jira] [Commented] (SPARK-19019) PySpark does not work with Python 3.6.0

    [ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992699#comment-15992699 ] 

Hyukjin Kwon commented on SPARK-19019:
--------------------------------------

To solve this problem fully, I had to port the cloudpickle change in the PR as well. Fixing only the hijacked function described above does not fully solve this issue. Please refer to the discussion and the change in the PR.
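For reference, the gist of the serializers.py side can be sketched roughly as below. This is a minimal illustration of the idea (copying {{__kwdefaults__}} over to the copied function), not the actual patch; see the PR for the real change:

```python
import types
import collections

def _copy_func(f):
    # Same copy as in serializers.py ...
    fn = types.FunctionType(f.__code__, f.__globals__, f.__name__,
                            f.__defaults__, f.__closure__)
    # ... but also carry over the keyword-only defaults.
    # types.FunctionType does not copy __kwdefaults__, which is what
    # breaks once namedtuple's optional arguments become keyword-only.
    fn.__kwdefaults__ = f.__kwdefaults__
    return fn

_old_namedtuple = _copy_func(collections.namedtuple)

# Now the plain two-argument call works again:
Point = _old_namedtuple("Point", "x y")
```

With this, {{_old_namedtuple("a", "b")}} no longer raises the {{TypeError}} shown in the description.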

> PySpark does not work with Python 3.6.0
> ---------------------------------------
>
>                 Key: SPARK-19019
>                 URL: https://issues.apache.org/jira/browse/SPARK-19019
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Critical
>             Fix For: 1.6.4, 2.0.3, 2.1.1, 2.2.0
>
>
> Currently, PySpark does not work with Python 3.6.0.
> Running {{./bin/pyspark}} simply throws the error as below:
> {code}
> Traceback (most recent call last):
>   File ".../spark/python/pyspark/shell.py", line 30, in <module>
>     import pyspark
>   File ".../spark/python/pyspark/__init__.py", line 46, in <module>
>     from pyspark.context import SparkContext
>   File ".../spark/python/pyspark/context.py", line 36, in <module>
>     from pyspark.java_gateway import launch_gateway
>   File ".../spark/python/pyspark/java_gateway.py", line 31, in <module>
>     from py4j.java_gateway import java_import, JavaGateway, GatewayClient
>   File "<frozen importlib._bootstrap>", line 961, in _find_and_load
>   File "<frozen importlib._bootstrap>", line 950, in _find_and_load_unlocked
>   File "<frozen importlib._bootstrap>", line 646, in _load_unlocked
>   File "<frozen importlib._bootstrap>", line 616, in _load_backward_compatible
>   File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 18, in <module>
>   File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pydoc.py", line 62, in <module>
>     import pkgutil
>   File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pkgutil.py", line 22, in <module>
>     ModuleInfo = namedtuple('ModuleInfo', 'module_finder name ispkg')
>   File ".../spark/python/pyspark/serializers.py", line 394, in namedtuple
>     cls = _old_namedtuple(*args, **kwargs)
> TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
> {code}
> The problem is in https://github.com/apache/spark/blob/3c68944b229aaaeeaee3efcbae3e3be9a2914855/python/pyspark/serializers.py#L386-L394 as the error says, and the cause seems to be that the optional arguments of {{namedtuple}} became completely keyword-only in Python 3.6.0 (see https://bugs.python.org/issue25628).
> We currently copy this function via {{types.FunctionType}}, which does not set the default values of keyword-only arguments (meaning {{namedtuple.__kwdefaults__}}), and this seems to leave those arguments unbound in the copied function.
> This ends up as below:
> {code}
> import types
> import collections
> def _copy_func(f):
>     return types.FunctionType(f.__code__, f.__globals__, f.__name__,
>         f.__defaults__, f.__closure__)
> _old_namedtuple = _copy_func(collections.namedtuple)
> {code}
> If we call as below:
> {code}
> >>> _old_namedtuple("a", "b")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
> {code}
> It throws an exception as above because {{__kwdefaults__}} for the required keyword-only arguments is unset in the copied function. So, if we give explicit values for these,
> {code}
> >>> _old_namedtuple("a", "b", verbose=False, rename=False, module=None)
> <class '__main__.a'>
> {code}
> It works fine.
> It seems we should now properly set these in the hijacked one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org