Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/10/11 06:25:00 UTC

[jira] [Commented] (SPARK-29041) Allow createDataFrame to accept bytes as binary type

    [ https://issues.apache.org/jira/browse/SPARK-29041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949169#comment-16949169 ] 

Hyukjin Kwon commented on SPARK-29041:
--------------------------------------

It was decided not to backport this. See the discussion in the PR itself.


> Allow createDataFrame to accept bytes as binary type
> ----------------------------------------------------
>
>                 Key: SPARK-29041
>                 URL: https://issues.apache.org/jira/browse/SPARK-29041
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.4, 3.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>             Fix For: 3.0.0
>
>
> {code}
> spark.createDataFrame([[b"abcd"]], "col binary")
> {code}
> simply fails as below:
> in Python 3
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 787, in createDataFrame
>     rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "/.../spark/python/pyspark/sql/session.py", line 442, in _createFromLocal
>     data = list(data)
>   File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
>     verify_func(obj)
>   File "/.../forked/spark/python/pyspark/sql/types.py", line 1403, in verify
>     verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
>     verifier(v)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
>     verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
>     verify_acceptable_types(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1282, in verify_acceptable_types
>     % (dataType, obj, type(obj))))
> TypeError: field col: BinaryType can not accept object b'abcd' in type <class 'bytes'>
> {code}
> in Python 2:
> {code}
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 787, in createDataFrame
>     rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "/.../spark/python/pyspark/sql/session.py", line 442, in _createFromLocal
>     data = list(data)
>   File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
>     verify_func(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
>     verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
>     verifier(v)
>   File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
>     verify_value(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
>     verify_acceptable_types(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 1282, in verify_acceptable_types
>     % (dataType, obj, type(obj))))
> TypeError: field col: BinaryType can not accept object 'abcd' in type <type 'str'>
> {code}
> {{bytes}} should also be accepted as binary type
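
The tracebacks above end in a per-type check that rejects the value because its Python type is not in the set accepted for the column's data type. The following is a minimal sketch of that kind of check, not PySpark's actual code: the names ACCEPTED_BEFORE, ACCEPTED_AFTER and accepts are hypothetical, and the assumption that only bytearray was accepted for binary before the change is inferred from the error message rather than taken from the source tree.

```python
# Simplified stand-in for a schema verifier. All names here are hypothetical
# and only model the behaviour described in the issue; they are not PySpark
# internals.

# Assumption: before the change, only bytearray was accepted for binary.
ACCEPTED_BEFORE = {"binary": (bytearray,)}
# Assumption: the requested behaviour also lists bytes for binary.
ACCEPTED_AFTER = {"binary": (bytearray, bytes)}


def verify_acceptable_types(type_name, obj, accepted):
    """Raise TypeError if type(obj) is not listed for type_name."""
    if type(obj) not in accepted[type_name]:
        raise TypeError(
            "%s can not accept object %r in type %s"
            % (type_name, obj, type(obj))
        )


def accepts(type_name, obj, accepted):
    """Return True if the check passes, False if it raises TypeError."""
    try:
        verify_acceptable_types(type_name, obj, accepted)
        return True
    except TypeError:
        return False


if __name__ == "__main__":
    print(accepts("binary", b"abcd", ACCEPTED_BEFORE))
    print(accepts("binary", bytearray(b"abcd"), ACCEPTED_BEFORE))
    print(accepts("binary", b"abcd", ACCEPTED_AFTER))
```

In this model, a bytes value is rejected under the first table and accepted under the second, while a bytearray passes under both. On Python 2, bytes is an alias of str, which would explain why the second traceback reports <type 'str'>.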



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
