Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/10/11 06:25:00 UTC
[jira] [Commented] (SPARK-29041) Allow createDataFrame to accept bytes as binary type
[ https://issues.apache.org/jira/browse/SPARK-29041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949169#comment-16949169 ]
Hyukjin Kwon commented on SPARK-29041:
--------------------------------------
It was decided not to backport this; see the discussion in the PR itself.
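Since the fix ships only in 3.0.0, users on 2.4.x can wrap the value in {{bytearray}}, which BinaryType already accepts. The sketch below is a hypothetical simplification of the acceptable-types check in pyspark/sql/types.py (not the actual Spark source); the tuples of accepted types are assumptions matching the behavior described in this issue:

```python
# Simplified sketch of the type check that rejects b"abcd" in the
# tracebacks below. Hypothetical reduction, not the real PySpark code.

ACCEPTED_24X = (bytearray,)       # assumed: BinaryType's accepted types on 2.4.x
ACCEPTED_30 = (bytearray, bytes)  # assumed: after SPARK-29041 on 3.0.0

def verify_acceptable_types(obj, accepted):
    # Mirrors the error message seen in the reported TypeError.
    if not isinstance(obj, accepted):
        raise TypeError(
            "BinaryType can not accept object %r in type %s" % (obj, type(obj)))

# On 2.4.x, wrapping bytes in bytearray sidesteps the check:
verify_acceptable_types(bytearray(b"abcd"), ACCEPTED_24X)  # passes
verify_acceptable_types(b"abcd", ACCEPTED_30)              # passes on 3.0.0

try:
    verify_acceptable_types(b"abcd", ACCEPTED_24X)
except TypeError as e:
    print(e)  # same failure mode as the Python 3 traceback below
```

So the practical 2.4.x workaround is {{spark.createDataFrame([[bytearray(b"abcd")]], "col binary")}}.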
> Allow createDataFrame to accept bytes as binary type
> ----------------------------------------------------
>
> Key: SPARK-29041
> URL: https://issues.apache.org/jira/browse/SPARK-29041
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 3.0.0
>
>
> {code}
> spark.createDataFrame([[b"abcd"]], "col binary")
> {code}
> simply fails as below:
> in Python 3
> {code}
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/.../spark/python/pyspark/sql/session.py", line 787, in createDataFrame
> rdd, schema = self._createFromLocal(map(prepare, data), schema)
> File "/.../spark/python/pyspark/sql/session.py", line 442, in _createFromLocal
> data = list(data)
> File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
> verify_func(obj)
> File "/.../forked/spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
> File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
> verifier(v)
> File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
> File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
> verify_acceptable_types(obj)
> File "/.../spark/python/pyspark/sql/types.py", line 1282, in verify_acceptable_types
> % (dataType, obj, type(obj))))
> TypeError: field col: BinaryType can not accept object b'abcd' in type <class 'bytes'>
> {code}
> in Python 2:
> {code}
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/.../spark/python/pyspark/sql/session.py", line 787, in createDataFrame
> rdd, schema = self._createFromLocal(map(prepare, data), schema)
> File "/.../spark/python/pyspark/sql/session.py", line 442, in _createFromLocal
> data = list(data)
> File "/.../spark/python/pyspark/sql/session.py", line 769, in prepare
> verify_func(obj)
> File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
> File "/.../spark/python/pyspark/sql/types.py", line 1384, in verify_struct
> verifier(v)
> File "/.../spark/python/pyspark/sql/types.py", line 1403, in verify
> verify_value(obj)
> File "/.../spark/python/pyspark/sql/types.py", line 1397, in verify_default
> verify_acceptable_types(obj)
> File "/.../spark/python/pyspark/sql/types.py", line 1282, in verify_acceptable_types
> % (dataType, obj, type(obj))))
> TypeError: field col: BinaryType can not accept object 'abcd' in type <type 'str'>
> {code}
> {{bytes}} should also be accepted as binary type.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org