Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/10/06 09:32:00 UTC

[jira] [Assigned] (SPARK-25659) Test type inference specification for createDataFrame in PySpark

     [ https://issues.apache.org/jira/browse/SPARK-25659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25659:
------------------------------------

    Assignee: Apache Spark

> Test type inference specification for createDataFrame in PySpark
> ----------------------------------------------------------------
>
>                 Key: SPARK-25659
>                 URL: https://issues.apache.org/jira/browse/SPARK-25659
>             Project: Spark
>          Issue Type: Test
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Hyukjin Kwon
>            Assignee: Apache Spark
>            Priority: Minor
>
> For instance, see https://github.com/apache/spark/blob/08c76b5d39127ae207d9d1fff99c2551e6ce2581/python/pyspark/sql/types.py#L894-L905
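> For reference, the mapping at that link reads roughly as follows (paraphrased from {{python/pyspark/sql/types.py}}; note that both {{type(None)}} and {{datetime.time}} are listed):
> {code}
> # Paraphrased from pyspark/sql/types.py: Python type -> Spark SQL type,
> # used by schema inference in createDataFrame.
> _type_mappings = {
>     type(None): NullType,
>     bool: BooleanType,
>     int: LongType,
>     float: DoubleType,
>     str: StringType,
>     bytearray: BinaryType,
>     decimal.Decimal: DecimalType,
>     datetime.date: DateType,
>     datetime.datetime: TimestampType,
>     datetime.time: TimestampType,
> }
> {code}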
> Judging from that mapping, we intended to support {{datetime.time}} and {{None}} in type inference too, but neither works:
> {code}
> >>> spark.createDataFrame([[datetime.time()]])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 751, in createDataFrame
>     rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "/.../spark/python/pyspark/sql/session.py", line 432, in _createFromLocal
>     data = [schema.toInternal(row) for row in data]
>   File "/.../spark/python/pyspark/sql/types.py", line 604, in toInternal
>     for f, v, c in zip(self.fields, obj, self._needConversion))
>   File "/.../spark/python/pyspark/sql/types.py", line 604, in <genexpr>
>     for f, v, c in zip(self.fields, obj, self._needConversion))
>   File "/.../spark/python/pyspark/sql/types.py", line 442, in toInternal
>     return self.dataType.toInternal(obj)
>   File "/.../spark/python/pyspark/sql/types.py", line 193, in toInternal
>     else time.mktime(dt.timetuple()))
> AttributeError: 'datetime.time' object has no attribute 'timetuple'
> >>> spark.createDataFrame([[None]])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/session.py", line 751, in createDataFrame
>     rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "/.../spark/python/pyspark/sql/session.py", line 419, in _createFromLocal
>     struct = self._inferSchemaFromList(data, names=schema)
>   File "/.../python/pyspark/sql/session.py", line 353, in _inferSchemaFromList
>     raise ValueError("Some of types cannot be determined after inferring")
> ValueError: Some of types cannot be determined after inferring
> {code}
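> The first failure above comes from {{TimestampType.toInternal}} (line 193 in the traceback), which assumes its argument is a {{datetime.datetime}}; paraphrased, it looks roughly like this:
> {code}
> # Paraphrased from pyspark/sql/types.py: only datetime.datetime has
> # timetuple()/utctimetuple(), so a plain datetime.time fails here.
> def toInternal(self, dt):
>     if dt is not None:
>         seconds = (calendar.timegm(dt.utctimetuple()) if dt.tzinfo
>                    else time.mktime(dt.timetuple()))
>         return int(seconds) * 1000000 + dt.microsecond
> {code}
> The second failure is {{_inferSchemaFromList}} rejecting a schema whose fields are still {{NullType}} after inference, so a column containing only {{None}} cannot be created without an explicit schema.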
> It looks like we should add a tested specification of the supported type inference.
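> A minimal sketch of what such a specification test could look like (hypothetical test; the {{spark}} fixture and the expected types are assumptions about the intended behaviour):
> {code}
> # Hypothetical sketch: pin down which Python types createDataFrame
> # infers, and which Spark SQL types they map to.
> import datetime
> import decimal
> 
> def test_createDataFrame_type_inference(spark):
>     cases = [
>         (True, 'boolean'),
>         (1, 'bigint'),
>         (1.0, 'double'),
>         (u'a', 'string'),
>         (bytearray(b'ab'), 'binary'),
>         (decimal.Decimal(1), 'decimal(38,18)'),
>         (datetime.date(2018, 10, 6), 'date'),
>         (datetime.datetime(2018, 10, 6, 9, 32), 'timestamp'),
>     ]
>     for value, expected in cases:
>         df = spark.createDataFrame([[value]])
>         assert df.schema.fields[0].dataType.simpleString() == expected
> {code}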



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org