Posted to issues@spark.apache.org by "Takuya Ueshin (JIRA)" <ji...@apache.org> on 2018/01/08 05:34:00 UTC
[jira] [Resolved] (SPARK-22566) Better error message for `_merge_type` in Pandas to Spark DF conversion
[ https://issues.apache.org/jira/browse/SPARK-22566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Takuya Ueshin resolved SPARK-22566.
-----------------------------------
Resolution: Fixed
Fix Version/s: 2.3.0
Issue resolved by pull request 19792
[https://github.com/apache/spark/pull/19792]
> Better error message for `_merge_type` in Pandas to Spark DF conversion
> -----------------------------------------------------------------------
>
> Key: SPARK-22566
> URL: https://issues.apache.org/jira/browse/SPARK-22566
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 2.2.0
> Reporter: Guilherme Berger
> Assignee: Guilherme Berger
> Priority: Minor
> Fix For: 2.3.0
>
>
> When creating a Spark DF from a Pandas DF without specifying a schema, schema inference is used. This inference can fail when a column contains values of two different types, which is expected. The problem is that the error message does not say in which column this happened.
> This makes such failures painful to debug.
> I plan to submit a PR that fixes this by providing a better error message for such cases, containing the column name (and possibly the problematic values too).
> >>> spark_session.createDataFrame(pandas_df)
>   File "redacted/pyspark/sql/session.py", line 541, in createDataFrame
>     rdd, schema = self._createFromLocal(map(prepare, data), schema)
>   File "redacted/pyspark/sql/session.py", line 401, in _createFromLocal
>     struct = self._inferSchemaFromList(data)
>   File "redacted/pyspark/sql/session.py", line 333, in _inferSchemaFromList
>     schema = reduce(_merge_type, map(_infer_schema, data))
>   File "redacted/pyspark/sql/types.py", line 1124, in _merge_type
>     for f in a.fields]
>   File "redacted/pyspark/sql/types.py", line 1118, in _merge_type
>     raise TypeError("Can not merge type %s and %s" % (type(a), type(b)))
> TypeError: Can not merge type <class 'pyspark.sql.types.LongType'> and <class 'pyspark.sql.types.StringType'>
>
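The idea in the description (carrying the column name through the recursive merge so it can appear in the error) can be illustrated with a small, self-contained sketch. This is not the code from pull request 19792 and it does not use PySpark's type classes. The functions `infer_type`, `merge_type`, and `infer_schema`, the `name` parameter, and the toy type representation (type names as strings, structs as dicts) are all hypothetical and exist only to show the technique.

```python
from functools import reduce


def infer_type(value):
    """Infer a toy type: None stays None (null), a dict becomes a struct
    (a dict of field types), anything else becomes its Python type name."""
    if value is None:
        return None
    if isinstance(value, dict):
        return {key: infer_type(val) for key, val in value.items()}
    return type(value).__name__


def merge_type(a, b, name=None):
    """Merge two toy types. `name` (hypothetical) is the dotted path of the
    field being merged, so a conflict can say where it happened."""
    if a is None:
        return b
    if b is None:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        merged = {}
        # Keep the fields of `a` in order, then any fields only `b` has.
        for key in list(a) + [k for k in b if k not in a]:
            child = key if name is None else "%s.%s" % (name, key)
            merged[key] = merge_type(a.get(key), b.get(key), name=child)
        return merged
    if a != b:
        where = "" if name is None else "field %s: " % name
        raise TypeError("%sCan not merge type %s and %s" % (where, a, b))
    return a


def infer_schema(rows):
    """Reduce the per-row toy types into one merged schema."""
    return reduce(merge_type, map(infer_type, rows))
```

With rows such as `[{"id": 1, "label": "a"}, {"id": 2, "label": 3}]`, `infer_schema` raises `TypeError: field label: Can not merge type str and int`, which points at the offending column instead of only naming the two conflicting types.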
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)