Posted to issues@spark.apache.org by "Maciej Bryński (JIRA)" <ji...@apache.org> on 2015/09/01 10:55:45 UTC
[jira] [Created] (SPARK-10392) Pyspark - Wrong DateType support
Maciej Bryński created SPARK-10392:
--------------------------------------
Summary: Pyspark - Wrong DateType support
Key: SPARK-10392
URL: https://issues.apache.org/jira/browse/SPARK-10392
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Reporter: Maciej Bryński
I have the following problem.
I created a table in MySQL and inserted one row:
{code}
CREATE TABLE `spark_test` (
`id` INT(11) NULL,
`date` DATE NULL
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
INSERT INTO `sandbox`.`spark_test` (`id`, `date`) VALUES (1, '1970-01-01');
{code}
Then I try to read the data, and the date '1970-01-01' is converted to an int. This makes the RDD incompatible with its own schema, so rebuilding the DataFrame from df.rdd and df.schema fails.
{code}
df = sqlCtx.read.jdbc("jdbc:mysql://host/sandbox?user=user&password=password", 'spark_test')
print(df.collect())
df = sqlCtx.createDataFrame(df.rdd, df.schema)
{code}
Output:
{code}
[Row(id=1, date=0)]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-36-ebc1d94e0d8c> in <module>()
1 df = sqlCtx.read.jdbc("jdbc:mysql://host/sandbox?user=user&password=password", 'spark_test')
2 print(df.collect())
----> 3 df = sqlCtx.createDataFrame(df.rdd, df.schema)
/mnt/spark/spark/python/pyspark/sql/context.py in createDataFrame(self, data, schema, samplingRatio)
402
403 if isinstance(data, RDD):
--> 404 rdd, schema = self._createFromRDD(data, schema, samplingRatio)
405 else:
406 rdd, schema = self._createFromLocal(data, schema)
/mnt/spark/spark/python/pyspark/sql/context.py in _createFromRDD(self, rdd, schema, samplingRatio)
296 rows = rdd.take(10)
297 for row in rows:
--> 298 _verify_type(row, schema)
299
300 else:
/mnt/spark/spark/python/pyspark/sql/types.py in _verify_type(obj, dataType)
1152 "length of fields (%d)" % (len(obj), len(dataType.fields)))
1153 for v, f in zip(obj, dataType.fields):
-> 1154 _verify_type(v, f.dataType)
1155
1156
/mnt/spark/spark/python/pyspark/sql/types.py in _verify_type(obj, dataType)
1136 # subclass of them can not be fromInternald in JVM
1137 if type(obj) not in _acceptable_types[_type]:
-> 1138 raise TypeError("%s can not accept object in type %s" % (dataType, type(obj)))
1139
1140 if isinstance(dataType, ArrayType):
TypeError: DateType can not accept object in type <class 'int'>
{code}
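The value 0 returned for '1970-01-01' is consistent with the date being exposed as a count of days since the Unix epoch instead of a datetime.date object. A minimal sketch of a possible workaround, assuming that interpretation is correct: the helpers days_to_date and fix_row are hypothetical names introduced here for illustration, are not part of PySpark, and operate on a plain dict standing in for a row.

```python
import datetime

# Assumption: the integer is a day count relative to the Unix epoch,
# which matches 1970-01-01 showing up as 0 in the output above.
EPOCH = datetime.date(1970, 1, 1)


def days_to_date(value):
    """Turn an epoch-day integer into a datetime.date; pass other values through."""
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return EPOCH + datetime.timedelta(days=value)
    return value


def fix_row(row, date_fields):
    """Return a copy of a plain dict row with the named fields converted."""
    return {k: days_to_date(v) if k in date_fields else v for k, v in row.items()}


# The row from the report, represented as a plain dict.
fixed = fix_row({"id": 1, "date": 0}, date_fields={"date"})
print(fixed)  # {'id': 1, 'date': datetime.date(1970, 1, 1)}
```

In PySpark, a conversion like this could presumably be mapped over df.rdd before calling createDataFrame, but that has not been tested against the setup in this report, and it only works around the symptom; it does not fix the underlying conversion.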