Posted to user@spark.apache.org by Keith Chapman <ke...@gmail.com> on 2018/08/17 23:50:34 UTC
Pyspark error when converting string to timestamp in map function
Hi all,
I'm trying to create a dataframe enforcing a schema so that I can write it
to a parquet file. The schema has timestamps and I get an error with
pyspark. The following is a snippet of code that exhibits the problem,
from pyspark.sql.types import StructType, StructField, TimestampType

df = sqlctx.range(1000)
schema = StructType([StructField('a', TimestampType(), True)])
df1 = sqlctx.createDataFrame(df.rdd.map(row_gen_func), schema)
row_gen_func is a function that returns timestamp strings of the form
"2018-03-21 11:09:44".
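For illustration, a minimal stand-in that behaves like row_gen_func (simplified here; the real function builds its values differently) is enough to reproduce the error:

```python
def row_gen_func(row):
    # Simplified stand-in: returns a one-field tuple holding a
    # timestamp *string*, which TimestampType will reject.
    return ("2018-03-21 11:09:44",)
```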
When I run this with Spark 2.2 I get the following error,
raise TypeError("%s can not accept object %r in type %s" % (dataType, obj,
type(obj)))
TypeError: TimestampType can not accept object '2018-03-21 08:06:17' in
type <type 'str'>
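As far as I can tell, TimestampType expects Python datetime.datetime values rather than strings, so presumably the strings need parsing before createDataFrame sees them. A minimal sketch of the conversion (parse_ts is a made-up helper name; the format string matches the timestamps above):

```python
from datetime import datetime

def parse_ts(s):
    # Parse a "YYYY-MM-DD HH:MM:SS" string into a datetime.datetime,
    # which is the Python type TimestampType accepts.
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
```

The map function could then wrap each generated string with parse_ts before the rows reach createDataFrame.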
Regards,
Keith.
http://keith-chapman.com