Posted to issues@spark.apache.org by "Apoorva Sareen (JIRA)" <ji...@apache.org> on 2018/02/15 15:17:00 UTC
[jira] [Created] (SPARK-23436) Incorrect Date column Inference in partition discovery
Apoorva Sareen created SPARK-23436:
--------------------------------------
Summary: Incorrect Date column Inference in partition discovery
Key: SPARK-23436
URL: https://issues.apache.org/jira/browse/SPARK-23436
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.2.1
Reporter: Apoorva Sareen
If a partition column value looks like a partial date/timestamp, for example 2018-01-01-23, where the timestamp is truncated to the hour, the data type of the partition column is automatically inferred as date. However, the values are then loaded as null.
Here is example code to reproduce this behaviour:
{code:java}
import spark.implicits._ // needed for toDF when not running in spark-shell
val data = Seq(("1", "2018-01", "2018-01-01-04", "test")).toDF("id", "date_month", "data_hour", "data")
data.write.partitionBy("id","date_month","data_hour").parquet("output/test")
val input = spark.read.parquet("output/test")
input.printSchema()
input.show()
// ### Result ###
root
|-- data: string (nullable = true)
|-- id: integer (nullable = true)
|-- date_month: string (nullable = true)
|-- data_hour: date (nullable = true)
+----+---+----------+---------+
|data| id|date_month|data_hour|
+----+---+----------+---------+
|test|  1|   2018-01|     null|
+----+---+----------+---------+
{code}
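A possible workaround, which is not part of the original report and has not been verified against it, is to switch off partition column type inference with the documented Spark SQL option spark.sql.sources.partitionColumnTypeInference.enabled. With inference off, partition columns should be read back as strings. The sketch below assumes a local session and the same output/test path written by the reproduction above; the object name and app name are illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone wrapper; in spark-shell only the conf.set and read lines are needed.
object PartitionInferenceWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-inference-workaround") // illustrative name
      .master("local[*]")                        // assumption: local run
      .getOrCreate()

    // Disable automatic type inference for partition columns,
    // so that partition values are kept as strings.
    spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")

    // Assumes the data from the reproduction was already written to this path.
    val input = spark.read.parquet("output/test")
    input.printSchema()
    input.show()

    spark.stop()
  }
}
```

This setting applies to every partition column, so id would also come back as a string rather than an integer and may need an explicit cast afterwards.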