Posted to issues@spark.apache.org by "Apoorva Sareen (JIRA)" <ji...@apache.org> on 2018/02/15 15:17:00 UTC

[jira] [Created] (SPARK-23436) Incorrect Date column Inference in partition discovery

Apoorva Sareen created SPARK-23436:
--------------------------------------

             Summary: Incorrect Date column Inference in partition discovery
                 Key: SPARK-23436
                 URL: https://issues.apache.org/jira/browse/SPARK-23436
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.1
            Reporter: Apoorva Sareen


If a partition column value looks like a partial date/timestamp, for example

    2018-01-01-23

where the timestamp is truncated to the hour, the data type of the partition column is automatically inferred as date, but its values are loaded as null.

The following code reproduces this behaviour:

{code:java}
// In spark-shell, spark.implicits._ is already in scope; elsewhere it must be imported for toDF.
import spark.implicits._

val data = Seq(("1", "2018-01", "2018-01-01-04", "test")).toDF("id", "date_month", "data_hour", "data")

// Write the data partitioned by id, date_month and data_hour.
data.write.partitionBy("id", "date_month", "data_hour").parquet("output/test")

// Read it back; the partition column types are inferred from the directory names.
val input = spark.read.parquet("output/test")
input.printSchema()
input.show()

### Result ###
root
|-- data: string (nullable = true)
|-- id: integer (nullable = true)
|-- date_month: string (nullable = true)
|-- data_hour: date (nullable = true)

+----+---+----------+---------+
|data| id|date_month|data_hour|
+----+---+----------+---------+
|test|  1|   2018-01|     null|
+----+---+----------+---------+{code}
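
A possible workaround, sketched here as an assumption and not as part of the original report, is to disable type inference for partition columns before reading. With spark.sql.sources.partitionColumnTypeInference.enabled set to false, Spark reads all partition columns as strings, so data_hour should keep its raw value. Note that id is then also read as a string and not as an integer. The session setup and application name below are illustrative only; in spark-shell the spark value already exists.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell, `spark` is predefined.
val spark = SparkSession.builder()
  .appName("SPARK-23436-workaround") // hypothetical application name
  .master("local[*]")
  .getOrCreate()

// Turn off automatic type inference for partition columns, so that
// id, date_month and data_hour are all read back as strings.
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")

// Same path as in the reproduction above.
val input = spark.read.parquet("output/test")
input.printSchema()
input.show()
```

If typed partition columns are needed afterwards, they can be cast explicitly on the resulting DataFrame.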
 


