
Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/10/11 14:42:00 UTC

[jira] [Resolved] (SPARK-21763) InferSchema option does not infer the correct schema (timestamp) from xlsx file.

     [ https://issues.apache.org/jira/browse/SPARK-21763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-21763.
----------------------------------
    Resolution: Invalid

I think this should be raised at https://github.com/crealytics/spark-excel instead.
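
For context on the numbers in the report: the Double values printed there (for example 42871.553773148145) have the shape of Excel serial date-times, where the integer part counts days and the fraction is the time of day. The sketch below is not part of Spark or spark-excel. It assumes the workbook uses Excel's default 1900 date system, and `ExcelSerialDate` and `toDateTime` are hypothetical names introduced only for illustration.

```scala
import java.time.{Duration, LocalDateTime}

// Hypothetical helper, not an API of Spark or spark-excel.
object ExcelSerialDate {
  // Effective day 0 of Excel's 1900 date system for serials >= 61.
  // (Excel counts a non-existent 1900-02-29, so earlier serials are off by one day.)
  private val Epoch: LocalDateTime = LocalDateTime.of(1899, 12, 30, 0, 0)
  private val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Convert a serial value (days plus fraction of a day) to a date-time,
  // rounded to the nearest millisecond.
  def toDateTime(serial: Double): LocalDateTime = {
    val millis = math.round(serial * MillisPerDay)
    Epoch.plus(Duration.ofMillis(millis))
  }
}
```

Under that assumption, `ExcelSerialDate.toDateTime(42871.0)` gives midnight on 2017-05-16, and the five printed values all decode to times on that same day, within a couple of minutes of the sample records listed in the report.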

> InferSchema option does not infer the correct schema (timestamp) from xlsx file.
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-21763
>                 URL: https://issues.apache.org/jira/browse/SPARK-21763
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Environment is my personal laptop.
>            Reporter: ANSHUMAN
>            Priority: Minor
>
> I have an xlsx file containing a date/time field (My Time) in the following format, with these sample records:
> 5/16/2017  12:19:00 AM
> 5/16/2017  12:56:00 AM
> 5/16/2017  1:17:00 PM
> 5/16/2017  5:26:00 PM
> 5/16/2017  6:26:00 PM
> I am reading the xlsx file in the following manner:
> {code:java}
> val inputDF = spark.sqlContext.read.format("com.crealytics.spark.excel")
>     .option("location","file:///C:/Users/file.xlsx")
>     .option("useHeader","true")
>     .option("treatEmptyValuesAsNulls","true")
>     .option("inferSchema","true")
>     .option("addColorColumns","false")
>     .load()
> {code}
> When I try to get the schema using
> {code:java}
> inputDF.printSchema()
> {code}
> I get *Double*.
> Sometimes I even get the schema as *String*.
> And when I print the data, I get the following output:
> +------------------+
> |           My Time|
> +------------------+
> |42871.014189814814|
> | 42871.03973379629|
> |42871.553773148145|
> | 42871.72765046296|
> | 42871.76887731482|
> +------------------+
> The above output is clearly not correct for the given input.
> Moreover, if I convert the xlsx file to csv format and read it, I get the correct output. Here is how I read the csv file:
> {code:java}
> spark.sqlContext.read.format("csv")
>       .option("header", "true")
>       .option("inferSchema", true)
>       .load(fileLocation)
> {code}
> Please look into the issue. I could not find the answer to it anywhere.


