Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2017/10/11 14:42:00 UTC
[jira] [Resolved] (SPARK-21763) InferSchema option does not infer
the correct schema (timestamp) from xlsx file.
[ https://issues.apache.org/jira/browse/SPARK-21763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-21763.
----------------------------------
Resolution: Invalid
I think this should be asked at https://github.com/crealytics/spark-excel instead.
> InferSchema option does not infer the correct schema (timestamp) from xlsx file.
> --------------------------------------------------------------------------------
>
> Key: SPARK-21763
> URL: https://issues.apache.org/jira/browse/SPARK-21763
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Environment: Environment is my personal laptop.
> Reporter: ANSHUMAN
> Priority: Minor
>
> I have an xlsx file containing a date/time field (My Time) in the following format, with these sample records:
> 5/16/2017 12:19:00 AM
> 5/16/2017 12:56:00 AM
> 5/16/2017 1:17:00 PM
> 5/16/2017 5:26:00 PM
> 5/16/2017 6:26:00 PM
> I am reading the xlsx file in the following manner:
> {code:java}
> val inputDF = spark.sqlContext.read.format("com.crealytics.spark.excel")
>   .option("location","file:///C:/Users/file.xlsx")
>   .option("useHeader","true")
>   .option("treatEmptyValuesAsNulls","true")
>   .option("inferSchema","true")
>   .option("addColorColumns","false")
>   .load()
> {code}
> When I try to get schema using
> {code:java}
> inputDF.printSchema()
> {code}
> the column type is reported as *Double*.
> Sometimes I even get the schema as *String*.
> And when I print the data, I get the following output:
> +------------------+
> |           My Time|
> +------------------+
> |42871.014189814814|
> | 42871.03973379629|
> |42871.553773148145|
> | 42871.72765046296|
> | 42871.76887731482|
> +------------------+
> The above output is clearly not correct for the given input.
> Moreover, if I convert the xlsx file to csv format and read it, I get the correct output. Here is how I read the csv file:
> {code:java}
> spark.sqlContext.read.format("csv")
>   .option("header", "true")
>   .option("inferSchema", true)
>   .load(fileLocation)
> {code}
> Please look into the issue. I could not find the answer to it anywhere.
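For context, the Double values in the output above look like Excel serial date numbers: a count of days (with the time of day as the fractional part) from 1899-12-30 in Excel's 1900 date system. That would explain why the integer part 42871 appears for every 5/16/2017 row. The following is only a minimal sketch under that assumption, not the behavior of spark-excel itself; the object name `ExcelSerial` and the method `toLocalDateTime` are hypothetical names introduced for illustration, and rounding to whole seconds is an arbitrary choice.

```scala
import java.time.{LocalDate, LocalDateTime}

// Hypothetical helper, not part of Spark or spark-excel.
object ExcelSerial {
  // Assumption: the workbook uses Excel's 1900 date system, where
  // serial 0 corresponds to 1899-12-30 for all dates after February 1900.
  private val Epoch: LocalDateTime = LocalDate.of(1899, 12, 30).atStartOfDay()

  private val SecondsPerDay = 86400.0

  /** Converts an Excel serial date (days + fraction of a day) to a LocalDateTime. */
  def toLocalDateTime(serial: Double): LocalDateTime = {
    val days = math.floor(serial).toLong
    // The fractional part is the time of day; round to the nearest second.
    val seconds = math.round((serial - days) * SecondsPerDay)
    Epoch.plusDays(days).plusSeconds(seconds)
  }
}

object ExcelSerialDemo {
  def main(args: Array[String]): Unit = {
    // Values copied from the "My Time" column printed in the report.
    val serials = Seq(42871.014189814814, 42871.03973379629, 42871.553773148145)
    serials.foreach(s => println(s"$s -> ${ExcelSerial.toLocalDateTime(s)}"))
  }
}
```

If that assumption holds, such a function could be applied to the inferred Double column (for example through a UDF) as a workaround until the data source maps the column to a timestamp on its own.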