Posted to issues@spark.apache.org by "marcin pekalski (JIRA)" <ji...@apache.org> on 2016/10/24 07:32:58 UTC
[jira] [Created] (SPARK-18072) empty/null Timestamp field
marcin pekalski created SPARK-18072:
---------------------------------------
Summary: empty/null Timestamp field
Key: SPARK-18072
URL: https://issues.apache.org/jira/browse/SPARK-18072
Project: Spark
Issue Type: Question
Components: Input/Output
Affects Versions: 2.0.0
Environment: hadoop 2.7.1, ubuntu 15.10, databricks 1.5, spark-csv 1.5.0, scala 2.11.8
Reporter: marcin pekalski
I was asked by [~falaki] to create a JIRA issue here; it was previously reported against Databricks' spark-csv on GitHub: https://github.com/databricks/spark-csv/issues/388#issuecomment-255631718
I have a problem with Spark 2.0.0, spark-csv 1.5.0, and Scala 2.11.8.
I have a CSV file that I want to convert to Parquet. One column holds timestamps and some values are missing: they are empty strings (without quotes, not even a space; since it is the last column, the newline follows immediately). The following exception is thrown:
{code}
16/10/23 02:46:08 ERROR Utils: Aborting task
java.lang.IllegalArgumentException
at java.sql.Date.valueOf(Date.java:143)
at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:137)
at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:287)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:115)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:84)
...
{code}
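The root cause can be reproduced outside Spark entirely; this is a minimal sketch showing that `java.sql.Date.valueOf` (the call at the bottom of the stack trace) rejects an empty string:

```scala
// Minimal reproduction of the underlying failure, outside Spark:
// java.sql.Date.valueOf expects "yyyy-[m]m-[d]d" and throws
// IllegalArgumentException on an empty string, which is what the
// empty CSV field becomes by the time it reaches the cast.
import java.sql.Date

val parsed = Date.valueOf("2016-10-23")   // a well-formed date parses fine

val emptyThrows =
  try { Date.valueOf(""); false }
  catch { case _: IllegalArgumentException => true }

println(s"empty string throws IllegalArgumentException: $emptyThrows")
```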
The options I use when reading the CSV:
{code}
"delimiter" -> ","
"header" -> "true"
"inferSchema" -> "true"
"treatEmptyValuesAsNulls" ->"true"
"nullValue"->""
{code}
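For reference, this is roughly how I pass them; a sketch only, where `spark` (the session) and `path` are assumed to exist:

```scala
// The reader options as a Scala Map (same key/value pairs as above).
val csvOptions = Map(
  "delimiter"               -> ",",
  "header"                  -> "true",
  "inferSchema"             -> "true",
  "treatEmptyValuesAsNulls" -> "true",
  "nullValue"               -> ""
)

// Passed to the reader like this (commented out: needs a live SparkSession):
// val df = spark.read.format("com.databricks.spark.csv").options(csvOptions).load(path)
```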
Execution goes through *CSVInferSchema.scala* (lines 284-287) in *spark-sql_2.11-2.0.0-sources.jar*:
{code}
case _: TimestampType =>
  // This one will lose microseconds parts.
  // See https://issues.apache.org/jira/browse/SPARK-10681.
  DateTimeUtils.stringToTime(datum).getTime * 1000L
{code}
It invokes `Date.valueOf(s)` in *DateTimeUtils.scala* (*spark-catalyst_2.11-2.0.0-sources.jar*), which then throws the exception in *java.sql.Date.valueOf*.
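What I would expect from {{treatEmptyValuesAsNulls}} is something like the following sketch; note that `castToTimestampMicros` is a hypothetical helper of mine, not Spark's actual code:

```scala
import java.sql.Timestamp

// Hypothetical null-safe variant of the cast (NOT Spark's actual code):
// an empty or missing field becomes null instead of an exception,
// which is the behaviour the "treatEmptyValuesAsNulls" option suggests.
def castToTimestampMicros(datum: String): java.lang.Long =
  if (datum == null || datum.isEmpty) null
  else Timestamp.valueOf(datum).getTime * 1000L  // microseconds, as in the snippet above
```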
Is this a bug, am I doing something wrong, or is there a way to pass a default value?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org