Posted to issues@spark.apache.org by "Valentino (JIRA)" <ji...@apache.org> on 2018/07/30 09:36:00 UTC
[jira] [Created] (SPARK-24969) SQL: to_date function can't parse date strings in different locales.
Valentino created SPARK-24969:
---------------------------------
Summary: SQL: to_date function can't parse date strings in different locales.
Key: SPARK-24969
URL: https://issues.apache.org/jira/browse/SPARK-24969
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.2.1
Environment: Bare Spark 2.2.1 installation, on RHEL 6.
Reporter: Valentino
The locale used by DateTimeUtils, which the to_date SQL function relies on internally, is hard-coded to Locale.US.
This causes problems when parsing a dataset whose dates are written in a different language (Italian in this case).
{code}
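import org.apache.spark.sql.functions.{col, to_date}

// Read a semicolon-separated CSV and parse the "DATA" column as a date.
spark.read
  .option("sep", ";")
  .csv(logFile)
  .toDF("DATA", .....)
  .withColumn("DATA2", to_date(col("DATA"), "yyyy MMM"))
  .show(10)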
{code}
Results from example dataset (only the months whose Italian abbreviation matches the English one are parsed):
|*DATA*|*DATA2*|
|2018 giu|null|
|2018 mag|null|
|2018 apr|2018-04-01|
|2018 mar|2018-03-01|
|2018 feb|2018-02-01|
|2018 gen|null|
|2017 dic|null|
|2017 nov|2017-11-01|
|2017 ott|null|
|2017 set|null|
Expected results: All values converted.
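The effect can be reproduced outside Spark with a minimal sketch, assuming the parsing ultimately goes through {{java.text.SimpleDateFormat}} built with a fixed locale. The helper {{parseYearMonth}} below is hypothetical and not part of Spark; it only shows that the same pattern fails or succeeds depending on the locale passed in. It can be pasted into a Scala REPL or spark-shell.

```scala
import java.text.{ParseException, SimpleDateFormat}
import java.util.Locale

// Hypothetical helper (not part of Spark): parse a "yyyy MMM" string
// with an explicit locale. Returns None when the month name is not
// recognised in that locale, which corresponds to the nulls above.
def parseYearMonth(s: String, locale: Locale): Option[java.util.Date] = {
  val fmt = new SimpleDateFormat("yyyy MMM", locale)
  try Some(fmt.parse(s))
  catch { case _: ParseException => None }
}

// Compare the hard-coded locale with the one the data actually uses.
val samples = Seq("2018 giu", "2018 apr", "2017 dic")
for (s <- samples) {
  val us = parseYearMonth(s, Locale.US).isDefined
  val it = parseYearMonth(s, Locale.ITALIAN).isDefined
  println(s"$s parsed with Locale.US: $us, with Locale.ITALIAN: $it")
}
```

If the assumption holds, making the locale configurable instead of fixed would be enough to get all values converted.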
TEMPORARY WORKAROUND:
In object {{org.apache.spark.sql.catalyst.util.DateTimeUtils}}, replace all instances of {{Locale.US}} with {{Locale.<your locale>}}
ADDITIONAL NOTES:
I can make a pull request available on GitHub.