You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sergey-rubtsov <gi...@git.apache.org> on 2018/02/06 11:13:46 UTC
[GitHub] spark pull request #20140: [SPARK-19228][SQL] Introduce tryParseDate method ...
Github user sergey-rubtsov commented on a diff in the pull request:
https://github.com/apache/spark/pull/20140#discussion_r166261313
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
@@ -150,6 +151,16 @@ class CSVOptions(
val isCommentSet = this.comment != '\u0000'
+ def dateFormatter: DateTimeFormatter = {
+ DateTimeFormatter.ofPattern(dateFormat.getPattern)
+ .withLocale(Locale.US).withZone(timeZone.toZoneId).withResolverStyle(ResolverStyle.SMART)
+ }
+
+ def timestampFormatter: DateTimeFormatter = {
+ DateTimeFormatter.ofPattern(timestampFormat.getPattern)
--- End diff --
DateTimeFormatter is a standard time library from java 8. FastDateFormat can't properly parse date and timestamp.
I can create some test cases to prove it, but I need many time for that.
Also, FastDateFormat does not meet the ISO8601: https://en.wikipedia.org/wiki/ISO_8601
Current implementation of CSVInferSchema contains other bugs. For example, test test("Timestamp field types are inferred correctly via custom date format") in class CSVInferSchemaSuite must not pass, because timestampFormat "yyyy-mm" is wrong format for year and month. It should be "yyyy-MM".
It is better to make refactor of date types and change deprecated types on new ones for the whole project.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org