You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sergey-rubtsov <gi...@git.apache.org> on 2018/02/06 11:13:46 UTC

[GitHub] spark pull request #20140: [SPARK-19228][SQL] Introduce tryParseDate method ...

Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20140#discussion_r166261313
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
    @@ -150,6 +151,16 @@ class CSVOptions(
     
       val isCommentSet = this.comment != '\u0000'
     
    +  def dateFormatter: DateTimeFormatter = {
    +    DateTimeFormatter.ofPattern(dateFormat.getPattern)
    +      .withLocale(Locale.US).withZone(timeZone.toZoneId).withResolverStyle(ResolverStyle.SMART)
    +  }
    +
    +  def timestampFormatter: DateTimeFormatter = {
    +    DateTimeFormatter.ofPattern(timestampFormat.getPattern)
    --- End diff --
    
    DateTimeFormatter is a standard time library from java 8. FastDateFormat can't properly parse date and timestamp. 
    
    I can create some test cases to prove it, but I need many time for that.
    
    Also, FastDateFormat does not meet the ISO8601: https://en.wikipedia.org/wiki/ISO_8601
    Current implementation of CSVInferSchema contains other bugs. For example, test test("Timestamp field types are inferred correctly via custom date format") in class CSVInferSchemaSuite must not pass, because timestampFormat "yyyy-mm" is wrong format for year and month. It should be "yyyy-MM".
    It is better to make refactor of date types and change deprecated types on new ones for the whole project.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org