You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sergey-rubtsov <gi...@git.apache.org> on 2018/05/18 12:19:37 UTC

[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

GitHub user sergey-rubtsov opened a pull request:

    https://github.com/apache/spark/pull/21363

    [SPARK-19228][SQL] Migrate on Java 8 time from FastDateFormat for meet the ISO8601

    ## What changes were proposed in this pull request?
    
    Add support for inferring DateType and custom "dateFormat" option.
    Fix an old bug with parse string to SQL's timestamp value in microseconds accuracy.
    Add a type-widening rule in findTightestCommonType between DateType and TimestampType.
    
    ## How was this patch tested?
    
    Fix some tests to accord with an internationally agreed way to represent dates.
    Add an end-to-end test case and unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sergey-rubtsov/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21363
    
----
commit 65179a2bdd1623fb7f4077cdc316de5a7436c49d
Author: sergei.rubtcov <se...@...>
Date:   2018-05-18T12:13:05Z

    [SPARK-19228][SQL] Migrate on Java 8 time from FastDateFormat for meet the ISO8601 and parsing dates in csv correctly. Add support for inferring DateType and custom "dateFormat" option.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189255920
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParserSuite.scala ---
    @@ -107,20 +107,26 @@ class UnivocityParserSuite extends SparkFunSuite {
         assert(parser.makeConverter("_1", BooleanType, options = options).apply("true") == true)
     
         val timestampsOptions =
    -      new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy hh:mm"), "GMT")
    +      new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy HH:mm"), "GMT")
    --- End diff --
    
    In accordind to official documentation, this test must not pass, because "hh" for hours means that hour in am/pm (1-12):
    https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html
    https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    @HyukjinKwon Great thanks for ping me, I'll try to work on this and cc all reviewer in this PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    **[Test build #90870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90870/testReport)** for PR 21363 at commit [`65179a2`](https://github.com/apache/spark/commit/65179a2bdd1623fb7f4077cdc316de5a7436c49d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189583203
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParserSuite.scala ---
    @@ -107,20 +107,26 @@ class UnivocityParserSuite extends SparkFunSuite {
         assert(parser.makeConverter("_1", BooleanType, options = options).apply("true") == true)
     
         val timestampsOptions =
    -      new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy hh:mm"), "GMT")
    +      new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy HH:mm"), "GMT")
         val customTimestamp = "31/01/2015 00:00"
    -    val expectedTime = timestampsOptions.timestampFormat.parse(customTimestamp).getTime
    +
    +    val expectedTime = LocalDateTime.parse(customTimestamp, timestampsOptions.timestampFormatter)
    +      .atZone(options.timeZone.toZoneId)
    +      .toInstant.toEpochMilli
         val castedTimestamp =
    -      parser.makeConverter("_1", TimestampType, nullable = true, options = timestampsOptions)
    +      parser.makeConverter("_1", TimestampType, nullable = true, timestampsOptions)
             .apply(customTimestamp)
         assert(castedTimestamp == expectedTime * 1000L)
     
    -    val customDate = "31/01/2015"
         val dateOptions = new CSVOptions(Map("dateFormat" -> "dd/MM/yyyy"), "GMT")
    -    val expectedDate = dateOptions.dateFormat.parse(customDate).getTime
    +    val customDate = "31/01/2015"
    +
    +    val expectedDate = LocalDate.parse(customDate, dateOptions.dateFormatter)
    +      .atStartOfDay(options.timeZone.toZoneId)
    +      .toInstant.toEpochMilli
         val castedDate =
    -      parser.makeConverter("_1", DateType, nullable = true, options = dateOptions)
    --- End diff --
    
    ok


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    cc @viirya @mgaido91 @dongjoon-hyun


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    @HyukjinKwon please, clarify, how to add a configuration to control this behaviour? Do you mean to keep backward compatibility? 
    But it seems, now, we don't handle a Spark "DataType" at all. 
    I can change the behaviour in next way: 
    Hanlde "DataType" if option "dateFormat" was set by customer, if it is not set, ignore it at all, is it okay?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189489932
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -140,14 +141,23 @@ private[csv] object CSVInferSchema {
       private def tryParseDouble(field: String, options: CSVOptions): DataType = {
         if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field, options)) {
           DoubleType
    +    } else {
    +      tryParseDate(field, options)
    --- End diff --
    
    Is this a behavior change? Previously timestamp type, now date type/timestamp type?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    **[Test build #91599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91599/testReport)** for PR 21363 at commit [`65179a2`](https://github.com/apache/spark/commit/65179a2bdd1623fb7f4077cdc316de5a7436c49d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189487172
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -143,6 +145,12 @@ object DateTimeUtils {
         millisLocal - getOffsetFromLocalMillis(millisLocal, timeZone)
       }
     
    +  def dateTimeToMicroseconds(localDateTime: LocalDateTime, timeZone: TimeZone): Long = {
    +    val microOfSecond = localDateTime.getLong(ChronoField.MICRO_OF_SECOND)
    +    val epochSecond = localDateTime.atZone(timeZone.toZoneId).toInstant.getEpochSecond
    +    epochSecond * 1000000L + microOfSecond
    --- End diff --
    
    `1000000L` -> `MICROS_PER_SECOND`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189597989
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -140,14 +141,23 @@ private[csv] object CSVInferSchema {
       private def tryParseDouble(field: String, options: CSVOptions): DataType = {
         if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field, options)) {
           DoubleType
    +    } else {
    +      tryParseDate(field, options)
    --- End diff --
    
    At the moment, DateType here is ignored at all, I'm not sure that it was conceived when the type was created


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189483592
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala ---
    @@ -119,7 +119,6 @@ class CSVOptions(
       val positiveInf = parameters.getOrElse("positiveInf", "Inf")
       val negativeInf = parameters.getOrElse("negativeInf", "-Inf")
     
    -
    --- End diff --
    
    Sounds unrelated change.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189583023
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -90,6 +90,7 @@ private[csv] object CSVInferSchema {
               // DecimalTypes have different precisions and scales, so we try to find the common type.
               findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(StringType)
             case DoubleType => tryParseDouble(field, options)
    +        case DateType => tryParseDate(field, options)
    --- End diff --
    
    I can do it, but where exactly it should be documented?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    ok to test


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    @HyukjinKwon, please take a look


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189490878
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala ---
    @@ -59,13 +59,21 @@ class CSVInferSchemaSuite extends SparkFunSuite {
         assert(CSVInferSchema.inferField(IntegerType, textValueOne, options) == expectedTypeOne)
       }
     
    -  test("Timestamp field types are inferred correctly via custom data format") {
    -    var options = new CSVOptions(Map("timestampFormat" -> "yyyy-mm"), "GMT")
    +  test("Timestamp field types are inferred correctly via custom date format") {
    +    var options = new CSVOptions(Map("timestampFormat" -> "yyyy-MM"), "GMT")
    --- End diff --
    
    Why we need to change this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189596275
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -143,6 +145,12 @@ object DateTimeUtils {
         millisLocal - getOffsetFromLocalMillis(millisLocal, timeZone)
       }
     
    +  def dateTimeToMicroseconds(localDateTime: LocalDateTime, timeZone: TimeZone): Long = {
    +    val microOfSecond = localDateTime.getLong(ChronoField.MICRO_OF_SECOND)
    +    val epochSecond = localDateTime.atZone(timeZone.toZoneId).toInstant.getEpochSecond
    +    epochSecond * 1000000L + microOfSecond
    --- End diff --
    
    thanks


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Looks difficult because the behaviours themselves are different. One possibility is a fallback and the other possibility is configuration.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    **[Test build #90870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90870/testReport)** for PR 21363 at commit [`65179a2`](https://github.com/apache/spark/commit/65179a2bdd1623fb7f4077cdc316de5a7436c49d).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Here is the PR: https://github.com/apache/spark/pull/23150 which allows to switch on java.time API (in CSV so far).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189255997
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParserSuite.scala ---
    @@ -107,20 +107,26 @@ class UnivocityParserSuite extends SparkFunSuite {
         assert(parser.makeConverter("_1", BooleanType, options = options).apply("true") == true)
     
         val timestampsOptions =
    -      new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy hh:mm"), "GMT")
    --- End diff --
    
    In accordind to official documentation, this test must not pass, because "hh" for hours means that hour in am/pm (1-12):
    https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html
    https://docs.oracle.com/javase/7/docs/api/java/text/SimpleDateFormat.html


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    @sergey-rubtsov, would we be able to add a configuration to control this behaviour? Sounds we should better have a configuration to control this behaviour for now since the date / timestamp parsing logic is affected.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    @xuanyuanking Are you still working on this? If not, I could continue.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by xuanyuanking <gi...@git.apache.org>.
Github user xuanyuanking commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    @MaxGekk Sorry for the late, something inserted in the my scheduler, I plan to start this PR in this weekend, if its too late please just take it, sorry for the late again.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189483493
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParserSuite.scala ---
    @@ -107,20 +107,26 @@ class UnivocityParserSuite extends SparkFunSuite {
         assert(parser.makeConverter("_1", BooleanType, options = options).apply("true") == true)
     
         val timestampsOptions =
    -      new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy hh:mm"), "GMT")
    +      new CSVOptions(Map("timestampFormat" -> "dd/MM/yyyy HH:mm"), "GMT")
         val customTimestamp = "31/01/2015 00:00"
    -    val expectedTime = timestampsOptions.timestampFormat.parse(customTimestamp).getTime
    +
    +    val expectedTime = LocalDateTime.parse(customTimestamp, timestampsOptions.timestampFormatter)
    +      .atZone(options.timeZone.toZoneId)
    +      .toInstant.toEpochMilli
         val castedTimestamp =
    -      parser.makeConverter("_1", TimestampType, nullable = true, options = timestampsOptions)
    +      parser.makeConverter("_1", TimestampType, nullable = true, timestampsOptions)
             .apply(customTimestamp)
         assert(castedTimestamp == expectedTime * 1000L)
     
    -    val customDate = "31/01/2015"
         val dateOptions = new CSVOptions(Map("dateFormat" -> "dd/MM/yyyy"), "GMT")
    -    val expectedDate = dateOptions.dateFormat.parse(customDate).getTime
    +    val customDate = "31/01/2015"
    +
    +    val expectedDate = LocalDate.parse(customDate, dateOptions.dateFormatter)
    +      .atStartOfDay(options.timeZone.toZoneId)
    +      .toInstant.toEpochMilli
         val castedDate =
    -      parser.makeConverter("_1", DateType, nullable = true, options = dateOptions)
    --- End diff --
    
    I would keep this line as was. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    I am going to suggest to close this since it's being active more then few weeks. It should be good to fix. Let me leave some cc's who might be interested in this just FYI. Feel free to take over this when you guys find some time or are interested in this. 
    
    Adding @xuanyuanking, @mgaido91, @viirya, @MaxGekk, @softmanu who I could think of for now. Feel free to ignore my cc if you guys are busy or having more important fixes you guys are working on.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189526889
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -90,6 +90,7 @@ private[csv] object CSVInferSchema {
               // DecimalTypes have different precisions and scales, so we try to find the common type.
               findTightestCommonType(typeSoFar, tryParseDecimal(field, options)).getOrElse(StringType)
             case DoubleType => tryParseDouble(field, options)
    +        case DateType => tryParseDate(field, options)
    --- End diff --
    
    this also is a behavior change. Shall we document it?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    @sergey-rubtsov we have to keep backward compatibility. If a user upgrades, with your change a running application may break because of data not being anymore timestamp but date. We can add a new entry in `SQLConf` for that.
    
    Moreover, we have to mention in the migration guide all the behavioral changes we introduce.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90870/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    **[Test build #91599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91599/testReport)** for PR 21363 at commit [`65179a2`](https://github.com/apache/spark/commit/65179a2bdd1623fb7f4077cdc316de5a7436c49d).
     * This patch **fails Scala style tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189582909
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchemaSuite.scala ---
    @@ -59,13 +59,21 @@ class CSVInferSchemaSuite extends SparkFunSuite {
         assert(CSVInferSchema.inferField(IntegerType, textValueOne, options) == expectedTypeOne)
       }
     
    -  test("Timestamp field types are inferred correctly via custom data format") {
    -    var options = new CSVOptions(Map("timestampFormat" -> "yyyy-mm"), "GMT")
    +  test("Timestamp field types are inferred correctly via custom date format") {
    +    var options = new CSVOptions(Map("timestampFormat" -> "yyyy-MM"), "GMT")
    --- End diff --
    
    "yyyy-mm" means years and minutes, this is date format, this is time format
    yyyy-MM" means years and months, but I do not insist on this change


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91599/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    @MaxGekk now that your change is merge, can this proceed, @xuanyuanking ? or is it obsolete?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Previous pull request https://github.com/apache/spark/pull/20140 is closed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    @srowen There is another PR with related changes - inferring `DateType` from CSV: https://github.com/apache/spark/pull/23202


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21363: [SPARK-19228][SQL] Migrate on Java 8 time from Fa...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21363#discussion_r189588135
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -140,14 +141,23 @@ private[csv] object CSVInferSchema {
       private def tryParseDouble(field: String, options: CSVOptions): DataType = {
         if ((allCatch opt field.toDouble).isDefined || isInfOrNan(field, options)) {
           DoubleType
    +    } else {
    +      tryParseDate(field, options)
    --- End diff --
    
    For example, by mistake we have identical "timestampFormat" and "dateFormat" options.
    Let it be "yyyy-MM-dd"
    'TimestampType' (8 bytes) is larger than 'DateType' (4 bytes)
    So if they can overlap, we need to try parse it as date firstly, because both of these types are suitable, but you need to try to use a more compact by default and it will be correct inferring of type


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21363: [SPARK-19228][SQL] Migrate on Java 8 time from FastDateF...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21363
  
    Can one of the admins verify this patch?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org