Posted to reviews@spark.apache.org by sergey-rubtsov <gi...@git.apache.org> on 2017/01/29 21:05:58 UTC

[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

GitHub user sergey-rubtsov opened a pull request:

    https://github.com/apache/spark/pull/16735

    [SPARK-19228][SQL] Introduce tryParseDate method to process csv date …

    …column with custom format as date
    
    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sergey-rubtsov/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16735.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16735
    
----
commit 8f78e1129efe8277c190edd5016c1e06b5aeef65
Author: Sergey Rubtsov <se...@gmail.com>
Date:   2017-01-28T20:21:55Z

    [SPARK-19228][SQL] Introduce tryParseDate method to process csv date column with custom format as date

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    **[Test build #77975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77975/testReport)** for PR 16735 at commit [`3e250f5`](https://github.com/apache/spark/commit/3e250f56ee5f3a5c1ce5542d56670973233e62b7).


[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov closed the pull request at:

    https://github.com/apache/spark/pull/16735


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    **[Test build #72142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72142/testReport)** for PR 16735 at commit [`8f78e11`](https://github.com/apache/spark/commit/8f78e1129efe8277c190edd5016c1e06b5aeef65).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Can one of the admins verify this patch?


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72142/
    Test PASSed.


[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16735#discussion_r98795601
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -140,12 +137,21 @@ private[csv] object CSVInferSchema {
         }
       }
     
    +  private def tryParseDate(field: String, options: CSVOptions): DataType = {
    +    // This case infers a custom `dateFormat` is set.
    +    if ((allCatch opt options.dateFormat.parse(field)).isDefined) {
    +      DateType
    +    } else {
    +      tryParseTimestamp(field, options)
    +    }
    +  }
    +
       private def tryParseTimestamp(field: String, options: CSVOptions): DataType = {
    -    // This case infers a custom `dataFormat` is set.
    +    // This case infers a custom `timestampFormat` is set.
         if ((allCatch opt options.timestampFormat.parse(field)).isDefined) {
           TimestampType
         } else if ((allCatch opt DateTimeUtils.stringToTime(field)).isDefined) {
    -      // We keep this for backwords competibility.
    +      // We keep this for backwards compatibility.
           TimestampType
         } else {
           tryParseBoolean(field, options)
    --- End diff --
    
    Ah, I just checked that changing that line causes test failures. The reason seems to be that the default `dateFormat` pattern in `FastDateFormat`, `yyyy-MM-dd`, parses the input `2015-08-20 15:57:00` without error once that line is changed.
    
    But if we change this line instead, it seems we would infer `TimestampType` from a `yyyy-MM-dd` value via `DateTimeUtils.stringToTime(field)`, which does not respect the default `dateFormat`. That assumes you meant trying `TimestampType` first, and it sounds a bit odd to try the wider type first.
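
A minimal sketch of the parsing behavior described in the comment above, assuming `java.text.SimpleDateFormat` as a stand-in for the `FastDateFormat` instance held by `CSVOptions`. `parse(String)` is allowed to stop before the end of the input, so a date-only pattern can accept a timestamp string. The object and method names below are hypothetical and are not part of the PR.

```scala
import java.text.{ParsePosition, SimpleDateFormat}
import scala.util.Try

object PrefixParseDemo {
  // Assumption: stands in for the default `dateFormat` pattern mentioned in the thread.
  // A new instance per call, because SimpleDateFormat is not thread-safe.
  private def dateFormat = new SimpleDateFormat("yyyy-MM-dd")

  // Same shape as the `allCatch opt ... parse(field)` check in the diff:
  // true whenever parse does not throw, even if trailing text is left over.
  def lenientMatches(field: String): Boolean =
    Try(dateFormat.parse(field)).isSuccess

  // Stricter variant: the parse must also consume the whole field.
  def strictMatches(field: String): Boolean = {
    val pos = new ParsePosition(0)
    val parsed = dateFormat.parse(field, pos)
    parsed != null && pos.getIndex == field.length
  }
}
```

A check along the lines of `strictMatches` is one possible way to keep a date-only pattern from accepting timestamp strings. The thread does not settle whether the PR should do this.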



[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    retest this please


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73766/
    Test FAILed.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    **[Test build #73766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73766/testReport)** for PR 16735 at commit [`ae14f12`](https://github.com/apache/spark/commit/ae14f1275fc8a4dad06508750a52db3c73f63f10).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    **[Test build #73766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73766/testReport)** for PR 16735 at commit [`ae14f12`](https://github.com/apache/spark/commit/ae14f1275fc8a4dad06508750a52db3c73f63f10).


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Can one of the admins verify this patch?


[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16735#discussion_r98523879
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -140,12 +137,21 @@ private[csv] object CSVInferSchema {
         }
       }
     
    +  private def tryParseDate(field: String, options: CSVOptions): DataType = {
    +    // This case infers a custom `dateFormat` is set.
    +    if ((allCatch opt options.dateFormat.parse(field)).isDefined) {
    +      DateType
    +    } else {
    +      tryParseTimestamp(field, options)
    +    }
    +  }
    +
       private def tryParseTimestamp(field: String, options: CSVOptions): DataType = {
    -    // This case infers a custom `dataFormat` is set.
    +    // This case infers a custom `timestampFormat` is set.
         if ((allCatch opt options.timestampFormat.parse(field)).isDefined) {
           TimestampType
         } else if ((allCatch opt DateTimeUtils.stringToTime(field)).isDefined) {
    -      // We keep this for backwords competibility.
    +      // We keep this for backwards compatibility.
           TimestampType
         } else {
           tryParseBoolean(field, options)
    --- End diff --
    
    Okay, I will test it again, thanks.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    I couldn't run the tests in CSVSuite locally on Windows; apologies for any test failures.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    @sergey-rubtsov this PR seems pretty broken... probably needs to be closed and a new one opened.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    **[Test build #77998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77998/testReport)** for PR 16735 at commit [`3e250f5`](https://github.com/apache/spark/commit/3e250f56ee5f3a5c1ce5542d56670973233e62b7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    **[Test build #77998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77998/testReport)** for PR 16735 at commit [`3e250f5`](https://github.com/apache/spark/commit/3e250f56ee5f3a5c1ce5542d56670973233e62b7).


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    retest this please


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Can one of the admins verify this patch?


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    ping @sergey-rubtsov 


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    ok to test


[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16735#discussion_r98729386
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -140,12 +137,21 @@ private[csv] object CSVInferSchema {
         }
       }
     
    +  private def tryParseDate(field: String, options: CSVOptions): DataType = {
    +    // This case infers a custom `dateFormat` is set.
    +    if ((allCatch opt options.dateFormat.parse(field)).isDefined) {
    +      DateType
    +    } else {
    +      tryParseTimestamp(field, options)
    +    }
    +  }
    +
       private def tryParseTimestamp(field: String, options: CSVOptions): DataType = {
    -    // This case infers a custom `dataFormat` is set.
    +    // This case infers a custom `timestampFormat` is set.
         if ((allCatch opt options.timestampFormat.parse(field)).isDefined) {
           TimestampType
         } else if ((allCatch opt DateTimeUtils.stringToTime(field)).isDefined) {
    -      // We keep this for backwords competibility.
    +      // We keep this for backwards compatibility.
           TimestampType
         } else {
           tryParseBoolean(field, options)
    --- End diff --
    
    @HyukjinKwon That does not work correctly if we put the change there.


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    **[Test build #72142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72142/testReport)** for PR 16735 at commit [`8f78e11`](https://github.com/apache/spark/commit/8f78e1129efe8277c190edd5016c1e06b5aeef65).


[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16735#discussion_r98661043
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -140,12 +137,21 @@ private[csv] object CSVInferSchema {
         }
       }
     
    +  private def tryParseDate(field: String, options: CSVOptions): DataType = {
    +    // This case infers a custom `dateFormat` is set.
    +    if ((allCatch opt options.dateFormat.parse(field)).isDefined) {
    +      DateType
    +    } else {
    +      tryParseTimestamp(field, options)
    +    }
    +  }
    +
       private def tryParseTimestamp(field: String, options: CSVOptions): DataType = {
    -    // This case infers a custom `dataFormat` is set.
    +    // This case infers a custom `timestampFormat` is set.
         if ((allCatch opt options.timestampFormat.parse(field)).isDefined) {
           TimestampType
         } else if ((allCatch opt DateTimeUtils.stringToTime(field)).isDefined) {
    -      // We keep this for backwords competibility.
    +      // We keep this for backwards compatibility.
           TimestampType
         } else {
           tryParseBoolean(field, options)
    --- End diff --
    
    (Maybe, you meant L136)


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Hi @HyukjinKwon, @gatorsmile 
    Could you take a look, please?


[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    I will try to complete it this month.


[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov closed the pull request at:

    https://github.com/apache/spark/pull/16735


[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16735#discussion_r98386122
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
    @@ -140,12 +137,21 @@ private[csv] object CSVInferSchema {
         }
       }
     
    +  private def tryParseDate(field: String, options: CSVOptions): DataType = {
    +    // This case infers a custom `dateFormat` is set.
    +    if ((allCatch opt options.dateFormat.parse(field)).isDefined) {
    +      DateType
    +    } else {
    +      tryParseTimestamp(field, options)
    +    }
    +  }
    +
       private def tryParseTimestamp(field: String, options: CSVOptions): DataType = {
    -    // This case infers a custom `dataFormat` is set.
    +    // This case infers a custom `timestampFormat` is set.
         if ((allCatch opt options.timestampFormat.parse(field)).isDefined) {
           TimestampType
         } else if ((allCatch opt DateTimeUtils.stringToTime(field)).isDefined) {
    -      // We keep this for backwords competibility.
    +      // We keep this for backwards compatibility.
           TimestampType
         } else {
           tryParseBoolean(field, options)
    --- End diff --
    
    You need to change this line. : ) Otherwise, when we infer the schema, we will not enter `tryParseDate`. 
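
To illustrate the point of the comment above, here is a minimal, self-contained sketch of a fallback chain in which each step hands off to the next candidate type. It is not the Spark code: `InferredType`, `Formats`, `parsesFully` and `InferenceChainSketch` are hypothetical names, and placing a numeric step directly before the date step is an assumption. Unlike the diff, the sketch also requires the whole field to be consumed, because `SimpleDateFormat.parse(String)` accepts trailing text and a timestamp string could otherwise be taken for a date.

```scala
import java.text.{ParsePosition, SimpleDateFormat}
import scala.util.control.Exception.allCatch

// Hypothetical stand-ins for Spark's DataType objects; not the real classes.
sealed trait InferredType
case object DoubleT extends InferredType
case object DateT extends InferredType
case object TimestampT extends InferredType
case object BooleanT extends InferredType
case object StringT extends InferredType

// Hypothetical, simplified options holder (the real CSVOptions is much larger).
final case class Formats(dateFormat: String, timestampFormat: String)

object InferenceChainSketch {
  // True only if the pattern matches the entire field (strict, non-lenient).
  private def parsesFully(pattern: String, field: String): Boolean = {
    val fmt = new SimpleDateFormat(pattern)
    fmt.setLenient(false)
    val pos = new ParsePosition(0)
    fmt.parse(field, pos) != null && pos.getIndex == field.length
  }

  // Entry point: an assumed numeric step that must fall through to
  // tryParseDate. If it called tryParseTimestamp directly, the date step
  // would never run.
  def inferField(field: String, f: Formats): InferredType =
    if ((allCatch opt field.toDouble).isDefined) DoubleT
    else tryParseDate(field, f)

  private def tryParseDate(field: String, f: Formats): InferredType =
    if (parsesFully(f.dateFormat, field)) DateT
    else tryParseTimestamp(field, f)

  private def tryParseTimestamp(field: String, f: Formats): InferredType =
    if (parsesFully(f.timestampFormat, field)) TimestampT
    else tryParseBoolean(field)

  private def tryParseBoolean(field: String): InferredType =
    if ((allCatch opt field.toBoolean).isDefined) BooleanT
    else StringT
}
```

With `Formats("dd/MM/yyyy", "dd/MM/yyyy HH:mm")`, a field such as `26/08/2015` reaches the date step only because the preceding step delegates to `tryParseDate`.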




[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    BTW, @sergey-rubtsov, could you check if we should add a type-widening rule in `findTightestCommonType` between `DateType` and `TimestampType`?
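
A rough sketch of what such a widening rule could look like, using a toy type hierarchy rather than Spark's `DataType` classes. `ColType`, the `*Col` objects and `WideningSketch` are hypothetical names, and widening a date/timestamp pair to the timestamp type is an assumption about the intended direction, not something this thread confirms.

```scala
// Hypothetical stand-ins for NullType, DateType, TimestampType and StringType.
sealed trait ColType
case object NullCol extends ColType
case object DateCol extends ColType
case object TimestampCol extends ColType
case object StringCol extends ColType

object WideningSketch {
  // Returns the narrowest type that can represent both inputs, if any.
  def findTightestCommonType(a: ColType, b: ColType): Option[ColType] =
    (a, b) match {
      case (t1, t2) if t1 == t2 => Some(t1)
      // A column with no observed values adopts the other side's type.
      case (NullCol, t) => Some(t)
      case (t, NullCol) => Some(t)
      // Assumed rule: mixing dates and timestamps widens to timestamp.
      case (DateCol, TimestampCol) | (TimestampCol, DateCol) => Some(TimestampCol)
      case _ => None
    }
}
```

Without the date/timestamp case, a column whose rows mix both kinds of values would find no common type in this sketch, and the caller would have to pick a fallback such as a string type.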




[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    I checked your JIRA description. Your test case does not cover the scenario you mentioned. You can add an end-to-end test case by following the existing test case [`Load date types via custom date format`](https://github.com/databricks/spark-csv/blob/master/src/test/scala/com/databricks/spark/csv/CsvSuite.scala#L831-L857)




[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77998/
    Test FAILed.




[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    The initial data type is NullType when we infer the schema. Let me know if you still hit an issue. 
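
A small sketch of why the starting type matters, assuming inference folds over a column's values and refines a "type so far" that begins as a null type. The names `SoFar`, `NullSoFar`, `LongSoFar`, `StringSoFar` and `NullStartSketch` are hypothetical, and the real inference handles many more types than the two shown here.

```scala
import scala.util.Try

// Hypothetical stand-ins for NullType, LongType and StringType.
sealed trait SoFar
case object NullSoFar extends SoFar
case object LongSoFar extends SoFar
case object StringSoFar extends SoFar

object NullStartSketch {
  // Refines the type seen so far with one more field value.
  def inferField(typeSoFar: SoFar, field: String): SoFar =
    if (field == null || field.isEmpty) {
      typeSoFar // empty fields carry no type information
    } else typeSoFar match {
      // Starting from the null type, the chain is entered at its first step.
      case NullSoFar | LongSoFar =>
        if (Try(field.toLong).isSuccess) LongSoFar else StringSoFar
      case StringSoFar => StringSoFar
    }

  // Every column starts out as the null type before any value is inspected.
  def inferColumn(fields: Seq[String]): SoFar =
    fields.foldLeft(NullSoFar: SoFar)(inferField)
}
```

Because every column starts in the null case, a new step such as `tryParseDate` is only exercised if it is reachable from that case.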




[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by sergey-rubtsov <gi...@git.apache.org>.
Github user sergey-rubtsov commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    @HyukjinKwon Yes, sure, I will check it.




[GitHub] spark issue #16735: [SPARK-19228][SQL] Introduce tryParseDate method to proc...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16735
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77975/
    Test FAILed.




[GitHub] spark pull request #16735: [SPARK-19228][SQL] Introduce tryParseDate method ...

Posted by sergey-rubtsov <gi...@git.apache.org>.
GitHub user sergey-rubtsov reopened a pull request:

    https://github.com/apache/spark/pull/16735

    [SPARK-19228][SQL] Introduce tryParseDate method to process csv date …

    …column with custom format as date
    
    ## What changes were proposed in this pull request?
    
    This patch fixes two bugs:
    
    1) All dates are parsed as timestamps.
    2) The "dateFormat" option is ignored when reading CSV files with date data;
    the default date format ("yyyy-MM-dd") is used instead.
    
    For further details, please read the ticket:
    https://issues.apache.org/jira/browse/SPARK-19228
    
    ## How was this patch tested?
    
    Tested with unit tests only. A new test was added.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sergey-rubtsov/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16735.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16735
    
----
commit 287005809cec5388dcb75a3d99bc0f0461b9bb69
Author: sergei.rubtcov <se...@accenture.com>
Date:   2017-03-10T13:03:27Z

    [SPARK-19228][SQL] Introduce tryParseDate method to process csv date, add a type-widening rule in findTightestCommonType between DateType and TimestampType, add an end-to-end test case

commit 3e250f56ee5f3a5c1ce5542d56670973233e62b7
Author: sergei.rubtcov <se...@accenture.com>
Date:   2017-03-13T14:13:17Z

    Merge branch 'master' of https://github.com/apache/spark

----

