Posted to reviews@spark.apache.org by softmanu <gi...@git.apache.org> on 2018/09/24 19:56:34 UTC

[GitHub] spark pull request #22539: detect date type in csv file

GitHub user softmanu opened a pull request:

    https://github.com/apache/spark/pull/22539

    detect date type in csv file

    This fix addresses the JIRA issue below, which I created a few hours ago:
    
    [SPARK-25517](https://issues.apache.org/jira/browse/SPARK-25517)
    
    This is about spark.read.format("csv").option("inferSchema", "true").option("dateFormat", "MM/dd/yyyy").load("/path/to/csvfile"). Assume /path/to/csvfile has a column that contains only date values, such as an employee joining date. For example, 02/22/2018 (22 February 2018) is a **date**, but Spark always reads this joining_date column as **string**, whereas the equivalent inference works correctly with timestampFormat for timestamp column values in CSV.
    
    ## What changes were proposed in this pull request?
    
    Add support for detecting the date type in CSV files.
    
    ## How was this patch tested?
    
    manual test
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
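
The inference behaviour requested in the description above can be illustrated with a small standalone sketch. This is not Spark's implementation and does not use Spark at all; the function names and the mapping from the Java-style pattern "MM/dd/yyyy" to a Python strptime pattern are assumptions made only for illustration. It shows the idea of trying the user-supplied date format on every value of a column before falling back to string.

```python
from datetime import datetime

# Hypothetical mapping from the Java-style pattern used in the PR description
# to the equivalent Python strptime pattern. Only this one pattern is covered.
JAVA_TO_STRPTIME = {"MM/dd/yyyy": "%m/%d/%Y"}


def parses_as_date(value, date_format):
    """Return True if value matches date_format exactly."""
    try:
        datetime.strptime(value, JAVA_TO_STRPTIME[date_format])
        return True
    except ValueError:
        return False


def infer_column_type(values, date_format="MM/dd/yyyy"):
    """Return "date" if every non-empty value parses as a date, else "string"."""
    non_empty = [v for v in values if v != ""]
    if non_empty and all(parses_as_date(v, date_format) for v in non_empty):
        return "date"
    return "string"


# A joining_date column like the one in the description, with one empty cell.
print(infer_column_type(["02/22/2018", "11/05/2017", ""]))
# A column with a non-date value falls back to string.
print(infer_column_type(["02/22/2018", "n/a"]))
```

In this sketch empty cells are ignored and a column with no usable values stays a string; how a real reader should treat nulls and mixed columns is a design decision the thread does not settle.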


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/softmanu/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22539
    
----
commit e15a3722afe780f06c8f7079dbd734b3be2a8b70
Author: softmanu <26...@...>
Date:   2018-09-24T19:38:35Z

    detect date type in csv file
    
    This fix addresses the JIRA issue below, which I created a few hours ago:
    
    https://issues.apache.org/jira/browse/SPARK-25517
    
    This is about spark.read.format("csv").option("inferSchema", "true").option("dateFormat", "MM/dd/yyyy").load("/path/to/csvfile"). Assume /path/to/csvfile has a date-type column such as an employee joining date. For example, 02/22/2018 (22 February 2018) is a date, but Spark always reads this joining_date column as string, whereas this works correctly with timestampFormat.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22539: detect date type in csv file

Posted by NiharS <gi...@git.apache.org>.
Github user NiharS commented on the issue:

    https://github.com/apache/spark/pull/22539
  
    Could you edit your title to include the jira number and component?
    
    e.g. [SPARK-25517][Core] Detect ...
    
    Helps with bookkeeping, plus it'll add a link to the jira so people can see your PR from there.


---



[GitHub] spark issue #22539: detect date type in csv file

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22539
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV file

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22539
  
    I think this is a duplicate of https://github.com/apache/spark/pull/21363


---




[GitHub] spark issue #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV file

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22539
  
    It looks like https://github.com/apache/spark/pull/21363 is becoming inactive. Could you take that one over instead? You can pick up the commits there and open another PR.


---



[GitHub] spark issue #22539: detect date type in csv file

Posted by softmanu <gi...@git.apache.org>.
Github user softmanu commented on the issue:

    https://github.com/apache/spark/pull/22539
  
    Hi, 
    
    Please review the changes for the bug, which is described in detail in the JIRA issue below:
    
    https://issues.apache.org/jira/browse/SPARK-25517
    
    Thanks,
    Manoranjan


---



[GitHub] spark issue #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV file

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22539
  
    Thank you for the review, @HyukjinKwon.
    
    @softmanu, could you take a look at [SPARK-19228](https://github.com/apache/spark/pull/21363) and close this PR and the Apache Spark JIRA?


---



[GitHub] spark pull request #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22539


---



[GitHub] spark issue #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV file

Posted by softmanu <gi...@git.apache.org>.
Github user softmanu commented on the issue:

    https://github.com/apache/spark/pull/22539
  
    @dongjoon-hyun @HyukjinKwon 
    Hi,
    I was unwell all of last week and am back now. Thanks for the review and all the comments. Whether my PR is a duplicate or not is something we can sort out later. My concern is that **it's not working as expected**. I have documented the full sequence of execution steps in detail under JIRA SPARK-25517 so that it is easy to follow.
    
    And sure, I will add a test case and work on it.
    
    P.S. I've found other issues in Spark around date/timestamp handling that do not work at all because the implementation is missing entirely. I will get back to those later; first let me resolve the current issue.
    
    Thanks,
    Manoranjan : )



---



[GitHub] spark issue #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV file

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22539
  
    Ping, @softmanu .


---
