Posted to reviews@spark.apache.org by softmanu <gi...@git.apache.org> on 2018/09/24 19:56:34 UTC
[GitHub] spark pull request #22539: detect date type in csv file
GitHub user softmanu opened a pull request:
https://github.com/apache/spark/pull/22539
detect date type in csv file
This fix refers to the JIRA issue below, which I created a few hours ago:
[SPARK-25517](https://issues.apache.org/jira/browse/SPARK-25517)
This concerns spark.read.format("csv").option("inferSchema", "true").option("dateFormat", "MM/dd/yyyy").load(/path/to/csvfile). Assume /path/to/csvfile has a column that contains only dates, such as an employee joining date. For example, 02/22/2018 (22 February 2018) is a **date**, but Spark always infers this joining_date column as **string**, whereas the equivalent inference works correctly with timestampFormat for timestamp column values in CSV.
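To make the expected behaviour concrete, here is a minimal sketch in plain Python (standard library only) of the inference being asked for. It is not Spark's implementation and does not use any Spark API. The function names `infer_column_type` and `infer_schema` and the sample rows are hypothetical, and the Python format `%m/%d/%Y` is assumed to stand in for the `MM/dd/yyyy` pattern above.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data; only the joining_date column name and the
# 02/22/2018 value come from the description above.
SAMPLE_CSV = """name,joining_date
alice,02/22/2018
bob,11/05/2017
"""


def infer_column_type(values, date_format="%m/%d/%Y"):
    """Return "date" if every non-empty value parses with date_format, else "string"."""
    non_empty = [v for v in values if v]
    if not non_empty:
        # Nothing to infer from, so fall back to the most general type.
        return "string"
    for value in non_empty:
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            # A single value that does not match the pattern makes the column a string.
            return "string"
    return "date"


def infer_schema(csv_text, date_format="%m/%d/%Y"):
    """Map each header name in csv_text to an inferred type name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        column: infer_column_type([row[column] for row in rows], date_format)
        for column in (reader.fieldnames or [])
    }
```

Calling `infer_schema(SAMPLE_CSV)` classifies `joining_date` as a date and `name` as a string, which is the outcome the description expects from schema inference when a date format is supplied.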
## What changes were proposed in this pull request?
Add support for detecting the date type when inferring the schema of CSV files.
## How was this patch tested?
manual test
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/softmanu/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22539.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22539
----
commit e15a3722afe780f06c8f7079dbd734b3be2a8b70
Author: softmanu <26...@...>
Date: 2018-09-24T19:38:35Z
detect date type in csv file
This fix refers to the JIRA issue below, which I created a few hours ago:
https://issues.apache.org/jira/browse/SPARK-25517
This concerns spark.read.format("csv").option("inferSchema", "true").option("dateFormat", "MM/dd/yyyy").load(/path/to/csvfile). Assume /path/to/csvfile has a date column such as an employee joining date. For example, 02/22/2018 (22 February 2018) is a date, but Spark always reads this joining_date column as string, whereas this works correctly with timestampFormat.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22539: detect date type in csv file
Posted by NiharS <gi...@git.apache.org>.
Github user NiharS commented on the issue:
https://github.com/apache/spark/pull/22539
Could you edit your title to include the jira number and component?
e.g. [SPARK-25517][Core] Detect ...
Helps with bookkeeping, plus it'll add a link to the jira so people can see your PR from there.
---
[GitHub] spark issue #22539: detect date type in csv file
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22539
Can one of the admins verify this patch?
---
[GitHub] spark issue #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV file
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22539
I think this is a duplicate of https://github.com/apache/spark/pull/21363
---
[GitHub] spark issue #22539: detect date type in csv file
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22539
Can one of the admins verify this patch?
---
[GitHub] spark issue #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV file
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22539
It looks like https://github.com/apache/spark/pull/21363 has become inactive. Could you take that over instead? You can pick up the commits there and open another PR.
---
[GitHub] spark issue #22539: detect date type in csv file
Posted by softmanu <gi...@git.apache.org>.
Github user softmanu commented on the issue:
https://github.com/apache/spark/pull/22539
Hi,
Please review the changes for the bug, which is described in detail in the JIRA issue below:
https://issues.apache.org/jira/browse/SPARK-25517
Thanks,
Manoranjan
---
[GitHub] spark issue #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV file
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22539
Thank you for the review, @HyukjinKwon.
@softmanu, could you take a look at [SPARK-19228](https://github.com/apache/spark/pull/21363) and close this PR and the Apache Spark JIRA?
---
[GitHub] spark pull request #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV ...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22539
---
[GitHub] spark issue #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV file
Posted by softmanu <gi...@git.apache.org>.
Github user softmanu commented on the issue:
https://github.com/apache/spark/pull/22539
@dongjoon-hyun @HyukjinKwon
Hi,
I was unwell all of last week and am back now. Thanks for the review and all the comments. Whether my PR is a duplicate or not is something we can sort out later; my concern here is that **it's not working as expected**. I have documented the full steps of execution in detail under JIRA SPARK-25517 so that the problem is easy to understand.
And sure, I will add a test case and work on it.
P.S. I've found other issues in Spark around date/timestamp handling that do not work at all because the implementation is missing entirely. I will get back to these later; first let me resolve the current issue.
Thanks,
Manoranjan : )
---
[GitHub] spark issue #22539: [SPARK-25517][SQL] Detect/Infer date type in CSV file
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22539
Ping, @softmanu.
---