Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/11/05 19:53:40 UTC
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/22951
[SPARK-25945][SQL] Support locale while parsing date/timestamp from CSV/JSON
## What changes were proposed in this pull request?
In this PR, I propose adding a new `locale` option to CSVOptions/JSONOptions so that dates and timestamps written in local languages can be parsed. Currently the locale is hard-coded to `Locale.US`.
## How was this patch tested?
Added two tests that parse a date written in Russian, `ноя 2018`, from CSV and JSON.
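To illustrate why the locale matters for a value such as `ноя 2018`, here is a minimal Python sketch. It is not Spark's implementation: the `SHORT_MONTHS` table and the `parse_month_year` helper are hypothetical, and the month abbreviations are hand-written for illustration (real locale data on a JVM may differ).

```python
from datetime import date

# Hypothetical abbreviated month names keyed by BCP 47 language tag.
# Illustrative only: a real runtime would take these from its locale data.
SHORT_MONTHS = {
    "en-US": ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
    "ru-RU": ["янв", "фев", "мар", "апр", "май", "июн",
              "июл", "авг", "сен", "окт", "ноя", "дек"],
}


def parse_month_year(text, lang_tag="en-US"):
    """Parse a 'MMM yyyy' string using the month names of the given locale."""
    month_name, year = text.split()
    months = [m.lower() for m in SHORT_MONTHS[lang_tag]]
    # list.index raises ValueError when the name is unknown in this locale.
    month = months.index(month_name.lower()) + 1
    return date(int(year), month, 1)
```

With the default `en-US` table the Russian input is rejected, which mirrors the problem with a hard-coded locale; passing `"ru-RU"` lets the same pattern parse it.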
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 locale
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22951.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22951
----
commit c71cd4f219cfdca9bbc85305782ce0d7d9215dcf
Author: Maxim Gekk <ma...@...>
Date: 2018-11-05T19:09:32Z
Added a test for from_csv
commit e55a3d326cc320b8d0223697f87c2a701f515a2c
Author: Maxim Gekk <ma...@...>
Date: 2018-11-05T19:28:49Z
lang -> langTag
commit 83c6317b2a9b7b0696da6bdc37e1d49dcf994687
Author: Maxim Gekk <ma...@...>
Date: 2018-11-05T19:32:05Z
Test for from_json
commit fa019ec0c9b3cb02cb9abf21597bffb8c337197a
Author: Maxim Gekk <ma...@...>
Date: 2018-11-05T19:40:53Z
Added locale option for JSON and CSV
commit 41154bdce8e61bea208772bbe948b71b23220e8d
Author: Maxim Gekk <ma...@...>
Date: 2018-11-05T19:43:34Z
Fix imports
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98543/testReport)** for PR 22951 at commit [`6ab8501`](https://github.com/apache/spark/commit/6ab850164182565c2cd8cffe99f5c4bb09ead660).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98541/
Test FAILed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22951
Could you take a look once more, @HyukjinKwon ?
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98541/testReport)** for PR 22951 at commit [`6ab8501`](https://github.com/apache/spark/commit/6ab850164182565c2cd8cffe99f5c4bb09ead660).
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98489/
Test PASSed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98507/testReport)** for PR 22951 at commit [`93da760`](https://github.com/apache/spark/commit/93da7604c2d74f97b12e9210f0bcaf038774ea41).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98583/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231832597
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -446,6 +450,9 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
If None is set, it uses the default value, ``1.0``.
:param emptyValue: sets the string representation of an empty value. If None is set, it uses
the default value, empty string.
+ :param locale: sets a locale as language tag in IETF BCP 47 format. If None is set,
+ it uses the default value, ``en-US``. For instance, ``locale`` is used while
+ parsing dates and timestamps.
--- End diff --
Parsing decimals using `locale` looks slightly tricky in the JSON case: we leave this to Jackson by calling its `getCurrentToken` and `getDecimalValue` methods, and I haven't found a way to pass a locale to it. We will probably need a custom deserializer?
The CSV case should be easier since we convert the strings ourselves. I will try to do that for CSV first, once this PR is merged.
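As a rough illustration of the CSV idea, where the string is converted by hand, a locale-aware decimal conversion might be sketched as follows. This is a Python sketch under stated assumptions, not Spark code: the `SEPARATORS` table and `parse_decimal` helper are hypothetical, and the separator pairs are hand-written examples rather than real locale data.

```python
from decimal import Decimal

# Hypothetical (grouping separator, decimal separator) pairs per language tag.
# Illustrative only: a real implementation would read these from locale data.
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
    "ru-RU": ("\u00a0", ","),  # non-breaking space as the grouping separator
}


def parse_decimal(text, lang_tag="en-US"):
    """Convert a locale-formatted decimal string into a Decimal."""
    grouping, decimal_sep = SEPARATORS[lang_tag]
    # Drop grouping separators first, then normalize the decimal separator.
    normalized = text.strip().replace(grouping, "").replace(decimal_sep, ".")
    return Decimal(normalized)
```

For example, `parse_decimal("1.000,01", "de-DE")` and `parse_decimal("1,000.01", "en-US")` both produce `Decimal("1000.01")`. The order of the two replacements matters, because one locale's grouping character can be another locale's decimal separator.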
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98567/
Test PASSed.
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231641960
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
@@ -578,4 +581,20 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
"Acceptable modes are PERMISSIVE and FAILFAST."))
}
}
+
+ test("use locale while parsing timestamps") {
+ Seq("en-US", "ko-KR", "zh-CN", "ru-RU").foreach { langTag =>
+ val locale = Locale.forLanguageTag(langTag)
+ val ts = new SimpleDateFormat("dd/MM/yyyy HH:mm").parse("06/11/2018 18:00")
+ val timestampFormat = "dd MMM yyyy HH:mm"
+ val sdf = new SimpleDateFormat(timestampFormat, locale)
+ val input = Seq(s"""{"time": "${sdf.format(ts)}"}""").toDS()
+ val schema = new StructType().add("time", TimestampType)
+ val options = Map("timestampFormat" -> timestampFormat, "locale" -> langTag)
+ val df = input.select(from_json($"value", schema, options))
--- End diff --
This one can be simplified like this, too.
```scala
- val schema = new StructType().add("time", TimestampType)
val options = Map("timestampFormat" -> timestampFormat, "locale" -> langTag)
- val df = input.select(from_json($"value", schema, options))
+ val df = input.select(from_json($"value", "time timestamp", options))
```
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
Merged to master.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98587/
Test FAILed.
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231646713
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala ---
@@ -209,4 +210,20 @@ class CsvExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with P
"2015-12-31T16:00:00"
)
}
+
+ test("take into account locale while parsing date") {
--- End diff --
nit: Can we use simpler test case names, like the following?
```
`take into account locale while parsing date` -> `parse date with locale`
`use locale while parsing timestamps` -> `parse timestamps with locale`
```
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98583/
Test FAILed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22951
Could you rebase this once again, @MaxGekk ?
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98507/
Test PASSed.
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231800213
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -349,7 +353,7 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
negativeInf=None, dateFormat=None, timestampFormat=None, maxColumns=None,
maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None,
columnNameOfCorruptRecord=None, multiLine=None, charToEscapeQuoteEscaping=None,
- samplingRatio=None, enforceSchema=None, emptyValue=None):
+ samplingRatio=None, enforceSchema=None, emptyValue=None, locale=None):
--- End diff --
It seems it exists in `streaming.py`: https://github.com/apache/spark/blob/08c76b5d39127ae207d9d1fff99c2551e6ce2581/python/pyspark/sql/streaming.py#L567
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98543/testReport)** for PR 22951 at commit [`6ab8501`](https://github.com/apache/spark/commit/6ab850164182565c2cd8cffe99f5c4bb09ead660).
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231775987
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -446,6 +450,9 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
If None is set, it uses the default value, ``1.0``.
:param emptyValue: sets the string representation of an empty value. If None is set, it uses
the default value, empty string.
+ :param locale: sets a locale as language tag in IETF BCP 47 format. If None is set,
+ it uses the default value, ``en-US``. For instance, ``locale`` is used while
+ parsing dates and timestamps.
--- End diff --
I think ideally we should apply this to decimal parsing too. But yeah, we can handle that separately.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22951
> OMG, what does ноя 2018 mean BTW? haha
It is the three-letter prefix of `Ноябрь`, which is November in Russian.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98543/
Test PASSed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Can one of the admins verify this patch?
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22951
jenkins, retest this, please
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98567/testReport)** for PR 22951 at commit [`759bca6`](https://github.com/apache/spark/commit/759bca62903b8624dda91ef081e93cd4a30969fe).
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98489 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98489/testReport)** for PR 22951 at commit [`41154bd`](https://github.com/apache/spark/commit/41154bdce8e61bea208772bbe948b71b23220e8d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22951
I will update docs soon.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98507/testReport)** for PR 22951 at commit [`93da760`](https://github.com/apache/spark/commit/93da7604c2d74f97b12e9210f0bcaf038774ea41).
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98587/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98587/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
retest this please
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231776396
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -267,7 +270,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
mode=mode, columnNameOfCorruptRecord=columnNameOfCorruptRecord, dateFormat=dateFormat,
timestampFormat=timestampFormat, multiLine=multiLine,
allowUnquotedControlChars=allowUnquotedControlChars, lineSep=lineSep,
- samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding)
+ samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding,
+ locale=locale)
--- End diff --
@MaxGekk, it looks like `sql/streaming.py` was missed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98583/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22951
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231800381
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -267,7 +270,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
mode=mode, columnNameOfCorruptRecord=columnNameOfCorruptRecord, dateFormat=dateFormat,
timestampFormat=timestampFormat, multiLine=multiLine,
allowUnquotedControlChars=allowUnquotedControlChars, lineSep=lineSep,
- samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding)
+ samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding,
--- End diff --
https://github.com/apache/spark/pull/22973
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22951
@HyukjinKwon @dongjoon-hyun Please review the changes.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Can one of the admins verify this patch?
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98567/testReport)** for PR 22951 at commit [`759bca6`](https://github.com/apache/spark/commit/759bca62903b8624dda91ef081e93cd4a30969fe).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98598/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98598/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98591/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98541/testReport)** for PR 22951 at commit [`6ab8501`](https://github.com/apache/spark/commit/6ab850164182565c2cd8cffe99f5c4bb09ead660).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
Actually let me leave a cc for @srowen. I remember we talked about it before.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98591/
Test FAILed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98598/
Test PASSed.
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231873877
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -446,6 +450,9 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
If None is set, it uses the default value, ``1.0``.
:param emptyValue: sets the string representation of an empty value. If None is set, it uses
the default value, empty string.
+ :param locale: sets a locale as language tag in IETF BCP 47 format. If None is set,
+ it uses the default value, ``en-US``. For instance, ``locale`` is used while
+ parsing dates and timestamps.
--- End diff --
Here is the PR for parsing decimals from CSV: https://github.com/apache/spark/pull/22979
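For readers unfamiliar with BCP 47 tags, here is a minimal JDK-only sketch of how a `locale` option value with an `en-US` default could be resolved to a `java.util.Locale`. The class `LocaleOption` and its `resolve` method are hypothetical names for illustration only; this is not the code in `CSVOptions`/`JSONOptions`.

```java
import java.util.Locale;

// Hypothetical helper, not Spark code: turns the value of a `locale` option
// (an IETF BCP 47 language tag such as "ru-RU") into a java.util.Locale.
final class LocaleOption {
    // Default taken from the docstring above: en-US when the option is not set.
    static final String DEFAULT_TAG = "en-US";

    static Locale resolve(String langTag) {
        String tag = (langTag == null) ? DEFAULT_TAG : langTag;
        // BCP 47 tags use hyphens ("en-US"), not underscores ("en_US").
        return Locale.forLanguageTag(tag);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null).toLanguageTag());
        System.out.println(resolve("ru-RU").toLanguageTag());
    }
}
```

The resolved `Locale` is what a date formatter such as `SimpleDateFormat` would take as its second constructor argument.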
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
retest this please
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
Looks good. I or someone else should take a closer look before getting this in.
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231776739
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -349,7 +353,7 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
negativeInf=None, dateFormat=None, timestampFormat=None, maxColumns=None,
maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None,
columnNameOfCorruptRecord=None, multiLine=None, charToEscapeQuoteEscaping=None,
- samplingRatio=None, enforceSchema=None, emptyValue=None):
+ samplingRatio=None, enforceSchema=None, emptyValue=None, locale=None):
--- End diff --
Let's add `emptyValue` in `streaming.py` in the same separate PR.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98489 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98489/testReport)** for PR 22951 at commit [`41154bd`](https://github.com/apache/spark/commit/41154bdce8e61bea208772bbe948b71b23220e8d).
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22951
OMG, what does `ноя 2018` mean BTW? haha
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231776568
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -267,7 +270,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
mode=mode, columnNameOfCorruptRecord=columnNameOfCorruptRecord, dateFormat=dateFormat,
timestampFormat=timestampFormat, multiLine=multiLine,
allowUnquotedControlChars=allowUnquotedControlChars, lineSep=lineSep,
- samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding)
+ samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding,
--- End diff --
@MaxGekk, let's also add `dropFieldIfAllNull` and `encoding` in `sql/streaming.py` in a separate PR.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22951
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...
Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/22951#discussion_r231640870
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -117,4 +120,20 @@ class CsvFunctionsSuite extends QueryTest with SharedSQLContext {
"Acceptable modes are PERMISSIVE and FAILFAST."))
}
}
+
+ test("use locale while parsing timestamps") {
+ Seq("en-US", "ko-KR", "zh-CN", "ru-RU").foreach { langTag =>
+ val locale = Locale.forLanguageTag(langTag)
+ val ts = new SimpleDateFormat("dd/MM/yyyy HH:mm").parse("06/11/2018 18:00")
+ val timestampFormat = "dd MMM yyyy HH:mm"
+ val sdf = new SimpleDateFormat(timestampFormat, locale)
+ val input = Seq(s"""${sdf.format(ts)}""").toDS()
+ val schema = new StructType().add("time", TimestampType)
+ val options = Map("timestampFormat" -> timestampFormat, "locale" -> langTag)
+ val df = input.select(from_csv($"value", schema, options))
--- End diff --
Can we simplify like this?
```scala
- val schema = new StructType().add("time", TimestampType)
val options = Map("timestampFormat" -> timestampFormat, "locale" -> langTag)
- val df = input.select(from_csv($"value", schema, options))
+ val df = input.select(from_csv($"value", lit("time timestamp"), options.asJava))
```
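For context, the idea behind the test in this diff can be sketched with the JDK alone: format a fixed timestamp using a locale-specific `SimpleDateFormat`, then check that the same locale parses it back, while a different locale does not. The class `LocaleRoundTrip` and its methods are assumptions made for illustration; the real test goes through `from_csv`, which this sketch does not call.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical JDK-only illustration of the round trip the test relies on.
final class LocaleRoundTrip {
    // Same pattern as `timestampFormat` in the test above.
    static final String PATTERN = "dd MMM yyyy HH:mm";

    // 6 November 2018, 18:00 in the JVM's default time zone.
    static Date sample() {
        return new GregorianCalendar(2018, Calendar.NOVEMBER, 6, 18, 0).getTime();
    }

    // Renders the timestamp with month names taken from the given locale.
    static String format(String langTag, Date ts) {
        return new SimpleDateFormat(PATTERN, Locale.forLanguageTag(langTag)).format(ts);
    }

    // True if `text` can be parsed with PATTERN under the given locale.
    static boolean parses(String text, String langTag) {
        try {
            new SimpleDateFormat(PATTERN, Locale.forLanguageTag(langTag)).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    // True if formatting and re-parsing under one locale yields the original instant.
    static boolean roundTrips(String langTag, Date ts) {
        SimpleDateFormat sdf = new SimpleDateFormat(PATTERN, Locale.forLanguageTag(langTag));
        try {
            return sdf.parse(sdf.format(ts)).equals(ts);
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Date ts = sample();
        for (String tag : new String[] {"en-US", "ru-RU"}) {
            System.out.println(tag + ": " + format(tag, ts) + " round-trips=" + roundTrips(tag, ts));
        }
        // Russian month text is not recognized by an en-US parser.
        System.out.println(parses(format("ru-RU", ts), "en-US"));
    }
}
```

The exact month abbreviations depend on the JDK's locale data, so the sketch compares parsed instants rather than hard-coding strings such as `ноя`.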
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22951
**[Test build #98591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98591/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22951
jenkins, retest this, please
---