Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/11/05 19:53:40 UTC

[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22951

    [SPARK-25945][SQL] Support locale while parsing date/timestamp from CSV/JSON

    ## What changes were proposed in this pull request?
    
    This PR adds a new `locale` option to CSVOptions/JSONOptions so that dates and timestamps written in local languages can be parsed. Currently the locale is hard-coded to `Locale.US`.
    
    ## How was this patch tested?
    
    Added two tests for parsing a date from CSV/JSON - `ноя 2018`.
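
To illustrate why the locale matters for a value like `ноя 2018`, here is a minimal Python sketch that does not depend on Spark. The month table and the `parse_month_year` helper are hypothetical and hand-written for this example; only the `ноя` value comes from the PR description. The test quoted later in this thread builds its input with `SimpleDateFormat` and a `Locale` instead of a table like this.

```python
from datetime import date

# Hand-written abbreviated month names keyed by BCP 47 language tag.
# This table is an assumption made for illustration, not Spark or JVM data.
MONTH_ABBREVIATIONS = {
    "en-US": ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
    "ru-RU": ["янв", "фев", "мар", "апр", "май", "июн",
              "июл", "авг", "сен", "окт", "ноя", "дек"],
}


def parse_month_year(text, lang_tag="en-US"):
    """Parse 'MMM yyyy' text using the month names of the given language tag."""
    month_name, year = text.split()
    months = [m.lower() for m in MONTH_ABBREVIATIONS[lang_tag]]
    # list.index raises ValueError when the name is unknown in this locale.
    month = months.index(month_name.lower()) + 1
    return date(int(year), month, 1)


print(parse_month_year("ноя 2018", "ru-RU"))
print(parse_month_year("Nov 2018"))
```

With the default `en-US` tag, `parse_month_year("ноя 2018")` raises `ValueError`, which mirrors why a parser fixed to `Locale.US` cannot read such input.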


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 locale

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22951.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22951
    
----
commit c71cd4f219cfdca9bbc85305782ce0d7d9215dcf
Author: Maxim Gekk <ma...@...>
Date:   2018-11-05T19:09:32Z

    Added a test for from_csv

commit e55a3d326cc320b8d0223697f87c2a701f515a2c
Author: Maxim Gekk <ma...@...>
Date:   2018-11-05T19:28:49Z

    lang -> langTag

commit 83c6317b2a9b7b0696da6bdc37e1d49dcf994687
Author: Maxim Gekk <ma...@...>
Date:   2018-11-05T19:32:05Z

    Test for from_json

commit fa019ec0c9b3cb02cb9abf21597bffb8c337197a
Author: Maxim Gekk <ma...@...>
Date:   2018-11-05T19:40:53Z

    Added locale option for JSON and CSV

commit 41154bdce8e61bea208772bbe948b71b23220e8d
Author: Maxim Gekk <ma...@...>
Date:   2018-11-05T19:43:34Z

    Fix imports

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98543/testReport)** for PR 22951 at commit [`6ab8501`](https://github.com/apache/spark/commit/6ab850164182565c2cd8cffe99f5c4bb09ead660).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98541/
    Test FAILed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Could you take a look once more, @HyukjinKwon ?


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98541/testReport)** for PR 22951 at commit [`6ab8501`](https://github.com/apache/spark/commit/6ab850164182565c2cd8cffe99f5c4bb09ead660).


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98489/
    Test PASSed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98507/testReport)** for PR 22951 at commit [`93da760`](https://github.com/apache/spark/commit/93da7604c2d74f97b12e9210f0bcaf038774ea41).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98583/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22951#discussion_r231832597
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -446,6 +450,9 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
                                   If None is set, it uses the default value, ``1.0``.
             :param emptyValue: sets the string representation of an empty value. If None is set, it uses
                                the default value, empty string.
    +        :param locale: sets a locale as language tag in IETF BCP 47 format. If None is set,
    +                       it uses the default value, ``en-US``. For instance, ``locale`` is used while
    +                       parsing dates and timestamps.
    --- End diff --
    
    It seems parsing decimals using `locale` will be slightly tricky in the JSON case because we leave this to Jackson by calling its methods `getCurrentToken` and `getDecimalValue`, and I haven't found how to pass a locale to it. We will probably need a custom deserialiser.
    
    In the CSV case, it should be easier since we convert strings ourselves. I will try to do that for CSV first, once this PR is merged.
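
The CSV idea mentioned above (converting the strings ourselves) could look roughly like the following Python sketch. It is not Spark's implementation: the `SEPARATORS` table, the `de-DE` entry, and the `parse_decimal` helper are assumptions introduced only to show how locale-specific separators might be normalized before conversion.

```python
from decimal import Decimal

# Hypothetical table: language tag -> (grouping separator, decimal separator).
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
}


def parse_decimal(text, lang_tag="en-US"):
    """Normalize locale-specific separators, then convert to Decimal."""
    grouping, decimal_point = SEPARATORS[lang_tag]
    # Drop grouping separators first, then map the decimal separator to ".".
    normalized = text.replace(grouping, "").replace(decimal_point, ".")
    return Decimal(normalized)


print(parse_decimal("1.234,56", "de-DE"))
print(parse_decimal("1,234.56"))
```

Both calls yield the same `Decimal` value. A real implementation would also need to handle signs, other grouping characters, and malformed input.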


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98567/
    Test PASSed.


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22951#discussion_r231641960
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---
    @@ -578,4 +581,20 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {
               "Acceptable modes are PERMISSIVE and FAILFAST."))
         }
       }
    +
    +  test("use locale while parsing timestamps") {
    +    Seq("en-US", "ko-KR", "zh-CN", "ru-RU").foreach { langTag =>
    +      val locale = Locale.forLanguageTag(langTag)
    +      val ts = new SimpleDateFormat("dd/MM/yyyy HH:mm").parse("06/11/2018 18:00")
    +      val timestampFormat = "dd MMM yyyy HH:mm"
    +      val sdf = new SimpleDateFormat(timestampFormat, locale)
    +      val input = Seq(s"""{"time": "${sdf.format(ts)}"}""").toDS()
    +      val schema = new StructType().add("time", TimestampType)
    +      val options = Map("timestampFormat" -> timestampFormat, "locale" -> langTag)
    +      val df = input.select(from_json($"value", schema, options))
    --- End diff --
    
    This one can be simplified like this, too.
    ```scala
    -      val schema = new StructType().add("time", TimestampType)
           val options = Map("timestampFormat" -> timestampFormat, "locale" -> langTag)
    -      val df = input.select(from_json($"value", schema, options))
    +      val df = input.select(from_json($"value", "time timestamp", options))
    ```


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Merged to master.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98587/
    Test FAILed.


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22951#discussion_r231646713
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CsvExpressionsSuite.scala ---
    @@ -209,4 +210,20 @@ class CsvExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper with P
           "2015-12-31T16:00:00"
         )
       }
    +
    +  test("take into account locale while parsing date") {
    --- End diff --
    
    nit: Can we use simpler test case names like the following?
    ```
    `take into account locale while parsing date` -> `parse date with locale`
    `use locale while parsing timestamps` -> `parse timestamps with locale`
    ```


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98583/
    Test FAILed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Could you rebase this once again, @MaxGekk ?


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98507/
    Test PASSed.


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22951#discussion_r231800213
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -349,7 +353,7 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
                 negativeInf=None, dateFormat=None, timestampFormat=None, maxColumns=None,
                 maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None,
                 columnNameOfCorruptRecord=None, multiLine=None, charToEscapeQuoteEscaping=None,
    -            samplingRatio=None, enforceSchema=None, emptyValue=None):
    +            samplingRatio=None, enforceSchema=None, emptyValue=None, locale=None):
    --- End diff --
    
    It seems it exists in `streaming.py`: https://github.com/apache/spark/blob/08c76b5d39127ae207d9d1fff99c2551e6ce2581/python/pyspark/sql/streaming.py#L567


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98543/testReport)** for PR 22951 at commit [`6ab8501`](https://github.com/apache/spark/commit/6ab850164182565c2cd8cffe99f5c4bb09ead660).


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22951#discussion_r231775987
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -446,6 +450,9 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
                                   If None is set, it uses the default value, ``1.0``.
             :param emptyValue: sets the string representation of an empty value. If None is set, it uses
                                the default value, empty string.
    +        :param locale: sets a locale as language tag in IETF BCP 47 format. If None is set,
    +                       it uses the default value, ``en-US``. For instance, ``locale`` is used while
    +                       parsing dates and timestamps.
    --- End diff --
    
    I think ideally we should apply this to decimal parsing too. But yeah, we can handle that separately.
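
The docstring in the diff above says that when `locale` is left as `None`, the default `en-US` is used. A minimal Python sketch of that behaviour follows. The `resolve_options` helper and the `DEFAULT_LOCALE` constant are hypothetical names for this example; where Spark actually applies the default is not shown in this thread.

```python
DEFAULT_LOCALE = "en-US"  # default value stated in the quoted docstring


def resolve_options(**options):
    """Drop options left as None, then fill in the default locale."""
    resolved = {key: value for key, value in options.items() if value is not None}
    resolved.setdefault("locale", DEFAULT_LOCALE)
    return resolved


print(resolve_options(dateFormat="MMM yyyy", locale=None))
print(resolve_options(dateFormat="MMM yyyy", locale="ru-RU"))
```

Treating `None` as "not set" lets a new keyword such as `locale` be added to a reader signature without changing the behaviour of existing callers.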


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    > OMG, what does ноя 2018 mean BTW? haha
    
    It is the three-letter prefix of `Ноябрь`, which is November in Russian.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98543/
    Test PASSed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    jenkins, retest this, please


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98567 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98567/testReport)** for PR 22951 at commit [`759bca6`](https://github.com/apache/spark/commit/759bca62903b8624dda91ef081e93cd4a30969fe).


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98489 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98489/testReport)** for PR 22951 at commit [`41154bd`](https://github.com/apache/spark/commit/41154bdce8e61bea208772bbe948b71b23220e8d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    I will update docs soon.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98507 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98507/testReport)** for PR 22951 at commit [`93da760`](https://github.com/apache/spark/commit/93da7604c2d74f97b12e9210f0bcaf038774ea41).


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98587/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98587/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    retest this please


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22951#discussion_r231776396
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -267,7 +270,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
                 mode=mode, columnNameOfCorruptRecord=columnNameOfCorruptRecord, dateFormat=dateFormat,
                 timestampFormat=timestampFormat, multiLine=multiLine,
                 allowUnquotedControlChars=allowUnquotedControlChars, lineSep=lineSep,
    -            samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding)
    +            samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding,
    +            locale=locale)
    --- End diff --
    
    @MaxGekk, it looks like `sql/streaming.py` was missed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98583/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22951


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22951#discussion_r231800381
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -267,7 +270,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
                 mode=mode, columnNameOfCorruptRecord=columnNameOfCorruptRecord, dateFormat=dateFormat,
                 timestampFormat=timestampFormat, multiLine=multiLine,
                 allowUnquotedControlChars=allowUnquotedControlChars, lineSep=lineSep,
    -            samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding)
    +            samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding,
    --- End diff --
    
    https://github.com/apache/spark/pull/22973


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    @HyukjinKwon @dongjoon-hyun Please review the changes.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98567 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98567/testReport)** for PR 22951 at commit [`759bca6`](https://github.com/apache/spark/commit/759bca62903b8624dda91ef081e93cd4a30969fe).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98598/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98598/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98591/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98541 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98541/testReport)** for PR 22951 at commit [`6ab8501`](https://github.com/apache/spark/commit/6ab850164182565c2cd8cffe99f5c4bb09ead660).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Actually let me leave a cc for @srowen. I remember we talked about it before.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98591/


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98598/


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22951#discussion_r231873877
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -446,6 +450,9 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
                                   If None is set, it uses the default value, ``1.0``.
             :param emptyValue: sets the string representation of an empty value. If None is set, it uses
                                the default value, empty string.
    +        :param locale: sets a locale as language tag in IETF BCP 47 format. If None is set,
    +                       it uses the default value, ``en-US``. For instance, ``locale`` is used while
    +                       parsing dates and timestamps.
    --- End diff --
    
    Here is the PR for parsing decimals from CSV: https://github.com/apache/spark/pull/22979
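
The docstring in the diff above describes `locale` as an IETF BCP 47 language tag that falls back to `en-US` when unset. A minimal Java sketch of that resolution step follows. It makes no claim about Spark's internals: `LocaleOptionSketch`, `DEFAULT_TAG`, and `resolve` are hypothetical names used for illustration, and only `java.util.Locale.forLanguageTag` is a real JDK call.

```java
import java.util.Locale;

// Hypothetical helper, not Spark's code: resolves an optional BCP 47 tag to a Locale.
final class LocaleOptionSketch {
    // Assumed default, taken from the docstring ("en-US").
    static final String DEFAULT_TAG = "en-US";

    // Returns the Locale for the given tag, or the default when no tag is set.
    static Locale resolve(String langTag) {
        return Locale.forLanguageTag(langTag == null ? DEFAULT_TAG : langTag);
    }

    public static void main(String[] args) {
        for (String tag : new String[] {null, "ru-RU", "ko-KR"}) {
            Locale locale = resolve(tag);
            System.out.println(tag + " -> language=" + locale.getLanguage()
                + ", country=" + locale.getCountry());
        }
    }
}
```

Keeping the option as a string tag rather than a `Locale` object means the same value can be passed unchanged from Python, Scala, or SQL option maps.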


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    retest this please


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Looks good. I or someone else should take a closer look before getting this in.


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22951#discussion_r231776739
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -349,7 +353,7 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
                 negativeInf=None, dateFormat=None, timestampFormat=None, maxColumns=None,
                 maxCharsPerColumn=None, maxMalformedLogPerPartition=None, mode=None,
                 columnNameOfCorruptRecord=None, multiLine=None, charToEscapeQuoteEscaping=None,
    -            samplingRatio=None, enforceSchema=None, emptyValue=None):
    +            samplingRatio=None, enforceSchema=None, emptyValue=None, locale=None):
    --- End diff --
    
    Let's also add `emptyValue` to `streaming.py` in that same separate PR.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98489 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98489/testReport)** for PR 22951 at commit [`41154bd`](https://github.com/apache/spark/commit/41154bdce8e61bea208772bbe948b71b23220e8d).


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    OMG, what does `ноя 2018` mean BTW? haha


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22951#discussion_r231776568
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -267,7 +270,8 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
                 mode=mode, columnNameOfCorruptRecord=columnNameOfCorruptRecord, dateFormat=dateFormat,
                 timestampFormat=timestampFormat, multiLine=multiLine,
                 allowUnquotedControlChars=allowUnquotedControlChars, lineSep=lineSep,
    -            samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding)
    +            samplingRatio=samplingRatio, dropFieldIfAllNull=dropFieldIfAllNull, encoding=encoding,
    --- End diff --
    
    @MaxGekk, let's also add `dropFieldIfAllNull` and `encoding` in `sql/streaming.py` in a separate PR.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #22951: [SPARK-25945][SQL] Support locale while parsing d...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22951#discussion_r231640870
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
    @@ -117,4 +120,20 @@ class CsvFunctionsSuite extends QueryTest with SharedSQLContext {
               "Acceptable modes are PERMISSIVE and FAILFAST."))
         }
       }
    +
    +  test("use locale while parsing timestamps") {
    +    Seq("en-US", "ko-KR", "zh-CN", "ru-RU").foreach { langTag =>
    +      val locale = Locale.forLanguageTag(langTag)
    +      val ts = new SimpleDateFormat("dd/MM/yyyy HH:mm").parse("06/11/2018 18:00")
    +      val timestampFormat = "dd MMM yyyy HH:mm"
    +      val sdf = new SimpleDateFormat(timestampFormat, locale)
    +      val input = Seq(s"""${sdf.format(ts)}""").toDS()
    +      val schema = new StructType().add("time", TimestampType)
    +      val options = Map("timestampFormat" -> timestampFormat, "locale" -> langTag)
    +      val df = input.select(from_csv($"value", schema, options))
    --- End diff --
    
    Can we simplify like this?
    ```scala
    -      val schema = new StructType().add("time", TimestampType)
           val options = Map("timestampFormat" -> timestampFormat, "locale" -> langTag)
    -      val df = input.select(from_csv($"value", schema, options))
    +      val df = input.select(from_csv($"value", lit("time timestamp"), options.asJava))
    ```
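
The test in the diff above formats a timestamp with a locale-aware `SimpleDateFormat` and then expects it to be read back under the same `locale` option. The sketch below reproduces only the JDK half of that round trip in plain Java, without Spark. `LocaleRoundTripSketch`, `sampleTimestamp`, and `roundTrips` are hypothetical names, and the month text produced for a given locale may differ between JDK versions, so the sketch does not depend on any specific string.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical illustration: format a timestamp in a locale, then parse it back.
final class LocaleRoundTripSketch {
    // Same instant as the test's "06/11/2018 18:00", built in the default time zone.
    static Date sampleTimestamp() {
        return new GregorianCalendar(2018, Calendar.NOVEMBER, 6, 18, 0).getTime();
    }

    // True when formatting and re-parsing with the same locale yields the same instant.
    static boolean roundTrips(String langTag, String pattern, Date value) {
        Locale locale = Locale.forLanguageTag(langTag);
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, locale);
        String text = sdf.format(value);
        try {
            return sdf.parse(text).equals(value);
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Date ts = sampleTimestamp();
        for (String tag : new String[] {"en-US", "ko-KR", "zh-CN", "ru-RU"}) {
            System.out.println(tag + ": " + roundTrips(tag, "dd MMM yyyy HH:mm", ts));
        }
    }
}
```

Checking the round trip rather than a hard-coded string such as `ноя 2018` keeps the check independent of which locale data the JDK ships.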


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    **[Test build #98591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98591/testReport)** for PR 22951 at commit [`8834b4b`](https://github.com/apache/spark/commit/8834b4b804f99d2a31654a4700359bb4f32e6dba).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22951: [SPARK-25945][SQL] Support locale while parsing date/tim...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22951
  
    jenkins, retest this, please


---
