Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/11/08 08:33:44 UTC

[GitHub] spark pull request #22973: [SPARK-25972][SQL] Missed JSON options in streami...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22973

    [SPARK-25972][SQL] Missed JSON options in streaming.py

    ## What changes were proposed in this pull request?
    
    Added JSON options to `json()` in streaming.py that are present in the corresponding method in readwriter.py. In particular, the missing options are `dropFieldIfAllNull` and `encoding`.
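The `encoding` option matters most for line-delimited JSON, where records are split on a line separator before they are parsed. As a hedged, standard-library-only illustration (not Spark code; `read_json_lines` is a hypothetical helper), the sketch below shows that in a non-UTF-8 charset such as UTF-16BE the separator has to be matched in its encoded form. In PySpark itself the option is passed as the `encoding` keyword of `DataStreamReader.json()` once this change is in.

```python
import json


def read_json_lines(data: bytes, encoding: str, line_sep: str = "\n") -> list:
    """Split raw bytes into JSON records on a separator encoded in `encoding`.

    Plain-stdlib illustration only; this is not how Spark implements it.
    """
    sep = line_sep.encode(encoding)  # b"\x00\n" for UTF-16BE, not b"\n"
    records = []
    for chunk in data.split(sep):
        if chunk:  # skip the empty piece after a trailing separator
            records.append(json.loads(chunk.decode(encoding)))
    return records


# Two JSON records written as UTF-16BE text, one per line.
raw = '{"a": 1}\n{"a": 2}\n'.encode("utf-16-be")
print(read_json_lines(raw, "utf-16-be"))  # [{'a': 1}, {'a': 2}]

# Splitting on an unencoded b"\n" instead leaves the newline's high byte
# (b"\x00") at the end of the first chunk, giving it an odd length that
# cannot be decoded as UTF-16BE.
naive_first = raw.split(b"\n")[0]
print(len(naive_first) % 2)  # 1
```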


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 streaming-missed-options

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22973.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22973
    
----
commit da697acc60326017ce0b9a999df124ffb1ca9276
Author: Maxim Gekk <ma...@...>
Date:   2018-11-08T08:28:35Z

    Adding missed options

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22973: [SPARK-25972][PYTHON] Missed JSON options in stre...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22973#discussion_r232484720
  
    --- Diff: python/pyspark/sql/streaming.py ---
    @@ -467,11 +468,18 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
             :param allowUnquotedControlChars: allows JSON Strings to contain unquoted control
                                               characters (ASCII characters with value less than 32,
                                               including tab and line feed characters) or not.
    +        :param encoding: allows to forcibly set one of standard basic or extended encoding for
    +                         the JSON files. For example UTF-16BE, UTF-32LE. If None is set,
    +                         the encoding of input JSON will be detected automatically
    +                         when the multiLine option is set to ``true``.
             :param lineSep: defines the line separator that should be used for parsing. If None is
                             set, it covers all ``\\r``, ``\\r\\n`` and ``\\n``.
             :param locale: sets a locale as language tag in IETF BCP 47 format. If None is set,
                            it uses the default value, ``en-US``. For instance, ``locale`` is used while
                            parsing dates and timestamps.
    +        :param dropFieldIfAllNull: whether to ignore column of all null values or empty
    +                                   array/struct during schema inference. If None is set, it
    +                                   uses the default value, ``false``.
    --- End diff --
    
    @MaxGekk, let's match the order of the docs to the order of the parameters.
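Keeping the `:param` entries in the same order as the function's parameters can be checked mechanically. A minimal sketch, assuming Sphinx-style `:param name:` fields like those in the diff above; `param_doc_order` and `json_stub` are hypothetical names, and the stub's parameters are illustrative rather than the real `json()` signature.

```python
import inspect
import re


def param_doc_order(func):
    """Return (signature order, docstring order) for the parameters that
    appear both in the signature and in ``:param name:`` docstring fields."""
    # Parameter names in declaration order, ignoring ``self``.
    sig_names = [n for n in inspect.signature(func).parameters if n != "self"]
    # Names from Sphinx-style ``:param name:`` fields, in documented order.
    doc_names = re.findall(r":param\s+(\w+):", inspect.getdoc(func) or "")
    shared = set(sig_names) & set(doc_names)
    return ([n for n in sig_names if n in shared],
            [n for n in doc_names if n in shared])


def json_stub(self, path, lineSep=None, locale=None, encoding=None):
    """Hypothetical stub whose docs list ``encoding`` before ``lineSep``.

    :param path: string represents path to the JSON dataset.
    :param encoding: forcibly sets the encoding.
    :param lineSep: defines the line separator.
    :param locale: sets a locale.
    """


sig_order, doc_order = param_doc_order(json_stub)
print(sig_order == doc_order)  # False: the docs are out of order
```

A check like this could run over every public reader method so the docs and signatures cannot drift apart again.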


---



[GitHub] spark pull request #22973: [SPARK-25972][PYTHON] Missed JSON options in stre...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22973


---



[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98589/


---



[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    **[Test build #98695 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98695/testReport)** for PR 22973 at commit [`5b132ce`](https://github.com/apache/spark/commit/5b132ce59c5bdfab72bebe69123d84a52d261e1d).


---



[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98695/


---



[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    **[Test build #98695 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98695/testReport)** for PR 22973 at commit [`5b132ce`](https://github.com/apache/spark/commit/5b132ce59c5bdfab72bebe69123d84a52d261e1d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    > Oh, let's name it [PYTHON] BTW.
    
    I see that some ongoing PRs have the tag `[PYSPARK]`. Is this documented somewhere?


---



[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98639/


---



[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    Oh, let's name it `[PYTHON]` BTW.


---



[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    **[Test build #98639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98639/testReport)** for PR 22973 at commit [`4ca71fc`](https://github.com/apache/spark/commit/4ca71fc75d0a25ced9803372b0594ae8342b5eb9).


---



[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    **[Test build #98639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98639/testReport)** for PR 22973 at commit [`4ca71fc`](https://github.com/apache/spark/commit/4ca71fc75d0a25ced9803372b0594ae8342b5eb9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    **[Test build #98589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98589/testReport)** for PR 22973 at commit [`da697ac`](https://github.com/apache/spark/commit/da697acc60326017ce0b9a999df124ffb1ca9276).


---



[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22973: [SPARK-25972][PYTHON] Missed JSON options in stre...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22973#discussion_r232485771
  
    --- Diff: python/pyspark/sql/streaming.py ---
    @@ -467,11 +468,18 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
             :param allowUnquotedControlChars: allows JSON Strings to contain unquoted control
                                               characters (ASCII characters with value less than 32,
                                               including tab and line feed characters) or not.
    +        :param encoding: allows to forcibly set one of standard basic or extended encoding for
    +                         the JSON files. For example UTF-16BE, UTF-32LE. If None is set,
    +                         the encoding of input JSON will be detected automatically
    +                         when the multiLine option is set to ``true``.
             :param lineSep: defines the line separator that should be used for parsing. If None is
                             set, it covers all ``\\r``, ``\\r\\n`` and ``\\n``.
             :param locale: sets a locale as language tag in IETF BCP 47 format. If None is set,
                            it uses the default value, ``en-US``. For instance, ``locale`` is used while
                            parsing dates and timestamps.
    +        :param dropFieldIfAllNull: whether to ignore column of all null values or empty
    +                                   array/struct during schema inference. If None is set, it
    +                                   uses the default value, ``false``.
    --- End diff --
    
    re-ordered
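The `dropFieldIfAllNull` description in the diff above (ignore a column whose values are all null or empty arrays/structs during schema inference) can be mimicked with a toy inference over Python dicts. This is a hypothetical illustration only: `infer_fields` is not a Spark API, and Spark's real inference also merges types and walks nested structures.

```python
def _is_empty(value):
    """True for None, [] and {}: values that carry no type information."""
    return value is None or (isinstance(value, (list, dict)) and not value)


def infer_fields(records, drop_field_if_all_null=False):
    """Toy schema inference: map each top-level key to the set of Python
    type names observed for it across all records."""
    seen = {}
    for record in records:
        for key, value in record.items():
            types = seen.setdefault(key, set())
            if not _is_empty(value):
                types.add(type(value).__name__)
    if drop_field_if_all_null:
        # Drop keys whose values were null or empty arrays/structs everywhere.
        return {k: v for k, v in seen.items() if v}
    # Otherwise keep them; an empty set stands in for an untyped column.
    return seen


rows = [{"id": 1, "tags": [], "meta": None}, {"id": 2, "tags": [], "meta": {}}]
print(infer_fields(rows))  # {'id': {'int'}, 'tags': set(), 'meta': set()}
print(infer_fields(rows, drop_field_if_all_null=True))  # {'id': {'int'}}
```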


---



[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    @HyukjinKwon Thank you for pointing out the missing options. Please have a look at the PR.


---



[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    Merged to master.


---



[GitHub] spark issue #22973: [SPARK-25972][PYTHON] Missed JSON options in streaming.p...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    Yeah, actually that's documented in https://spark.apache.org/contributing.html. Strictly, it should be `PYTHON`.
    
    > The PR title should be of the form [SPARK-xxxx][COMPONENT] Title, where SPARK-xxxx is the relevant JIRA number, COMPONENT is one of the PR categories shown at spark-prs.appspot.com and Title may be the JIRA’s title or a more specific title describing the PR itself.
    
    ![screen shot 2018-11-08 at 7 19 07 pm](https://user-images.githubusercontent.com/6477701/48195647-2e353b80-e38b-11e8-8ac0-73a458ee00c0.png)
    
    I wanted to be clear on this myself, so I had looked it up before, but it's not a big deal at all :D.
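The title form quoted above, `[SPARK-xxxx][COMPONENT] Title`, can be checked with a regular expression. A minimal sketch: the component set is an assumption containing only the two tags used in this thread (the authoritative list is the one shown at spark-prs.appspot.com), and `check_pr_title` is a hypothetical helper.

```python
import re

# Assumed subset of component tags; extend from spark-prs.appspot.com.
KNOWN_COMPONENTS = {"PYTHON", "SQL"}

# [SPARK-<digits>][<COMPONENT>] followed by a space and a non-empty title.
TITLE_RE = re.compile(r"^\[SPARK-\d+\]\[([A-Z0-9]+)\] \S")


def check_pr_title(title, components=KNOWN_COMPONENTS):
    """Return "ok", or a short description of what is wrong with `title`."""
    match = TITLE_RE.match(title)
    if match is None:
        return "does not match [SPARK-xxxx][COMPONENT] Title"
    if match.group(1) not in components:
        return f"unknown component {match.group(1)!r}"
    return "ok"


for t in ["[SPARK-25972][PYTHON] Missed JSON options in streaming.py",
          "[SPARK-25972][PYSPARK] Missed JSON options in streaming.py",
          "SPARK-25972 Missed JSON options"]:
    print(check_pr_title(t))
```

Keeping the component list as data rather than in the regex makes it easy to update without touching the pattern.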


---



[GitHub] spark issue #22973: [SPARK-25972][SQL] Missed JSON options in streaming.py

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22973
  
    **[Test build #98589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98589/testReport)** for PR 22973 at commit [`da697ac`](https://github.com/apache/spark/commit/da697acc60326017ce0b9a999df124ffb1ca9276).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
