Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/09/24 08:55:48 UTC

[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/22534

    [SPARK-25514][SQL] Pretty JSON

    ## What changes were proposed in this pull request?
    
    This PR introduces a new JSON option, `pretty`, which turns on the `DefaultPrettyPrinter` of `Jackson`'s JSON generator. The option is useful for exploring deeply nested columns and for converting JSON columns into a more readable representation (see the added test).
    
    ## How was this patch tested?
    
    Added a round-trip test which converts a JSON string to a pretty representation via `from_json()` and `to_json()`.
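
To make the round trip concrete, here is a minimal sketch in plain Python. It uses the standard `json` module as a stand-in for `from_json()` and `to_json()`; it is not the test added in this PR, the sample document is made up, and the indentation produced by `json.dumps` differs in detail from Jackson's `DefaultPrettyPrinter`.

```python
import json

# Hypothetical compact input; any nested JSON document would do.
compact = '{"book":{"publisher":[{"country":"NL","year":[1981,1986,1999]}]}}'

# Analogue of from_json(): parse the string into a structured value.
parsed = json.loads(compact)

# Analogue of to_json() with pretty enabled: serialize with indentation.
pretty = json.dumps(parsed, indent=2)

# The pretty form spans several lines but carries the same data.
round_trip_ok = json.loads(pretty) == parsed
line_count = len(pretty.splitlines())

print(pretty)
print("round trip preserved data:", round_trip_ok)
```

The point of the check is that pretty printing only changes whitespace: parsing the pretty string yields the same value as parsing the compact one.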

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 pretty-json

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22534.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22534
    
----
commit 826e8c3dd5b7e54ebdceee74f30798e1b01bcaed
Author: Maxim Gekk <ma...@...>
Date:   2018-09-23T18:49:58Z

    Added a round trip tests - from_json and to_json

commit 051c8fd47741637fc9ace6afc059b4b1d18471f5
Author: Maxim Gekk <ma...@...>
Date:   2018-09-23T19:02:05Z

    Support the pretty option

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    I'm supportive of this idea.


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    **[Test build #96504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96504/testReport)** for PR 22534 at commit [`051c8fd`](https://github.com/apache/spark/commit/051c8fd47741637fc9ace6afc059b4b1d18471f5).


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96545/
    Test PASSed.


---



[GitHub] spark pull request #22534: [SPARK-25514][SQL] Generating pretty JSON by to_j...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22534#discussion_r220104540
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
    @@ -113,6 +113,11 @@ private[sql] class JSONOptions(
       }
       val lineSeparatorInWrite: String = lineSeparator.getOrElse("\n")
     
    +  /**
    +   * Generating JSON strings in pretty representation if the parameter enabled.
    --- End diff --
    
    "if the parameter enabled" => "if the parameter is enabled"


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    **[Test build #96514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96514/testReport)** for PR 22534 at commit [`f2d7b7e`](https://github.com/apache/spark/commit/f2d7b7e72b8dbba4043cce7d99ae63870479269c).


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    **[Test build #96545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96545/testReport)** for PR 22534 at commit [`80bb0a1`](https://github.com/apache/spark/commit/80bb0a180173f0f84ac2f1638b067c82f3c96a25).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    LGTM
    
    Merged to master.


---



[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22534#discussion_r219903090
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2515,4 +2515,35 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         checkCount(2)
         countForMalformedJSON(0, Seq(""))
       }
    +
    +  test("saving pretty JSON in the mutliLine mode") {
    +    withTempPath { path =>
    +      val df = spark.range(1).select(
    --- End diff --
    
    Hm .. wait .. does this work for multiple records?
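
The concern behind this question can be illustrated outside Spark. The sketch below is plain Python with the standard `json` module, not Spark's JSON datasource; the records and the `parse_line_delimited` helper are hypothetical. It shows that a reader expecting one JSON record per line handles compact output but fails on pretty-printed output, because each pretty record spans several lines.

```python
import json

# Hypothetical records standing in for rows of a DataFrame.
records = [{"id": 0, "nested": {"a": 1}}, {"id": 1, "nested": {"a": 2}}]

# Compact output: exactly one record per line.
compact = "\n".join(json.dumps(r) for r in records)

# Pretty output: every record is spread over several lines.
pretty = "\n".join(json.dumps(r, indent=2) for r in records)


def parse_line_delimited(text):
    """Parse each non-empty line as an independent JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


compact_parses = parse_line_delimited(compact) == records

try:
    parse_line_delimited(pretty)
    pretty_parses = True
except json.JSONDecodeError:
    # The first line of a pretty record is just "{", which is not valid JSON.
    pretty_parses = False

print("compact parses line by line:", compact_parses)
print("pretty parses line by line:", pretty_parses)
```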


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    **[Test build #96504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96504/testReport)** for PR 22534 at commit [`051c8fd`](https://github.com/apache/spark/commit/051c8fd47741637fc9ace6afc059b4b1d18471f5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22534#discussion_r219903500
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2515,4 +2515,35 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
         checkCount(2)
         countForMalformedJSON(0, Seq(""))
       }
    +
    +  test("saving pretty JSON in the mutliLine mode") {
    +    withTempPath { path =>
    +      val df = spark.range(1).select(
    --- End diff --
    
    If it does not work for multiple records (from a cursory look, I guess it doesn't), let's allow this option only in the JSON functions for now.


---



[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22534#discussion_r219866091
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
    @@ -113,6 +113,11 @@ private[sql] class JSONOptions(
       }
       val lineSeparatorInWrite: String = lineSeparator.getOrElse("\n")
     
    +  /**
    +   * Generating JSON strings in pretty representation if the parameter enabled.
    +   */
    +  val pretty: Boolean = parameters.get("pretty").map(_.toBoolean).getOrElse(false)
    --- End diff --
    
    Updated the comments for the `json()` method of `DataFrameWriter`, since we refer to it from `to_json`.
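
For readers less familiar with Scala, the semantics of the `pretty` line in the diff above can be sketched in Python. This is an illustration only: `parse_pretty` is a hypothetical helper name, the real `JSONOptions` receives a Spark parameter map rather than a Python dict, and the error branch assumes that Scala's `toBoolean` accepts only a case-insensitive `true` or `false`.

```python
def parse_pretty(parameters):
    """Read the "pretty" flag from a dict of string options, defaulting to False."""
    raw = parameters.get("pretty")
    if raw is None:
        return False  # option not set: keep the compact output
    lowered = raw.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    # Assumption: mirrors Scala's toBoolean, which rejects any other string.
    raise ValueError('For input string: "%s"' % raw)


print(parse_pretty({}))                  # False
print(parse_pretty({"pretty": "true"}))  # True
```

The default of `False` means existing callers that never set the option keep getting single-line output.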


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    Oh, I read this too quickly. Shall we document it then? Let's add a simple set of end-to-end tests for it as well.


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    **[Test build #96514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96514/testReport)** for PR 22534 at commit [`f2d7b7e`](https://github.com/apache/spark/commit/f2d7b7e72b8dbba4043cce7d99ae63870479269c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22534: [SPARK-25514][SQL] Generating pretty JSON by to_j...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22534#discussion_r220405853
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -198,8 +198,9 @@ NULL
     #'          }
     #' @param ... additional argument(s). In \code{to_json} and \code{from_json}, this contains
     #'            additional named properties to control how it is converted, accepts the same
    -#'            options as the JSON data source.  In \code{arrays_zip}, this contains additional
    -#'            Columns of arrays to be merged.
    +#'            options as the JSON data source. Additionally \code{to_json} supports the "pretty"
    --- End diff --
    
    nit: I would say `\code{pretty}`


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    **[Test build #96510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96510/testReport)** for PR 22534 at commit [`0f099e3`](https://github.com/apache/spark/commit/0f099e336bbb563a58528b9c12f3ed5c542b0159).


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    **[Test build #96545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96545/testReport)** for PR 22534 at commit [`80bb0a1`](https://github.com/apache/spark/commit/80bb0a180173f0f84ac2f1638b067c82f3c96a25).


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    > Let's clarify in the PR title that this option is only for JSON functions
    
    In general, the option can also be used to produce pretty-printed JSON files, for example in multi-line mode. I wouldn't restrict it to `to_json` only.


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96514/
    Test PASSed.


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96504/
    Test PASSed.


---



[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22534#discussion_r219901199
  
    --- Diff: python/pyspark/sql/readwriter.py ---
    @@ -776,7 +776,7 @@ def saveAsTable(self, name, format=None, mode=None, partitionBy=None, **options)
     
         @since(1.4)
         def json(self, path, mode=None, compression=None, dateFormat=None, timestampFormat=None,
    -             lineSep=None, encoding=None):
    +             lineSep=None, encoding=None, pretty=None):
    --- End diff --
    
    How about adding to R too? I don't know as much about how R works and whether it just works to add `pretty=True` in the R API. It could use some docs if it works.


---



[GitHub] spark pull request #22534: [SPARK-25514][SQL] Generating pretty JSON by to_j...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22534


---



[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22534#discussion_r219809938
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
    @@ -113,6 +113,11 @@ private[sql] class JSONOptions(
       }
       val lineSeparatorInWrite: String = lineSeparator.getOrElse("\n")
     
    +  /**
    +   * Generating JSON strings in pretty representation if the parameter enabled.
    +   */
    +  val pretty: Boolean = parameters.get("pretty").map(_.toBoolean).getOrElse(false)
    --- End diff --
    
    hm .. so now finally this became an actual problem. This is specifically for JSON functions and documented nowhere. Can we deal with this problem?


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22534
  
    Let's clarify in the PR title that this option is only for JSON functions.


---



[GitHub] spark pull request #22534: [SPARK-25514][SQL] Generating pretty JSON by to_j...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22534#discussion_r220405798
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -3635,6 +3637,8 @@ object functions {
        * @param e a column containing a struct, an array or a map.
        * @param options options to control how the struct column is converted into a json string.
        *                accepts the same options and the json data source.
    +   *                Additionally the function supports the `pretty` option which enables
    --- End diff --
    
    For clarification, we don't officially support this in the JSON datasource, since it's not documented there.


---
