Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/09/24 08:55:48 UTC
[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON
GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/22534
[SPARK-25514][SQL] Pretty JSON
## What changes were proposed in this pull request?
The PR introduces a new JSON option, `pretty`, which turns on the `DefaultPrettyPrinter` of `Jackson`'s JSON generator. The option is useful for exploring deeply nested columns and for converting JSON columns to a more readable representation (see the added test).
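A minimal sketch of how the option might be used from Scala, assuming a Spark build that already contains this change. The object name `PrettyJsonSketch`, the helper `prettify`, the sample input and its schema are illustrative assumptions, not part of the patch:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, to_json}
import org.apache.spark.sql.types.{IntegerType, StructType}

object PrettyJsonSketch {
  // Hypothetical helper: parses a JSON string with an assumed schema, then
  // renders it back with the `pretty` option described in this PR.
  def prettify(spark: SparkSession, json: String): String = {
    import spark.implicits._
    // Assumed schema matching the illustrative input {"a":{"b":1}}.
    val schema = new StructType().add("a", new StructType().add("b", IntegerType))
    Seq(json).toDF("json")
      .select(to_json(from_json(col("json"), schema), Map("pretty" -> "true")).as("pretty"))
      .as[String]
      .head()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pretty-json-sketch")
      .getOrCreate()
    try println(prettify(spark, """{"a":{"b":1}}""")) finally spark.stop()
  }
}
```

With the option enabled, the returned string should span several lines instead of one; the exact indentation is whatever Jackson's `DefaultPrettyPrinter` produces.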
## How was this patch tested?
Added a round-trip test which converts a JSON string to its pretty representation via `from_json()` and `to_json()`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 pretty-json
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22534.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22534
----
commit 826e8c3dd5b7e54ebdceee74f30798e1b01bcaed
Author: Maxim Gekk <ma...@...>
Date: 2018-09-23T18:49:58Z
Added a round trip tests - from_json and to_json
commit 051c8fd47741637fc9ace6afc059b4b1d18471f5
Author: Maxim Gekk <ma...@...>
Date: 2018-09-23T19:02:05Z
Support the pretty option
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22534
I'm supportive of this idea.
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22534
**[Test build #96504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96504/testReport)** for PR 22534 at commit [`051c8fd`](https://github.com/apache/spark/commit/051c8fd47741637fc9ace6afc059b4b1d18471f5).
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22534
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96545/
Test PASSed.
---
[GitHub] spark pull request #22534: [SPARK-25514][SQL] Generating pretty JSON by to_j...
Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22534#discussion_r220104540
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
@@ -113,6 +113,11 @@ private[sql] class JSONOptions(
}
val lineSeparatorInWrite: String = lineSeparator.getOrElse("\n")
+ /**
+ * Generating JSON strings in pretty representation if the parameter enabled.
--- End diff --
"if the parameter enabled" => "if the parameter is enabled"
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22534
**[Test build #96514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96514/testReport)** for PR 22534 at commit [`f2d7b7e`](https://github.com/apache/spark/commit/f2d7b7e72b8dbba4043cce7d99ae63870479269c).
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22534
**[Test build #96545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96545/testReport)** for PR 22534 at commit [`80bb0a1`](https://github.com/apache/spark/commit/80bb0a180173f0f84ac2f1638b067c82f3c96a25).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22534
LGTM
Merged to master.
---
[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22534#discussion_r219903090
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -2515,4 +2515,35 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
checkCount(2)
countForMalformedJSON(0, Seq(""))
}
+
+ test("saving pretty JSON in the mutliLine mode") {
+ withTempPath { path =>
+ val df = spark.range(1).select(
--- End diff --
Hm .. wait .. does this work for multiple records?
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22534
Can one of the admins verify this patch?
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22534
**[Test build #96504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96504/testReport)** for PR 22534 at commit [`051c8fd`](https://github.com/apache/spark/commit/051c8fd47741637fc9ace6afc059b4b1d18471f5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22534
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22534#discussion_r219903500
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -2515,4 +2515,35 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
checkCount(2)
countForMalformedJSON(0, Seq(""))
}
+
+ test("saving pretty JSON in the mutliLine mode") {
+ withTempPath { path =>
+ val df = spark.range(1).select(
--- End diff --
If not, let's allow this option only in the JSON functions for now (from a cursory look, I guess it does not).
---
[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22534#discussion_r219866091
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
@@ -113,6 +113,11 @@ private[sql] class JSONOptions(
}
val lineSeparatorInWrite: String = lineSeparator.getOrElse("\n")
+ /**
+ * Generating JSON strings in pretty representation if the parameter enabled.
+ */
+ val pretty: Boolean = parameters.get("pretty").map(_.toBoolean).getOrElse(false)
--- End diff --
Updated the comments for the `json()` method of `DataFrameWriter`, since we refer to it from `to_json`.
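The lookup in the quoted diff can be tried outside Spark with an ordinary `Map`. This is only a sketch: `PrettyOptionSketch` and `parsePretty` are hypothetical names, and the `parameters` object in `JSONOptions` may treat keys differently (for example, case-insensitively) from the plain `Map` used here:

```scala
object PrettyOptionSketch {
  // Hypothetical helper mirroring the expression in the diff:
  // missing key -> false; "true"/"false" in any letter case -> parsed value;
  // any other value -> IllegalArgumentException from String.toBoolean.
  def parsePretty(parameters: Map[String, String]): Boolean =
    parameters.get("pretty").map(_.toBoolean).getOrElse(false)

  def main(args: Array[String]): Unit = {
    println(parsePretty(Map.empty))                // false
    println(parsePretty(Map("pretty" -> "true")))  // true
    println(parsePretty(Map("pretty" -> "FALSE"))) // false
  }
}
```

One consequence of this shape is that a value such as `"yes"` fails loudly rather than silently falling back to the default.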
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22534
Oh, I read this too quickly. Shall we document it, then? Let's add a simple set of end-to-end tests for it as well.
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22534
**[Test build #96514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96514/testReport)** for PR 22534 at commit [`f2d7b7e`](https://github.com/apache/spark/commit/f2d7b7e72b8dbba4043cce7d99ae63870479269c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request #22534: [SPARK-25514][SQL] Generating pretty JSON by to_j...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22534#discussion_r220405853
--- Diff: R/pkg/R/functions.R ---
@@ -198,8 +198,9 @@ NULL
#' }
#' @param ... additional argument(s). In \code{to_json} and \code{from_json}, this contains
#' additional named properties to control how it is converted, accepts the same
-#' options as the JSON data source. In \code{arrays_zip}, this contains additional
-#' Columns of arrays to be merged.
+#' options as the JSON data source. Additionally \code{to_json} supports the "pretty"
--- End diff --
nit: I would say `\code{pretty}`
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22534
**[Test build #96510 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96510/testReport)** for PR 22534 at commit [`0f099e3`](https://github.com/apache/spark/commit/0f099e336bbb563a58528b9c12f3ed5c542b0159).
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22534
**[Test build #96545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96545/testReport)** for PR 22534 at commit [`80bb0a1`](https://github.com/apache/spark/commit/80bb0a180173f0f84ac2f1638b067c82f3c96a25).
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22534
Can one of the admins verify this patch?
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/22534
> Let's clarify in the PR title that this option is only for JSON functions
In general, the option can be used to produce pretty-printed JSON files, for example in the multi-line mode. I wouldn't restrict it to `to_json` only.
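The concern about multiple records can be illustrated without Spark. The sketch below uses hand-written strings as stand-ins for one record in compact and in pretty form (they are not output captured from this patch) and counts the physical lines a newline-separated file would contain. `PrettyLinesSketch` and `lineCount` are hypothetical names:

```scala
object PrettyLinesSketch {
  // Hand-written stand-ins for the same record in compact and pretty form.
  val compact: String = """{"a":{"b":1}}"""
  val pretty: String =
    """{
      |  "a" : {
      |    "b" : 1
      |  }
      |}""".stripMargin

  // Number of physical lines in a file holding `n` copies of `record`,
  // with records separated by "\n".
  def lineCount(record: String, n: Int): Int =
    Seq.fill(n)(record).mkString("\n").split("\n").length

  def main(args: Array[String]): Unit = {
    println(lineCount(compact, 2)) // 2: one line per record
    println(lineCount(pretty, 2))  // 10: five lines per record
  }
}
```

A reader that expects one record per line would see ten fragments in the second case, none of which is a complete record on its own, which is why pretty output only makes sense where a record may span several lines.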
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22534
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96514/
Test PASSed.
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22534
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22534
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96504/
Test PASSed.
---
[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22534#discussion_r219901199
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -776,7 +776,7 @@ def saveAsTable(self, name, format=None, mode=None, partitionBy=None, **options)
@since(1.4)
def json(self, path, mode=None, compression=None, dateFormat=None, timestampFormat=None,
- lineSep=None, encoding=None):
+ lineSep=None, encoding=None, pretty=None):
--- End diff --
How about adding to R too? I don't know as much about how R works and whether it just works to add `pretty=True` in the R API. It could use some docs if it works.
---
[GitHub] spark pull request #22534: [SPARK-25514][SQL] Generating pretty JSON by to_j...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22534
---
[GitHub] spark pull request #22534: [SPARK-25514][SQL] Pretty JSON
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22534#discussion_r219809938
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala ---
@@ -113,6 +113,11 @@ private[sql] class JSONOptions(
}
val lineSeparatorInWrite: String = lineSeparator.getOrElse("\n")
+ /**
+ * Generating JSON strings in pretty representation if the parameter enabled.
+ */
+ val pretty: Boolean = parameters.get("pretty").map(_.toBoolean).getOrElse(false)
--- End diff --
Hm .. so now this has finally become an actual problem. This option is specifically for JSON functions and is documented nowhere. Can we deal with this problem?
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Generating pretty JSON by to_json
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22534
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22534: [SPARK-25514][SQL] Pretty JSON
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/22534
Let's clarify in the PR title that this option is only for JSON functions
---
[GitHub] spark pull request #22534: [SPARK-25514][SQL] Generating pretty JSON by to_j...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22534#discussion_r220405798
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3635,6 +3637,8 @@ object functions {
* @param e a column containing a struct, an array or a map.
* @param options options to control how the struct column is converted into a json string.
* accepts the same options and the json data source.
+ * Additionally the function supports the `pretty` option which enables
--- End diff --
For clarification, we don't support this in JSON datasource officially since it's not documented.
---