You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2017/12/15 07:04:14 UTC
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/19985
[SPARK-22791] [SQL] [SS] Redact Output of Explain
## What changes were proposed in this pull request?
When calling explain on a query, the output can contain sensitive information. We should provide an admin/user to redact such information.
```
== Physical Plan ==
*HashAggregate(keys=[value#6], functions=[count(1)], output=[value#6, count(1)#12L])
+- StateStoreSave [value#6], state info [ checkpoint = file:/private/var/folders/vx/j0ydl5rn0gd9mgrh1pljnw900000gn/T/temporary-91c6fac0-609f-4bc8-ad57-52c189f06797/state, runId = 05a4b3af-f02c-40f8-9ff9-a3e18bae496f, opId = 0, ver = 0, numPartitions = 5], Complete, 0
+- *HashAggregate(keys=[value#6], functions=[merge_count(1)], output=[value#6, count#18L])
+- StateStoreRestore [value#6], state info [ checkpoint = file:/private/var/folders/vx/j0ydl5rn0gd9mgrh1pljnw900000gn/T/temporary-91c6fac0-609f-4bc8-ad57-52c189f06797/state, runId = 05a4b3af-f02c-40f8-9ff9-a3e18bae496f, opId = 0, ver = 0, numPartitions = 5]
+- *HashAggregate(keys=[value#6], functions=[merge_count(1)], output=[value#6, count#18L])
+- Exchange hashpartitioning(value#6, 5)
+- *HashAggregate(keys=[value#6], functions=[partial_count(1)], output=[value#6, count#18L])
+- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false) AS value#6]
+- *MapElements <function1>, obj#5: java.lang.String
+- *DeserializeToObject value#30.toString, obj#4: java.lang.String
+- LocalTableScan [value#30]
```
## How was this patch tested?
Added a test case
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark redactPlan
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19985.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19985
----
commit 8f44edf5a4edcc8f1c42331cf3ab9b694fb01925
Author: gatorsmile <ga...@gmail.com>
Date: 2017-12-15T06:56:32Z
fix.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #85011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85011/testReport)** for PR 19985 at commit [`e246096`](https://github.com/apache/spark/commit/e246096cab0b0905d7952556ba3e8ecd15c0a2a5).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157271250
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -231,6 +231,13 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
""".stripMargin.trim
}
+ /**
+ * Redact the sensitive information in the given string.
+ */
+ private def withRedaction(message: => String): String = {
+ Utils.redact(SparkSession.getActiveSession.map(_.sparkContext.conf).orNull, message)
--- End diff --
Why not make `Utils.redact` accept a regex parameter so that we can use session conf?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #85011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85011/testReport)** for PR 19985 at commit [`e246096`](https://github.com/apache/spark/commit/e246096cab0b0905d7952556ba3e8ecd15c0a2a5).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84949/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19985
cc @marmbrus @hvanhovell @zsxwing
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #85090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85090/testReport)** for PR 19985 at commit [`75c1479`](https://github.com/apache/spark/commit/75c1479d62045f3b8f987d38d88fd2e6be88e57b).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #85090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85090/testReport)** for PR 19985 at commit [`75c1479`](https://github.com/apache/spark/commit/75c1479d62045f3b8f987d38d88fd2e6be88e57b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19985
thanks, merging to master!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85077/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19985
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157135279
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
assert(OutputMode.Update === InternalOutputModes.Update)
}
+ override protected def sparkConf: SparkConf = super.sparkConf
+ .set("spark.redaction.string.regex", "file:/[\\w_]+")
+
+ test("explain - redaction") {
--- End diff --
This is just a SS example. I believe we have more such cases.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157272556
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
assert(OutputMode.Update === InternalOutputModes.Update)
}
+ override protected def sparkConf: SparkConf = super.sparkConf
+ .set("spark.redaction.string.regex", "file:/[\\w_]+")
+
+ test("explain - redaction") {
+ val replacement = "*********"
+
+ val inputData = MemoryStream[String]
+ val df = inputData.toDS().map(_ + "foo").groupBy("value").agg(count("*"))
+ // Test StreamingQuery.display
+ val q = df.writeStream.queryName("memory_explain").outputMode("complete").format("memory")
+ .start()
+ .asInstanceOf[StreamingQueryWrapper]
+ .streamingQuery
+ try {
+ inputData.addData("abc")
+ q.processAllAvailable()
+
+ val explainWithoutExtended = q.explainInternal(false)
+ assert(explainWithoutExtended.contains(replacement))
+ assert(explainWithoutExtended.contains("StateStoreRestore"))
+
+ val explainWithExtended = q.explainInternal(true)
+ assert(explainWithExtended.contains(replacement))
+ assert(explainWithExtended.contains("StateStoreRestore"))
--- End diff --
ditto
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #85002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85002/testReport)** for PR 19985 at commit [`e246096`](https://github.com/apache/spark/commit/e246096cab0b0905d7952556ba3e8ecd15c0a2a5).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157272194
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -231,6 +231,13 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
""".stripMargin.trim
}
+ /**
+ * Redact the sensitive information in the given string.
+ */
+ private def withRedaction(message: => String): String = {
+ Utils.redact(SparkSession.getActiveSession.map(_.sparkContext.conf).orNull, message)
--- End diff --
A different query may need to use a different `regex`, a session conf will be better and the user can change it during runtime.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84946/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85011/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #85077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85077/testReport)** for PR 19985 at commit [`d50d594`](https://github.com/apache/spark/commit/d50d5947c32b8b06f03fa933a26547e67c317ee2).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #84946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84946/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157337723
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
assert(OutputMode.Update === InternalOutputModes.Update)
}
+ override protected def sparkConf: SparkConf = super.sparkConf
+ .set("spark.redaction.string.regex", "file:/[\\w_]+")
+
+ test("explain - redaction") {
+ val replacement = "*********"
+
+ val inputData = MemoryStream[String]
+ val df = inputData.toDS().map(_ + "foo").groupBy("value").agg(count("*"))
+ // Test StreamingQuery.display
+ val q = df.writeStream.queryName("memory_explain").outputMode("complete").format("memory")
+ .start()
+ .asInstanceOf[StreamingQueryWrapper]
+ .streamingQuery
+ try {
+ inputData.addData("abc")
+ q.processAllAvailable()
+
+ val explainWithoutExtended = q.explainInternal(false)
+ assert(explainWithoutExtended.contains(replacement))
+ assert(explainWithoutExtended.contains("StateStoreRestore"))
--- End diff --
done.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #84946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84946/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19985
cc @zsxwing
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157272539
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
assert(OutputMode.Update === InternalOutputModes.Update)
}
+ override protected def sparkConf: SparkConf = super.sparkConf
+ .set("spark.redaction.string.regex", "file:/[\\w_]+")
+
+ test("explain - redaction") {
+ val replacement = "*********"
+
+ val inputData = MemoryStream[String]
+ val df = inputData.toDS().map(_ + "foo").groupBy("value").agg(count("*"))
+ // Test StreamingQuery.display
+ val q = df.writeStream.queryName("memory_explain").outputMode("complete").format("memory")
+ .start()
+ .asInstanceOf[StreamingQueryWrapper]
+ .streamingQuery
+ try {
+ inputData.addData("abc")
+ q.processAllAvailable()
+
+ val explainWithoutExtended = q.explainInternal(false)
+ assert(explainWithoutExtended.contains(replacement))
+ assert(explainWithoutExtended.contains("StateStoreRestore"))
--- End diff --
nit: add `assert(!explainWithoutExtended.contains("file:/"))` to verify it does replace the correct content.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:
https://github.com/apache/spark/pull/19985
Looks good. My major comment is using a session conf instead.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19985
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157271920
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
assert(OutputMode.Update === InternalOutputModes.Update)
}
+ override protected def sparkConf: SparkConf = super.sparkConf
+ .set("spark.redaction.string.regex", "file:/[\\w_]+")
+
+ test("explain - redaction") {
--- End diff --
Is it because you want to use `explainInternal`, so you write a SS test?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84968/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157624496
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1035,6 +1036,14 @@ object SQLConf {
.booleanConf
.createWithDefault(true)
+ val SQL_STRING_REDACTION_PATTERN =
+ ConfigBuilder("spark.sql.redaction.string.regex")
+ .doc("Regex to decide which parts of strings produced by Spark contain sensitive " +
+ "information. When this regex matches a string part, that string part is replaced by a " +
+ "dummy value. This is currently used to redact the output of SQL explain commands. " +
+ "When this conf is not set, the value from `spark.sql.redaction.string.regex` is used.")
--- End diff --
nit: `spark.sql.redaction.string.regex` -> `spark.redaction.string.regex`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85090/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #84949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84949/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #85002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85002/testReport)** for PR 19985 at commit [`e246096`](https://github.com/apache/spark/commit/e246096cab0b0905d7952556ba3e8ecd15c0a2a5).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85002/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #85077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85077/testReport)** for PR 19985 at commit [`d50d594`](https://github.com/apache/spark/commit/d50d5947c32b8b06f03fa933a26547e67c317ee2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19985
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157652242
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala ---
@@ -52,4 +53,40 @@ class DataSourceScanExecRedactionSuite extends QueryTest with SharedSQLContext {
assert(df.queryExecution.simpleString.contains(replacement))
}
}
+
+ private def isIncluded(queryExecution: QueryExecution, msg: String): Boolean = {
+ queryExecution.toString.contains(msg) ||
+ queryExecution.simpleString.contains(msg) ||
+ queryExecution.stringWithStats.contains(msg)
+ }
+
+ test("explain is redacted using SQLConf") {
+ withTempDir { dir =>
+ val basePath = dir.getCanonicalPath
+ spark.range(0, 10).toDF("a").write.parquet(new Path(basePath, "foo=1").toString)
+ val df = spark.read.parquet(basePath)
+ val replacement = "*********"
+
+ // Respect SparkConf and replace file:/
+ assert(isIncluded(df.queryExecution, replacement))
+
+ assert(isIncluded(df.queryExecution, "FileScan"))
+ assert(!isIncluded(df.queryExecution, "file:/"))
+
+ withSQLConf(SQLConf.SQL_STRING_REDACTION_PATTERN.key -> "(?i)FileScan") {
+ // Respect SQLConf and replace FileScan
+ assert(isIncluded(df.queryExecution, replacement))
+
+ assert(!isIncluded(df.queryExecution, "FileScan"))
+ assert(isIncluded(df.queryExecution, "file:/"))
+ }
+
+ // Respect SparkConf and replace file:/
+ assert(isIncluded(df.queryExecution, replacement))
+
+ assert(isIncluded(df.queryExecution, "FileScan"))
+ assert(!isIncluded(df.queryExecution, "file:/"))
--- End diff --
what the difference between these 3 lines and https://github.com/apache/spark/pull/19985/files#diff-0c515221ed6e6eadcec71b3b9ad3a3e1R70 ?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157337721
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
@@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
assert(OutputMode.Update === InternalOutputModes.Update)
}
+ override protected def sparkConf: SparkConf = super.sparkConf
+ .set("spark.redaction.string.regex", "file:/[\\w_]+")
+
+ test("explain - redaction") {
--- End diff --
Just because I found a potential security issue in SS.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #84968 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84968/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19985
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157652152
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -231,6 +231,13 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
""".stripMargin.trim
}
+ /**
+ * Redact the sensitive information in the given string.
+ */
+ private def withRedaction(message: => String): String = {
--- End diff --
`=> String` looks not very useful here, we need to materialize anyway when calling `Utils.redact`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #84949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84949/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19985
retest this please
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19985
**[Test build #84968 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84968/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19985
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19985#discussion_r157667492
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala ---
@@ -52,4 +53,40 @@ class DataSourceScanExecRedactionSuite extends QueryTest with SharedSQLContext {
assert(df.queryExecution.simpleString.contains(replacement))
}
}
+
+ private def isIncluded(queryExecution: QueryExecution, msg: String): Boolean = {
+ queryExecution.toString.contains(msg) ||
+ queryExecution.simpleString.contains(msg) ||
+ queryExecution.stringWithStats.contains(msg)
+ }
+
+ test("explain is redacted using SQLConf") {
+ withTempDir { dir =>
+ val basePath = dir.getCanonicalPath
+ spark.range(0, 10).toDF("a").write.parquet(new Path(basePath, "foo=1").toString)
+ val df = spark.read.parquet(basePath)
+ val replacement = "*********"
+
+ // Respect SparkConf and replace file:/
+ assert(isIncluded(df.queryExecution, replacement))
+
+ assert(isIncluded(df.queryExecution, "FileScan"))
+ assert(!isIncluded(df.queryExecution, "file:/"))
+
+ withSQLConf(SQLConf.SQL_STRING_REDACTION_PATTERN.key -> "(?i)FileScan") {
+ // Respect SQLConf and replace FileScan
+ assert(isIncluded(df.queryExecution, replacement))
+
+ assert(!isIncluded(df.queryExecution, "FileScan"))
+ assert(isIncluded(df.queryExecution, "file:/"))
+ }
+
+ // Respect SparkConf and replace file:/
+ assert(isIncluded(df.queryExecution, replacement))
+
+ assert(isIncluded(df.queryExecution, "FileScan"))
+ assert(!isIncluded(df.queryExecution, "file:/"))
--- End diff --
Removed
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org