You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2017/12/15 07:04:14 UTC

[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/19985

    [SPARK-22791] [SQL] [SS] Redact Output of Explain

    ## What changes were proposed in this pull request?
    
    When calling explain on a query, the output can contain sensitive information. We should provide an admin/user to redact such information.
    
    ```
    == Physical Plan ==
    *HashAggregate(keys=[value#6], functions=[count(1)], output=[value#6, count(1)#12L])
    +- StateStoreSave [value#6], state info [ checkpoint = file:/private/var/folders/vx/j0ydl5rn0gd9mgrh1pljnw900000gn/T/temporary-91c6fac0-609f-4bc8-ad57-52c189f06797/state, runId = 05a4b3af-f02c-40f8-9ff9-a3e18bae496f, opId = 0, ver = 0, numPartitions = 5], Complete, 0
       +- *HashAggregate(keys=[value#6], functions=[merge_count(1)], output=[value#6, count#18L])
          +- StateStoreRestore [value#6], state info [ checkpoint = file:/private/var/folders/vx/j0ydl5rn0gd9mgrh1pljnw900000gn/T/temporary-91c6fac0-609f-4bc8-ad57-52c189f06797/state, runId = 05a4b3af-f02c-40f8-9ff9-a3e18bae496f, opId = 0, ver = 0, numPartitions = 5]
             +- *HashAggregate(keys=[value#6], functions=[merge_count(1)], output=[value#6, count#18L])
                +- Exchange hashpartitioning(value#6, 5)
                   +- *HashAggregate(keys=[value#6], functions=[partial_count(1)], output=[value#6, count#18L])
                      +- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true, false) AS value#6]
                         +- *MapElements <function1>, obj#5: java.lang.String
                            +- *DeserializeToObject value#30.toString, obj#4: java.lang.String
                               +- LocalTableScan [value#30]
    ```
    ## How was this patch tested?
    Added a test case

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark redactPlan

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19985.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19985
    
----
commit 8f44edf5a4edcc8f1c42331cf3ab9b694fb01925
Author: gatorsmile <ga...@gmail.com>
Date:   2017-12-15T06:56:32Z

    fix.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #85011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85011/testReport)** for PR 19985 at commit [`e246096`](https://github.com/apache/spark/commit/e246096cab0b0905d7952556ba3e8ecd15c0a2a5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157271250
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -231,6 +231,13 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         """.stripMargin.trim
       }
     
    +  /**
    +   * Redact the sensitive information in the given string.
    +   */
    +  private def withRedaction(message: => String): String = {
    +    Utils.redact(SparkSession.getActiveSession.map(_.sparkContext.conf).orNull, message)
    --- End diff --
    
    Why not make `Utils.redact` accept a regex parameter so that we can use session conf?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #85011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85011/testReport)** for PR 19985 at commit [`e246096`](https://github.com/apache/spark/commit/e246096cab0b0905d7952556ba3e8ecd15c0a2a5).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84949/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    cc @marmbrus @hvanhovell @zsxwing 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #85090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85090/testReport)** for PR 19985 at commit [`75c1479`](https://github.com/apache/spark/commit/75c1479d62045f3b8f987d38d88fd2e6be88e57b).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #85090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85090/testReport)** for PR 19985 at commit [`75c1479`](https://github.com/apache/spark/commit/75c1479d62045f3b8f987d38d88fd2e6be88e57b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85077/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    retest this please



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157135279
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
    @@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
         assert(OutputMode.Update === InternalOutputModes.Update)
       }
     
    +  override protected def sparkConf: SparkConf = super.sparkConf
    +    .set("spark.redaction.string.regex", "file:/[\\w_]+")
    +
    +  test("explain - redaction") {
    --- End diff --
    
    This is just a SS example. I believe we have more such cases. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157272556
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
    @@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
         assert(OutputMode.Update === InternalOutputModes.Update)
       }
     
    +  override protected def sparkConf: SparkConf = super.sparkConf
    +    .set("spark.redaction.string.regex", "file:/[\\w_]+")
    +
    +  test("explain - redaction") {
    +    val replacement = "*********"
    +
    +    val inputData = MemoryStream[String]
    +    val df = inputData.toDS().map(_ + "foo").groupBy("value").agg(count("*"))
    +    // Test StreamingQuery.display
    +    val q = df.writeStream.queryName("memory_explain").outputMode("complete").format("memory")
    +      .start()
    +      .asInstanceOf[StreamingQueryWrapper]
    +      .streamingQuery
    +    try {
    +      inputData.addData("abc")
    +      q.processAllAvailable()
    +
    +      val explainWithoutExtended = q.explainInternal(false)
    +      assert(explainWithoutExtended.contains(replacement))
    +      assert(explainWithoutExtended.contains("StateStoreRestore"))
    +
    +      val explainWithExtended = q.explainInternal(true)
    +      assert(explainWithExtended.contains(replacement))
    +      assert(explainWithExtended.contains("StateStoreRestore"))
    --- End diff --
    
    ditto


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #85002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85002/testReport)** for PR 19985 at commit [`e246096`](https://github.com/apache/spark/commit/e246096cab0b0905d7952556ba3e8ecd15c0a2a5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157272194
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -231,6 +231,13 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         """.stripMargin.trim
       }
     
    +  /**
    +   * Redact the sensitive information in the given string.
    +   */
    +  private def withRedaction(message: => String): String = {
    +    Utils.redact(SparkSession.getActiveSession.map(_.sparkContext.conf).orNull, message)
    --- End diff --
    
    A different query may need to use a different `regex`, a session conf will be better and the user can change it during runtime.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84946/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85011/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #85077 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85077/testReport)** for PR 19985 at commit [`d50d594`](https://github.com/apache/spark/commit/d50d5947c32b8b06f03fa933a26547e67c317ee2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #84946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84946/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157337723
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
    @@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
         assert(OutputMode.Update === InternalOutputModes.Update)
       }
     
    +  override protected def sparkConf: SparkConf = super.sparkConf
    +    .set("spark.redaction.string.regex", "file:/[\\w_]+")
    +
    +  test("explain - redaction") {
    +    val replacement = "*********"
    +
    +    val inputData = MemoryStream[String]
    +    val df = inputData.toDS().map(_ + "foo").groupBy("value").agg(count("*"))
    +    // Test StreamingQuery.display
    +    val q = df.writeStream.queryName("memory_explain").outputMode("complete").format("memory")
    +      .start()
    +      .asInstanceOf[StreamingQueryWrapper]
    +      .streamingQuery
    +    try {
    +      inputData.addData("abc")
    +      q.processAllAvailable()
    +
    +      val explainWithoutExtended = q.explainInternal(false)
    +      assert(explainWithoutExtended.contains(replacement))
    +      assert(explainWithoutExtended.contains("StateStoreRestore"))
    --- End diff --
    
    done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #84946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84946/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    cc @zsxwing 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157272539
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
    @@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
         assert(OutputMode.Update === InternalOutputModes.Update)
       }
     
    +  override protected def sparkConf: SparkConf = super.sparkConf
    +    .set("spark.redaction.string.regex", "file:/[\\w_]+")
    +
    +  test("explain - redaction") {
    +    val replacement = "*********"
    +
    +    val inputData = MemoryStream[String]
    +    val df = inputData.toDS().map(_ + "foo").groupBy("value").agg(count("*"))
    +    // Test StreamingQuery.display
    +    val q = df.writeStream.queryName("memory_explain").outputMode("complete").format("memory")
    +      .start()
    +      .asInstanceOf[StreamingQueryWrapper]
    +      .streamingQuery
    +    try {
    +      inputData.addData("abc")
    +      q.processAllAvailable()
    +
    +      val explainWithoutExtended = q.explainInternal(false)
    +      assert(explainWithoutExtended.contains(replacement))
    +      assert(explainWithoutExtended.contains("StateStoreRestore"))
    --- End diff --
    
    nit: add `assert(!explainWithoutExtended.contains("file:/"))` to verify it does replace the correct content.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Looks good. My major comment is using a session conf instead.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157271920
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
    @@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
         assert(OutputMode.Update === InternalOutputModes.Update)
       }
     
    +  override protected def sparkConf: SparkConf = super.sparkConf
    +    .set("spark.redaction.string.regex", "file:/[\\w_]+")
    +
    +  test("explain - redaction") {
    --- End diff --
    
    Is it because you want to use `explainInternal`, so you write a SS test?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84968/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157624496
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1035,6 +1036,14 @@ object SQLConf {
         .booleanConf
         .createWithDefault(true)
     
    +  val SQL_STRING_REDACTION_PATTERN =
    +    ConfigBuilder("spark.sql.redaction.string.regex")
    +      .doc("Regex to decide which parts of strings produced by Spark contain sensitive " +
    +        "information. When this regex matches a string part, that string part is replaced by a " +
    +        "dummy value. This is currently used to redact the output of SQL explain commands. " +
    +        "When this conf is not set, the value from `spark.sql.redaction.string.regex` is used.")
    --- End diff --
    
    nit: `spark.sql.redaction.string.regex` -> `spark.redaction.string.regex`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85090/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #84949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84949/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #85002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85002/testReport)** for PR 19985 at commit [`e246096`](https://github.com/apache/spark/commit/e246096cab0b0905d7952556ba3e8ecd15c0a2a5).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85002/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #85077 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85077/testReport)** for PR 19985 at commit [`d50d594`](https://github.com/apache/spark/commit/d50d5947c32b8b06f03fa933a26547e67c317ee2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157652242
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala ---
    @@ -52,4 +53,40 @@ class DataSourceScanExecRedactionSuite extends QueryTest with SharedSQLContext {
           assert(df.queryExecution.simpleString.contains(replacement))
         }
       }
    +
    +  private def isIncluded(queryExecution: QueryExecution, msg: String): Boolean = {
    +    queryExecution.toString.contains(msg) ||
    +    queryExecution.simpleString.contains(msg) ||
    +    queryExecution.stringWithStats.contains(msg)
    +  }
    +
    +  test("explain is redacted using SQLConf") {
    +    withTempDir { dir =>
    +      val basePath = dir.getCanonicalPath
    +      spark.range(0, 10).toDF("a").write.parquet(new Path(basePath, "foo=1").toString)
    +      val df = spark.read.parquet(basePath)
    +      val replacement = "*********"
    +
    +      // Respect SparkConf and replace file:/
    +      assert(isIncluded(df.queryExecution, replacement))
    +
    +      assert(isIncluded(df.queryExecution, "FileScan"))
    +      assert(!isIncluded(df.queryExecution, "file:/"))
    +
    +      withSQLConf(SQLConf.SQL_STRING_REDACTION_PATTERN.key -> "(?i)FileScan") {
    +        // Respect SQLConf and replace FileScan
    +        assert(isIncluded(df.queryExecution, replacement))
    +
    +        assert(!isIncluded(df.queryExecution, "FileScan"))
    +        assert(isIncluded(df.queryExecution, "file:/"))
    +      }
    +
    +      // Respect SparkConf and replace file:/
    +      assert(isIncluded(df.queryExecution, replacement))
    +
    +      assert(isIncluded(df.queryExecution, "FileScan"))
    +      assert(!isIncluded(df.queryExecution, "file:/"))
    --- End diff --
    
    what the difference between these 3 lines and https://github.com/apache/spark/pull/19985/files#diff-0c515221ed6e6eadcec71b3b9ad3a3e1R70 ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157337721
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala ---
    @@ -418,6 +418,35 @@ class StreamSuite extends StreamTest {
         assert(OutputMode.Update === InternalOutputModes.Update)
       }
     
    +  override protected def sparkConf: SparkConf = super.sparkConf
    +    .set("spark.redaction.string.regex", "file:/[\\w_]+")
    +
    +  test("explain - redaction") {
    --- End diff --
    
    Just because I found a potential security issue in SS. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #84968 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84968/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19985


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157652152
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -231,6 +231,13 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
         """.stripMargin.trim
       }
     
    +  /**
    +   * Redact the sensitive information in the given string.
    +   */
    +  private def withRedaction(message: => String): String = {
    --- End diff --
    
    `=> String` looks not very useful here, we need to materialize anyway when calling `Utils.redact`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #84949 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84949/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    **[Test build #84968 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84968/testReport)** for PR 19985 at commit [`8f44edf`](https://github.com/apache/spark/commit/8f44edf5a4edcc8f1c42331cf3ab9b694fb01925).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19985
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19985#discussion_r157667492
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala ---
    @@ -52,4 +53,40 @@ class DataSourceScanExecRedactionSuite extends QueryTest with SharedSQLContext {
           assert(df.queryExecution.simpleString.contains(replacement))
         }
       }
    +
    +  private def isIncluded(queryExecution: QueryExecution, msg: String): Boolean = {
    +    queryExecution.toString.contains(msg) ||
    +    queryExecution.simpleString.contains(msg) ||
    +    queryExecution.stringWithStats.contains(msg)
    +  }
    +
    +  test("explain is redacted using SQLConf") {
    +    withTempDir { dir =>
    +      val basePath = dir.getCanonicalPath
    +      spark.range(0, 10).toDF("a").write.parquet(new Path(basePath, "foo=1").toString)
    +      val df = spark.read.parquet(basePath)
    +      val replacement = "*********"
    +
    +      // Respect SparkConf and replace file:/
    +      assert(isIncluded(df.queryExecution, replacement))
    +
    +      assert(isIncluded(df.queryExecution, "FileScan"))
    +      assert(!isIncluded(df.queryExecution, "file:/"))
    +
    +      withSQLConf(SQLConf.SQL_STRING_REDACTION_PATTERN.key -> "(?i)FileScan") {
    +        // Respect SQLConf and replace FileScan
    +        assert(isIncluded(df.queryExecution, replacement))
    +
    +        assert(!isIncluded(df.queryExecution, "FileScan"))
    +        assert(isIncluded(df.queryExecution, "file:/"))
    +      }
    +
    +      // Respect SparkConf and replace file:/
    +      assert(isIncluded(df.queryExecution, replacement))
    +
    +      assert(isIncluded(df.queryExecution, "FileScan"))
    +      assert(!isIncluded(df.queryExecution, "file:/"))
    --- End diff --
    
    Removed


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org