You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by sa...@apache.org on 2021/08/18 04:32:22 UTC

[spark] branch master updated: [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex

This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new b914ff7  [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex
b914ff7 is described below

commit b914ff7d54bd7c07e7313bb06a1fa22c36b628d2
Author: Kousuke Saruta <sa...@oss.nttdata.com>
AuthorDate: Wed Aug 18 13:31:22 2021 +0900

    [SPARK-36400][SPARK-36398][SQL][WEBUI] Make ThriftServer recognize spark.sql.redaction.string.regex
    
    ### What changes were proposed in this pull request?
    
    This PR fixes an issue that ThriftServer doesn't recognize `spark.sql.redaction.string.regex`.
    The problem is that sensitive information included in queries can be exposed.
    ![thrift-password1](https://user-images.githubusercontent.com/4736016/129440772-46379cc5-987b-41ac-adce-aaf2139f6955.png)
    ![thrift-password2](https://user-images.githubusercontent.com/4736016/129440775-fd328c0f-d128-4a20-82b0-46c331b9fd64.png)
    
    ### Why are the changes needed?
    
    Bug fix.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Ran ThriftServer, connect to it and execute `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");` with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`
    Then, confirmed UI.
    
    ![thrift-hide-password1](https://user-images.githubusercontent.com/4736016/129440863-cabea247-d51f-41a4-80ac-6c64141e1fb7.png)
    ![thrift-hide-password2](https://user-images.githubusercontent.com/4736016/129440874-96cd0f0c-720b-4010-968a-cffbc85d2be5.png)
    
    Closes #33743 from sarutak/thrift-redact.
    
    Authored-by: Kousuke Saruta <sa...@oss.nttdata.com>
    Signed-off-by: Kousuke Saruta <sa...@oss.nttdata.com>
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala   | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index f43f8e7..0df5885 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -186,10 +186,11 @@ private[hive] class SparkExecuteStatementOperation(
   override def runInternal(): Unit = {
     setState(OperationState.PENDING)
     logInfo(s"Submitting query '$statement' with $statementId")
+    val redactedStatement = SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
     HiveThriftServer2.eventManager.onStatementStart(
       statementId,
       parentSession.getSessionHandle.getSessionId.toString,
-      statement,
+      redactedStatement,
       statementId,
       parentSession.getUsername)
     setHasResultSet(true) // avoid no resultset for async run

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org