Posted to commits@spark.apache.org by sa...@apache.org on 2021/08/25 12:32:15 UTC

[spark] branch branch-3.2 updated: [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log

This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-3.2
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.2 by this push:
     new fb38887  [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log
fb38887 is described below

commit fb38887e001d33adef519d0288bd0844dcfe2bd5
Author: Kousuke Saruta <sa...@oss.nttdata.com>
AuthorDate: Wed Aug 25 21:30:43 2021 +0900

    [SPARK-36398][SQL] Redact sensitive information in Spark Thrift Server log
    
    ### What changes were proposed in this pull request?
    
    This PR fixes an issue where there was no way to redact sensitive information in the Spark Thrift Server log.
    For example, a JDBC password can be exposed in the log:
    ```
    21/08/25 18:52:37 INFO SparkExecuteStatementOperation: Submitting query 'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde")' with ca14ae38-1aaf-4bf4-a099-06b8e5337613
    ```
    
    ### Why are the changes needed?
    
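    Bug fix. `SparkExecuteStatementOperation.runInternal` logged the raw statement before applying `spark.sql.redaction.string.regex`, so sensitive values such as passwords could not be redacted from this log line.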
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Ran the Thrift Server, connected to it, and executed `CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde");` with `spark.sql.redaction.string.regex=((?i)(?<=password=))(".*")|('.*')`.
    Then confirmed the log:
    ```
    21/08/25 18:54:11 INFO SparkExecuteStatementOperation: Submitting query 'CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password=*********(redacted))' with ffc627e2-b1a8-4d83-ab6d-d819b3ccd909
    ```
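    The redaction step can be sketched in plain Scala. This is a minimal standalone sketch, not Spark's `SparkUtils.redact`: it assumes the configured pattern is applied as a replace-all and that each match becomes the `*********(redacted)` text seen in the log above. `RedactionSketch` and its `redact` helper are hypothetical names.

    ```scala
    import scala.util.matching.Regex

    // Hypothetical standalone sketch; not Spark's SparkUtils.redact itself.
    object RedactionSketch {
      // Replacement text taken from the redacted log line above.
      val Replacement = "*********(redacted)"

      // Replace every match of the pattern; return the text unchanged
      // when no pattern is configured or the text is null/empty.
      def redact(pattern: Option[Regex], text: String): String = pattern match {
        case Some(r) if text != null && text.nonEmpty => r.replaceAllIn(text, Replacement)
        case _ => text
      }

      def main(args: Array[String]): Unit = {
        // Same value as spark.sql.redaction.string.regex in the test above.
        val pattern = """((?i)(?<=password=))(".*")|('.*')""".r
        val stmt = """CREATE TABLE mytbl2(a int) OPTIONS(url="jdbc:mysql//example.com:3306", driver="com.mysql.jdbc.Driver", dbtable="test_tbl", user="test_usr", password="abcde")"""
        println(redact(Some(pattern), stmt))
      }
    }
    ```

    Running `main` prints the statement ending in `password=*********(redacted))`, as in the log line above. Note that `.*` is greedy: if another double-quoted option followed the password, the match would run to the last `"` in the statement and redact that option as well.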
    
    Closes #33832 from sarutak/fix-SPARK-36398.
    
    Authored-by: Kousuke Saruta <sa...@oss.nttdata.com>
    Signed-off-by: Kousuke Saruta <sa...@oss.nttdata.com>
    (cherry picked from commit b2ff01608f5ecdba19630e12478bd370f9766f7b)
    Signed-off-by: Kousuke Saruta <sa...@oss.nttdata.com>
---
 .../spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala    | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
index 0df5885..4f40889 100644
--- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
+++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
@@ -185,8 +185,8 @@ private[hive] class SparkExecuteStatementOperation(
 
   override def runInternal(): Unit = {
     setState(OperationState.PENDING)
-    logInfo(s"Submitting query '$statement' with $statementId")
     val redactedStatement = SparkUtils.redact(sqlContext.conf.stringRedactionPattern, statement)
+    logInfo(s"Submitting query '$redactedStatement' with $statementId")
     HiveThriftServer2.eventManager.onStatementStart(
       statementId,
       parentSession.getSessionHandle.getSessionId.toString,

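The patch is purely an ordering change: the statement is redacted before it is interpolated into the log message. The following minimal sketch illustrates that redact-before-log ordering under stated assumptions; `RedactBeforeLogSketch`, `submit`, and the in-memory `logLines` buffer (standing in for `logInfo`) are hypothetical and are not Spark code.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.matching.Regex

// Hypothetical sketch of the redact-before-log ordering;
// not Spark's SparkExecuteStatementOperation.
object RedactBeforeLogSketch {
  // In-memory stand-in for logInfo so the logged text can be inspected.
  val logLines: ListBuffer[String] = ListBuffer.empty[String]
  def logInfo(msg: String): Unit = logLines += msg

  // Apply the pattern if one is configured, otherwise keep the text as-is.
  def redact(pattern: Option[Regex], text: String): String =
    pattern.fold(text)(_.replaceAllIn(text, "*********(redacted)"))

  def submit(pattern: Option[Regex], statement: String, statementId: String): String = {
    // Redact first, then log only the redacted text (the order the patch establishes).
    val redactedStatement = redact(pattern, statement)
    logInfo(s"Submitting query '$redactedStatement' with $statementId")
    redactedStatement
  }

  def main(args: Array[String]): Unit = {
    // Same pattern as spark.sql.redaction.string.regex in the commit's test.
    val pattern = Some("""((?i)(?<=password=))(".*")|('.*')""".r)
    submit(pattern, """CREATE TABLE t(a int) OPTIONS(user="test_usr", password="abcde")""", "stmt-1")
    logLines.foreach(println)
  }
}
```

Keeping the redacted value in a single local (`redactedStatement`) and reusing it for every downstream consumer makes it harder for a later log call to pick up the raw `statement` by accident.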