Posted to reviews@spark.apache.org by dmvieira <gi...@git.apache.org> on 2017/07/29 01:07:15 UTC

[GitHub] spark pull request #18765: [SPARK-19720][CORE] Redact sensitive information ...

GitHub user dmvieira opened a pull request:

    https://github.com/apache/spark/pull/18765

    [SPARK-19720][CORE] Redact sensitive information from SparkSubmit console
    
    This change redacts sensitive information (based on the default password and secret regex)
    from the SparkSubmit console logs. Such sensitive information is already being
    redacted from event logs, YARN logs, etc.
    
    Testing was done manually to make sure that the console logs were not printing any
    sensitive information.
    
    Here's some output from the console:
    
    ```
    Spark properties used, including those specified through
     --conf and those from the properties file /etc/spark2/conf/spark-defaults.conf:
      (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
      (spark.authenticate,false)
      (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
    ```
    
    ```
    System properties:
    (spark.yarn.appMasterEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
    (spark.authenticate,false)
    (spark.executorEnv.HADOOP_CREDSTORE_PASSWORD,*********(redacted))
    ```
    There is a risk that, if new print statements are added to the console down the road, sensitive information may still leak, since no test asserts on the console log output. I considered writing an integration test to guard against such future leaks to be out of scope for this JIRA.
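    
    For illustration only, here is a minimal Scala sketch of the key-based redaction described above, reusing the two constants from this PR's Utils.scala diff (`REDACTION_REPLACEMENT_TEXT` and `SECRET_REDACTION_PATTERN`). The object and method names (`ConsoleRedactionSketch`, `redact`, `render`) are hypothetical, and this is not the code that was merged into Spark. `render` simply formats pairs as `(key,value)` lines to mimic the console output above, which also makes the kind of output assertion mentioned above straightforward to write.
    
    ```scala
    import scala.util.matching.Regex
    
    // Hypothetical sketch, not Spark's actual implementation: the two constants
    // mirror the ones added to Utils.scala in this PR.
    object ConsoleRedactionSketch {
      val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
      val SECRET_REDACTION_PATTERN: Regex = "(?i)secret|password".r
    
      // Replace the value of every property whose key matches the pattern;
      // only keys are inspected, never values.
      def redact(kvs: Seq[(String, String)]): Seq[(String, String)] =
        kvs.map { case (key, value) =>
          if (SECRET_REDACTION_PATTERN.findFirstIn(key).isDefined) (key, REDACTION_REPLACEMENT_TEXT)
          else (key, value)
        }
    
      // Format redacted pairs as "(key,value)" lines, like the verbose console output above.
      def render(kvs: Seq[(String, String)]): String =
        redact(kvs).map { case (key, value) => s"($key,$value)" }.mkString("\n")
    }
    ```
    
    Because the pattern is case-insensitive and unanchored, any key that contains `secret` or `password` (such as `spark.executorEnv.HADOOP_CREDSTORE_PASSWORD`) is redacted, while a secret stored under an unrelated key name would still be printed as-is.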
    
    Running unit tests to make sure nothing else is broken by this change.
    
    Based on the original change by Mark Grover <ma...@apache.org>.
    
    Closes #17047 for Spark version 2.1.2.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dmvieira/spark branch-2.1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18765.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18765
    
----
commit 9e757820af7990f37d1cb5f8cd9c989fcf815cdf
Author: Mark Grover <ma...@apache.org>
Date:   2017-03-02T18:33:56Z

    [SPARK-19720][CORE] Redact sensitive information from SparkSubmit console
    

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

Posted by dmvieira <gi...@git.apache.org>.
Github user dmvieira commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18765#discussion_r130572412
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
           sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
         }
       }
    +
    +  private[util] val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
    +  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
    --- End diff --
    
    But I'm following the UI logic in the Spark 2.1 branch: https://github.com/apache/spark/blob/branch-2.1/core/src/main/scala/org/apache/spark/ui/env/EnvironmentPage.scala



[GitHub] spark issue #18765: [SPARK-19720][CORE] Redact sensitive information from Sp...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18765
  
    Should we backport this to 2.1, since it's a major bugfix (as described in the JIRA)? @vanzin @srowen 



[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

Posted by dmvieira <gi...@git.apache.org>.
Github user dmvieira commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18765#discussion_r131154059
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
           sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
         }
       }
    +
    +  private[util] val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
    +  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
    --- End diff --
    
    I made it work there. I tested it, and redaction already works in both the UI and spark-submit. I think you can close this pull request and focus on #18802.



[GitHub] spark issue #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive infor...

Posted by dmvieira <gi...@git.apache.org>.
Github user dmvieira commented on the issue:

    https://github.com/apache/spark/pull/18765
  
    Closing this PR since https://github.com/apache/spark/pull/18802 is complete.



[GitHub] spark issue #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive infor...

Posted by dmvieira <gi...@git.apache.org>.
Github user dmvieira commented on the issue:

    https://github.com/apache/spark/pull/18765
  
    @gatorsmile, please check whether it is better now.



[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

Posted by dmvieira <gi...@git.apache.org>.
Github user dmvieira commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18765#discussion_r130676428
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
           sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
         }
       }
    +
    +  private[util] val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
    +  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
    --- End diff --
    
    Hi @markgrover! My intention here was only to fix this security leak by making the spark-submit redaction pattern match the UI redaction pattern. I can change it, but then it would be a new-feature backport rather than a bugfix backport.



[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18765#discussion_r130513976
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
           sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
         }
       }
    +
    +  private[util] val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
    +  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
    --- End diff --
    
    This should be a configurable SQLConf.
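    
    A minimal sketch of what a configurable pattern could look like, assuming a plain `Map[String, String]` stands in for the config object (SQLConf as suggested here, or SparkConf). The key name `spark.redaction.regex` and the object name `ConfigurableRedactionSketch` are assumptions for illustration; the default is the hard-coded `(?i)secret|password` from the diff.
    
    ```scala
    import scala.util.matching.Regex
    
    // Hypothetical sketch: the config key is an assumed name, and a plain Map
    // stands in for the real configuration object.
    object ConfigurableRedactionSketch {
      val RedactionRegexKey = "spark.redaction.regex"   // assumed key name
      val DefaultRedactionRegex = "(?i)secret|password"  // default taken from the diff
      val ReplacementText = "*********(redacted)"
    
      // Resolve the pattern from the conf, falling back to the hard-coded default.
      def redactionPattern(conf: Map[String, String]): Regex =
        conf.getOrElse(RedactionRegexKey, DefaultRedactionRegex).r
    
      // Redact values whose keys match the (possibly user-supplied) pattern.
      def redact(conf: Map[String, String], kvs: Seq[(String, String)]): Seq[(String, String)] = {
        val pattern = redactionPattern(conf)
        kvs.map { case (key, value) =>
          if (pattern.findFirstIn(key).isDefined) (key, ReplacementText) else (key, value)
        }
      }
    }
    ```
    
    Keeping the hard-coded regex as the default means users who never set the key get exactly the behavior of the diff above.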



[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

Posted by markgrover <gi...@git.apache.org>.
Github user markgrover commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18765#discussion_r130668429
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
           sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
         }
       }
    +
    +  private[util] val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
    +  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
    --- End diff --
    
    I think what's really happening here is that we are backporting some changes introduced in SPARK-18535 while backporting this JIRA (SPARK-19720). SPARK-18535 is a dependency of this, so if we want to backport this, we should really be backporting SPARK-18535 as well.



[GitHub] spark issue #18765: [SPARK-19720][CORE] Redact sensitive information from Sp...

Posted by dmvieira <gi...@git.apache.org>.
Github user dmvieira commented on the issue:

    https://github.com/apache/spark/pull/18765
  
    I'm sorry... I was just suggesting it because it is a major issue, as described here: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-19720
    
    I'm using Airflow to submit jobs, and the password appears in the log when I enable verbose mode in spark-submit.



[GitHub] spark issue #18765: [SPARK-19720][CORE] Redact sensitive information from Sp...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18765
  
    Can one of the admins verify this patch?



[GitHub] spark issue #18765: [SPARK-19720][CORE] Redact sensitive information from Sp...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18765
  
    This sounds reasonable to backport to 2.1. 
    
    First, please update your PR title with [BACKPORT-2.1]
    Second, please clean up your PR description and explain at the beginning of it that this is a backport PR. 



[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

Posted by dmvieira <gi...@git.apache.org>.
Github user dmvieira closed the pull request at:

    https://github.com/apache/spark/pull/18765



[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

Posted by dmvieira <gi...@git.apache.org>.
Github user dmvieira commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18765#discussion_r131036498
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
           sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
         }
       }
    +
    +  private[util] val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
    +  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
    --- End diff --
    
    I opened the PR, but I don't know why Jenkins fails with an access error. It looks like a permission issue.



[GitHub] spark pull request #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitiv...

Posted by dmvieira <gi...@git.apache.org>.
Github user dmvieira commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18765#discussion_r130733138
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2571,6 +2572,23 @@ private[spark] object Utils extends Logging {
           sparkJars.map(_.split(",")).map(_.filter(_.nonEmpty)).toSeq.flatten
         }
       }
    +
    +  private[util] val REDACTION_REPLACEMENT_TEXT = "*********(redacted)"
    +  private[util] val SECRET_REDACTION_PATTERN = "(?i)secret|password".r
    --- End diff --
    
    I opened another pull request with the full feature: https://github.com/apache/spark/pull/18802



[GitHub] spark issue #18765: [SPARK-19720][CORE][BACKPORT-2.1] Redact sensitive infor...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/18765
  
    You need to close it yourself. Thanks!

