You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2018/04/25 17:36:26 UTC

[GitHub] spark pull request #21158: [SPARK-23850][sql] Add separate config for SQL op...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/21158

    [SPARK-23850][sql] Add separate config for SQL options redaction.

    The old code was relying on a core configuration and extended its
    default value to include things that redact desired things in the
    app's environment. Instead, add a SQL-specific option for which
    options to redact, and apply both the core and SQL-specific rules
    when redacting the options in the save command.
    
    This is a little sub-optimal since it adds another config, but it
    retains the current default behavior.
    
    While there I also fixed a typo and a couple of minor config API
    usage issues in the related redaction option that SQL already had.
    
    Tested with existing unit tests, plus checking the env page on
    a shell UI.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-23850

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21158
    
----
commit deb299b0211977735f6a830f763d4b9c37f5960c
Author: Marcelo Vanzin <va...@...>
Date:   2018-04-24T22:52:58Z

    [SPARK-23850][sql] Add separate config for SQL options redaction.
    
    The old code was relying on a core configuration and extended its
    default value to include things that redact desired things in the
    app's environment. Instead, add a SQL-specific option for which
    options to redact, and apply both the core and SQL-specific rules
    when redacting the options in the save command.
    
    This is a little sub-optimal since it adds another config, but it
    retains the current default behavior.
    
    While there I also fixed a typo and a couple of minor config API
    usage issues in the "sistes" redaction option that SQL already had.
    
    Tested with existing unit tests, plus checking the env page on
    a shell UI.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Merging to master / 2.3 / 2.2.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Ping @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21158: [SPARK-23850][sql] Add separate config for SQL op...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21158


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    The confs in the core modules are not part of the outputs of SQL statement `SET -v`, which only outputs the confs in Spark SQL.
    
    We are making a behavior change here. I am not confident about the changes in this PR. I would like to hear more options from the others.  



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    > To keep all the previous behaviour, SQL_OPTIONS_REDACTION_PATTERN can include user
    
    User names, unlike passwords, are useful for debugging. And they're not meant to be secret. They're meant to identify an entity, and by that, it means it's not generally hard to guess them. Which is why you need a password.
    
    (Think it in a different way: if you access a table you shouldn't, wouldn't you get an exception saying "user blah cannot access table foo"? And are you redacting that in the places where that stuff shows up?)
    
    If you have an environment where even user names are considered secret, it's easy enough to change the configuration. But at that time you really should think hard about following Tom's advice above and just enable authentication for your web UIs. Otherwise you're not really taking security seriously.
    
    I really disliked even keeping the URL redacted, since that's even more useful than the user for debugging. But some vendors still support and even document putting passwords in those URLs, so that's why I kept it.
    
    If you guys really feel strongly about redacting user names, I'll add it back in the SQL config. I don't really care about that part that much, even if I don't agree with the premise. But I strongly disagree with keeping the current default value in the core option.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    But the main thing is that the change modified the previous behavior which was *not* to redact these things in the previous places where redaction occurred. So, in a way, this change is fixing a regression in 2.3.0 (which was also added to 2.2.1 according to the bug), and still providing the same feature from SPARK-22479.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    > We are making a behavior change here.
    
    We are *not*. That's the whole reason why I'm adding the SQL-specific option to extend the behavior of the core options.
    
    If I just wanted to revert the original behavior change, which added more unwanted default redactions to the core options, I'd just have changed the default value of the core option back to what it was.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21158: [SPARK-23850][sql] Add separate config for SQL op...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21158#discussion_r184152311
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -342,7 +342,7 @@ package object config {
             "a property key or value, the value is redacted from the environment UI and various logs " +
             "like YARN and event logs.")
           .regexConf
    -      .createWithDefault("(?i)secret|password|url|user|username".r)
    +      .createWithDefault("(?i)secret|password".r)
    --- End diff --
    
    Not sure what you're trying to imply? The default is redacting too much. That needs to be fixed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    I disagree on this. I find it a regression that you blocked url's and user names in the first place.  Showing these things have been a feature in spark for a long time.  The only security issue here is with jdbc driver, block that instance.   Are there any other places spark itself prints or uses a url with a password?  If so we should fix those.
    
    If not then it would be specified in a user config and if they are doing that, they should change the config to add it to be redacted.  I think the current behavior is purely worse from a user perspective.  There are so many urls that have no security implications and us blocking them by default I think is wrong.  


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Last ping before I assume everybody is ok with the patch.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    @vanzin That really depends on how the users deploy Spark. Not all the Spark end users are allowed to access the command lines and run `ls`, but they are allowed to read Spark UI in almost all the cases.
    
    @tgravescs @vanzin This PR tries to not affect the original scenario explained in the JIRA SPARK-22479. I would say thank you @vanzin for your efforts. 
    
    The PR to fix SPARK-22479 also impacts more cases (e.g., in the environment page) too. I can understand your arguments. In your scenarios, it would be nice to expose all these info to the external users by default, as what we did in the past. Since the new behaviors have already been introduced in the last release, it would be better to keep them unchanged, if possible, based on the principle of least surprise.
    
    The lessons we learned is to document the behavior change in the migration guide. When users try the RC branches, the users could easily catch the changes. Unfortunately, we did not do it in that PR. Thus, it becomes a surprise to @tgravescs after we already released Spark 2.3. We need to avoid similar cases. We need to enforce every PR to document the behavior changes, no matter whether it is a regression fix or a bug fix.
    
    I am also fine to merge this if the other reviewers of the previous PR think we should change the behavior again. cc @onursatici @ash211 @hvanhovell @jiangxb1987 
    
    In the PR, please document the behavior changes. Hopefully, we will not see another PR to change the default value again after we release Spark 2.4. cc @vanzin @tgravescs WDYT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    I would like to hear more inputs from the reviewers of the original PR. cc @onursatici @ash211 @hvanhovell @jiangxb1987


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Also, Spark by default is shipped with not secure settings.  Meaning spark.acls.enable is false and spark.authenticate is false.  I see no reason to make the redact configs more strict then our defaults for those (Note I'm not arguing we shouldn't redact credentials where Spark itself is showing).  If a user turns on the acls, then by default only the user who submitted the job can see the UI.
    
    Going back to https://issues.apache.org/jira/browse/SPARK-22479 it looks like the user can see the url via the console, logs and I assume in the UI.  Is there somewhere else someone can see this information?  I want to make sure I understand the vulnerability here.
    
    If you enable security properly on spark no user should have access to those without being given permission.  If you are not running with acls and authentication on then I would argue there are a lot of attack vectors to where I could run things as another user anyway.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    @gatorsmile seriously, why are you focusing on the fact that I'm changing that default value when I wrote a bunch of code to actually maintain the URL redaction on the SQL plans, which was the original intent? Did you even read the PR at all?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    > usernames could have potentially personal identifiable information
    
    User names are not sensitive information. There's a ton of places where you can see them outside of the Spark UI (have you tried running `ls` on you machine lately?).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    @gatorsmile is there anything else you're looking for here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    ping again


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Also note that Tom's comment above is not 100% accurate - this change *does* redact the JDBC URL on the SQL side, for reasons I described in my comments on the bug. That's the whole reason why the SQL changes are even needed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    the original change was merged into 2.3.0, see https://issues.apache.org/jira/browse/SPARK-22479.   We are keeping the behavior of that fix and that is to not show the credentials of the jdbc connection.  The url and user name were added for some reason in that jira as well and they shouldn't have been. There is nothing sensitive about these.  I don't see this being a breaking compatibility issue.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    > Not all the Spark end users are allowed to access the command lines and run ls
    
    Come on. That is just an example. You surely are smart enough to think of another example where you can see other people's user names outside of Spark.
    
    > it would be better to keep them unchanged, if possible
    
    I disagree. Can you list all the Spark configuration that would be redacted by the conf in 2.3? I could only find two:
    
    * spark.driver.userClassPathFirsrt
    * spark.executor.userClassPathFirst
    
    And those are a great example of why the original change is a regression. It's wrong to redact those. It makes debugging things harder.
    
    And yet there are a ton of config keys where you can provide URLs, the thing you seem really worried about, which are not caught by that regex. Let's see... spark.master, spark.jars, spark.files, spark.yarn.archives, spark.yarn.jars, spark.jars.repositories... I could go on.
    
    > Since the new behaviors have already been introduced in the last release,
    
    Every change introduces a behavior change (unless you're like changing a comment string or something like that). Saying a change is not wanted because it changes behavior is really quite silly.
    
    The question is always whether the change is desired or not. And in this case *it is*. The change that was introduced in 2.3.0 is a regression in behavior from 2.2, or, in other words, it makes things *worse*. So we should fix it.
    
    >  please document the behavior changes.
    
    As I've stated in previous comments, I don't think there is a user-visible behavior change - at least one users would care about. As I mentioned above, the default value in 2.3.0 only causes problems, and doesn't apply to anything useful other than the JDBC URL, a behavior that this change keeps.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    ping


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    (I'll open a separate PR for 2.2 to run tests before pushing, since there are conflicts.)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    **[Test build #90750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90750/testReport)** for PR 21158 at commit [`6ca25e3`](https://github.com/apache/spark/commit/6ca25e3db8966c47f58d206e3f6e7bdce7786478).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2679/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89855/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by onursatici <gi...@git.apache.org>.
Github user onursatici commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    I think this PR is fine given it preserves the redaction of URL's. That was indeed the main reason why url's were in the default redaction pattern in the previous PR.
    To keep all the previous behaviour, `SQL_OPTIONS_REDACTION_PATTERN` can include `user`, as there might be environments considering user name information on sql connection options as db credentials. Showing that information could be opt-in if we redact by default.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    **[Test build #89855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89855/testReport)** for PR 21158 at commit [`deb299b`](https://github.com/apache/spark/commit/deb299b0211977735f6a830f763d4b9c37f5960c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by onursatici <gi...@git.apache.org>.
Github user onursatici commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    I don't have any strong opinions about usernames, so merging this as it stays is fine by me


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21158: [SPARK-23850][sql] Add separate config for SQL op...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21158#discussion_r185143980
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -342,7 +342,7 @@ package object config {
             "a property key or value, the value is redacted from the environment UI and various logs " +
             "like YARN and event logs.")
           .regexConf
    -      .createWithDefault("(?i)secret|password|url|user|username".r)
    +      .createWithDefault("(?i)secret|password".r)
    --- End diff --
    
    URIs could contain the access keys and usernames could have potentially personal identifiable information. Could we do not change the default?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90750/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Looks good.  I didn't have a jdbc connection handy to test with.
    
    I was looking for documentation on these sql confs and I don't see them in the SET -v output, did we decide not to document these?  I haven't looked to see how that works.
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Is the default `(?i)secret|password|url|user|username".r` merged to the released branches? If the default is not in the previously release, I am fine to change it back. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    The default value introduced by SPARK-22479 was already part of Apache Spark 2.3. It is hard to say this is a regression, since URIs could contain the access keys and usernames could have potentially personal identifiable information. Thus, it also makes sense to block them. Although the original JIRA is for JDBC, we also make it more secure. If users do not want to redact them, they can change the default. I am not feeling comfortable to not redact them in the current situation, since we already did it in Spark 2.3.  All the security issues could be serious. Thus, I think we should make it unchanged. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21158: [SPARK-23850][sql] Add separate config for SQL op...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21158#discussion_r184151995
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -342,7 +342,7 @@ package object config {
             "a property key or value, the value is redacted from the environment UI and various logs " +
             "like YARN and event logs.")
           .regexConf
    -      .createWithDefault("(?i)secret|password|url|user|username".r)
    +      .createWithDefault("(?i)secret|password".r)
    --- End diff --
    
    If the users want to see the values, they can change the default by themselves, right?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    **[Test build #89855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89855/testReport)** for PR 21158 at commit [`deb299b`](https://github.com/apache/spark/commit/deb299b0211977735f6a830f763d4b9c37f5960c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3309/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    > I was looking for documentation on these sql confs
    
    That's part of this fix, if you're talking about the old config. Since it was using `ConfigBuilder` directly instead of `buildConf`, it did not make it to the output of "SET -v".
    
    If that was intentional then probably that part should be reverted - don't see why it should be hidden, but maybe @gatorsmile had a reason.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21158: [SPARK-23850][sql] Add separate config for SQL op...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21158#discussion_r184152897
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
    @@ -225,7 +225,7 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
        * Redact the sensitive information in the given string.
        */
       private def withRedaction(message: String): String = {
    -    Utils.redact(sparkSession.sessionState.conf.stringRedationPattern, message)
    +    Utils.redact(sparkSession.sessionState.conf.stringRedactionPattern, message)
    --- End diff --
    
    Thanks for fixing this typo


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    That's fine, but I'm not going to wait indefinitely for their feedback. They've already been pinged multiple times here and on jira to comment on this issue.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    I plan to push this tomorrow by EOD unless I hear strong opposition, and merge it to 2.3 and 2.2 (where the original change was also backported). I want to avoid having another release out with the undesired behavior.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21158: [SPARK-23850][sql] Add separate config for SQL options r...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21158
  
    **[Test build #90750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90750/testReport)** for PR 21158 at commit [`6ca25e3`](https://github.com/apache/spark/commit/6ca25e3db8966c47f58d206e3f6e7bdce7786478).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21158: [SPARK-23850][sql] Add separate config for SQL op...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21158#discussion_r185142687
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala ---
    @@ -342,7 +342,7 @@ package object config {
             "a property key or value, the value is redacted from the environment UI and various logs " +
             "like YARN and event logs.")
           .regexConf
    -      .createWithDefault("(?i)secret|password|url|user|username".r)
    +      .createWithDefault("(?i)secret|password".r)
    --- End diff --
    
    Normally, making it stricter is not a big deal. However, if we want to relax it, this could be a big deal since users might expect these variables have been hided by default. cc the original reviewers and author @onursatici @ash211 @hvanhovell @jiangxb1987 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org