Posted to dev@drill.apache.org by Charles Givre <cg...@gmail.com> on 2020/04/17 14:34:16 UTC

[DISCUSS]: Masking Creds in Query Plans

Hello all, 
I was thinking about this: if a user executes an EXPLAIN PLAN FOR query, they get a lot of information about the storage plugin, including, in some cases, creds.
The example below shows a query plan for the JDBC storage plugin. As you can see, the user creds are right there.

I'm wondering whether it would be advisable, or even possible, to mask the creds in query plans so that users can't access this information. If masking isn't an option, is there some other way to prevent users from seeing it? In a multi-tenant environment, this seems like a rather large security hole.
Thanks,
-- C


{
  "head" : {
    "version" : 1,
    "generator" : {
      "type" : "ExplainHandler",
      "info" : ""
    },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ ],
    "queue" : 0,
    "hasResourcePlan" : false,
    "resultMode" : "EXEC"
  },
  "graph" : [ {
    "pop" : "jdbc-scan",
    "@id" : 5,
    "sql" : "SELECT *\nFROM `stats`.`batting`",
    "columns" : [ "`playerID`", "`yearID`", "`stint`", "`teamID`", "`lgID`", "`G`", "`AB`", "`R`", "`H`", "`2B`", "`3B`", "`HR`", "`RBI`", "`SB`", "`CS`", "`BB`", "`SO`", "`IBB`", "`HBP`", "`SH`", "`SF`", "`GIDP`" ],
    "config" : {
      "type" : "jdbc",
      "driver" : "com.mysql.cj.jdbc.Driver",
      "url" : "jdbc:mysql://localhost:3306/?serverTimezone=EST5EDT",
      "username" : "<username>",
      "password" : "<password>",
      "caseInsensitiveTableNames" : false,
      "sourceParameters" : { },
      "enabled" : true
    },
    "userName" : "",
    "cost" : {
      "memoryCost" : 1.6777216E7,
      "outputRowCount" : 100.0
    }
  }, {
    "pop" : "limit",
    "@id" : 4,
    "child" : 5,
    "first" : 0,
    "last" : 10,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : {
      "memoryCost" : 1.6777216E7,
      "outputRowCount" : 10.0
    }
  }, {
    "pop" : "limit",
    "@id" : 3,



Re: [DISCUSS]: Masking Creds in Query Plans

Posted by Paul Rogers <pa...@yahoo.com.INVALID>.
Hi Charles,

Excellent point. The problem is deeper. Drill serializes plugin configs into the query plan, which it sends to each worker (Drillbit). Why? To avoid race conditions: if you start a query and then change the plugin config, different nodes could otherwise see different versions of the config.

Masking can't happen in the execution plan, or the plan won't work. (I hope your password is not actually "*******".) So masking would have to happen in the logs and in the EXPLAIN PLAN FOR output. This would, in turn, require code that understands each config well enough to make a copy of it with the credentials masked, so that we can then serialize the plan with the masked copy to JSON. (Or we'd have to edit the JSON after it is generated.) Both are pretty ugly and not very secure.
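
As a rough illustration of the "edit the JSON after it is generated" option, here is a minimal Python sketch. It is not Drill code: the `mask_credentials` helper, the `SENSITIVE_KEYS` set, and the sample plan values are all hypothetical, and matching on key names is an assumption that would miss credentials stored under other names (for example, inside a JDBC URL).

```python
import json

# Hypothetical set of key names treated as sensitive. Real plugins may use
# other names, which is exactly the per-config knowledge problem noted above.
SENSITIVE_KEYS = {"username", "password"}
MASK = "*******"


def mask_credentials(node):
    """Return a masked copy of a parsed JSON plan; the input is left unchanged."""
    if isinstance(node, dict):
        return {
            key: MASK if key in SENSITIVE_KEYS else mask_credentials(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [mask_credentials(item) for item in node]
    return node


# Made-up plan fragment shaped like the EXPLAIN output in this thread.
plan = {
    "graph": [
        {
            "pop": "jdbc-scan",
            "config": {
                "type": "jdbc",
                "username": "alice",
                "password": "s3cret",
                "enabled": True,
            },
        }
    ]
}

masked = mask_credentials(plan)
print(json.dumps(masked, indent=2))
```

Only the copy shown to the user is masked; the plan sent to the workers would still carry the real values, so this hides the credentials from EXPLAIN output and logs without protecting them in transit.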

What we need is some kind of "vault" interface: the config holds a key into a vault that Drill itself has been given access to, and the vault returns the actual credential value. As a security guy yourself, what would you recommend as our target? Should we create a generic API? Is there some system common enough on Hadoop systems that we should target it as our reference implementation? Also, can you perhaps file a JIRA ticket for this issue?
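
To make the vault idea a little more concrete, here is a minimal Python sketch under stated assumptions. Nothing in it is an existing Drill API: `CredentialVault`, `InMemoryVault`, `resolve_config`, the `vault:` reference prefix, and the sample keys and values are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class CredentialVault(ABC):
    """Hypothetical generic interface: maps an opaque key to a secret value."""

    @abstractmethod
    def resolve(self, key: str) -> str:
        ...


class InMemoryVault(CredentialVault):
    """Toy stand-in for a real secret store, used only for this example."""

    def __init__(self, secrets):
        self._secrets = dict(secrets)

    def resolve(self, key: str) -> str:
        return self._secrets[key]


def resolve_config(config, vault, ref_prefix="vault:"):
    """Return a copy of config with vault references replaced by real values."""
    resolved = {}
    for name, value in config.items():
        if isinstance(value, str) and value.startswith(ref_prefix):
            resolved[name] = vault.resolve(value[len(ref_prefix):])
        else:
            resolved[name] = value
    return resolved


# The serialized plan would carry only references, never the secrets.
plan_config = {
    "type": "jdbc",
    "url": "jdbc:mysql://localhost:3306/",
    "username": "vault:mysql/user",
    "password": "vault:mysql/password",
}

vault = InMemoryVault({"mysql/user": "alice", "mysql/password": "s3cret"})

# Each worker would resolve the references just before opening a connection.
runtime_config = resolve_config(plan_config, vault)
```

With this shape, the plan, the logs, and the EXPLAIN output contain only strings such as `vault:mysql/password`, so no masking step is needed. The trade-off is that every Drillbit needs access to the vault.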

Thanks,
- Paul

 


Re: [DISCUSS]: Masking Creds in Query Plans

Posted by Arina Ielchiieva <ar...@gmail.com>.
Agreed that we should not display sensitive data such as passwords. I would say the best option is to mask it during output.

Kind regards,
Arina
