Posted to dev@drill.apache.org by Bohdan Kazydub <bo...@gmail.com> on 2018/08/02 18:24:35 UTC

[DISCUSS] Add Hadoop Credentials API support for Drill S3 storage plugin

Hi all,

Currently, to access the S3A filesystem, the `fs.s3a.secret.key` and
`fs.s3a.access.key` properties must be configured in plaintext, either in
the S3 storage plugin configuration or in core-site.xml. This approach is
insecure. To eliminate the need to store secrets in plaintext, the
CredentialProvider API [1] may be used to read the keys from an
encrypted credential store.
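As a sketch of how this could look (the keystore path here is illustrative, not part of any actual Drill configuration), the keys would live in an encrypted JCEKS keystore created with the `hadoop credential` CLI, and core-site.xml would only reference the provider:

```
<!-- core-site.xml: point Hadoop at an encrypted credential store
     instead of embedding the S3 keys in plaintext.
     The keystore would be created beforehand with, e.g.:
       hadoop credential create fs.s3a.access.key -provider jceks://file/etc/drill/s3.jceks
       hadoop credential create fs.s3a.secret.key -provider jceks://file/etc/drill/s3.jceks -->
<configuration>
  <property>
    <name>hadoop.security.credential.provider.path</name>
    <value>jceks://file/etc/drill/s3.jceks</value>
  </property>
</configuration>
```

Code that resolves the property through Hadoop's `Configuration.getPassword("fs.s3a.secret.key")` consults the provider chain first and only falls back to a clear-text value in the configuration, so existing setups keep working.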

Here is a document with implementation details:
https://docs.google.com/document/d/1ow4v5HOh0qJh-5KsZHqSjohM2ukGSayEd9360tHZZvo/edit#
And here is an open issue for the improvement:
https://issues.apache.org/jira/browse/DRILL-6662

Any thoughts?

[1]
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html

Kind regards,
Bohdan

Re: [DISCUSS] Add Hadoop Credentials API support for Drill S3 storage plugin

Posted by John Omernik <jo...@omernik.com>.
I think Drill should consider the issue of credentials across storage
plugins in general. Credentials exist for S3, but also for Mongo, JDBC, and
others as they get added. This can be a pain to manage and leads to
insecurely set up Drill clusters.

One option may be to allow a generic integration with secrets stores like
HashiCorp Vault or Kubernetes Secrets. I am not sure of the best approach
here, but one back-of-napkin idea is to have a plugin interface through
which users can retrieve passwords from such stores (or perhaps even build
it into Apache Drill itself). The goal would be to let users set a single
token/password once (think password safe) at the session level on login.
Drill would then use that session key to open the password store (whether
K8s, Vault, or something built into Drill itself) and retrieve all the
passwords for the user. This would allow a one-time, per-session setting of
passwords for users, while still allowing secure use, storage, and
accountability of passwords in third-party systems.



The process would be something like this:

Each storage plugin that uses a username/password would be able to set a
flag for a username/access ID and a password/secret.

The user experience would look like this:


1. I log on.
2. I now have a session and can issue SQL. SQL for filesystem stuff can
already just work.
3. If I want to access, say, an S3 bucket, I need to do something like
ALTER SESSION SET session.key = ''
   * NOTE: Whatever form this takes, we have to find a way to ensure it is
NOT LOGGED with other queries. This process needs to allow for user
accountability, so it should not be treated as a normal query for logging
purposes.
4. Now that the session key is set, I could do something like
  ALTER SESSION SET storage.plugin.s3.accessid = '' and ALTER SESSION SET
storage.plugin.s3.secret = ''
  * NOTE: Like session.key, these must NOT BE LOGGED. In addition, they
should not print in plaintext in SELECT * FROM sys.options (perhaps this
calls for a special sys.options-like table, sys.secrets?).
5. Now, whenever an S3 query is made, it can use those credentials (as long
as session.key is set correctly to unlock the password safe).
  * NOTE: storage.plugin.s3.x may not be enough; we may need to allow each
user to have multiple passwords per storage plugin based on defined
workspaces/plugins... or do we just define the plugin twice? Should a user
have multiple passwords per data store? Could we just spoof that by having
a new plugin for each instance of a store a user may need? Open to
discussion.

6. When the session closes, the session.key is automatically invalidated.
Any plugins that require a key will require this step to happen again.
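To make the shape of steps 3-6 concrete, here is a minimal, hypothetical sketch in Java (the `SecretStore` interface and every name in it are invented for illustration; this is not an existing Drill API): a secrets backend that refuses to hand anything out unless the caller presents the session's unlock key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical SPI: a pluggable secrets backend gated by a per-session
// unlock token, as described in steps 3-6 above. A Vault- or K8s-backed
// implementation would plug in behind the same interface.
interface SecretStore {
    Optional<char[]> getSecret(String sessionKey, String name);
}

// Trivial in-memory implementation, standing in for a real password safe.
class InMemorySecretStore implements SecretStore {
    private final String unlockKey;
    private final Map<String, char[]> secrets = new HashMap<>();

    InMemorySecretStore(String unlockKey) { this.unlockKey = unlockKey; }

    void put(String name, char[] value) { secrets.put(name, value); }

    @Override
    public Optional<char[]> getSecret(String sessionKey, String name) {
        // Refuse to return anything unless the session presented the
        // correct unlock token (step 3: ALTER SESSION SET session.key).
        if (!unlockKey.equals(sessionKey)) return Optional.empty();
        return Optional.ofNullable(secrets.get(name));
    }
}

public class Main {
    public static void main(String[] args) {
        InMemorySecretStore store = new InMemorySecretStore("tok123");
        store.put("storage.plugin.s3.secret", "s3cr3t".toCharArray());
        // Correct session key unlocks the secret; a wrong key gets nothing.
        System.out.println(store.getSecret("tok123", "storage.plugin.s3.secret").isPresent()); // true
        System.out.println(store.getSecret("wrong", "storage.plugin.s3.secret").isPresent());  // false
    }
}
```

When the session ends, dropping the store (or its unlock key) implements step 6: nothing is retrievable until the user sets session.key again.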


This is just a chicken-scratch idea, but I think we need to think about
passwords holistically and ensure that they are stored securely while
allowing for accountability and auditing of connections to the various data
stores. I think this is a bigger problem than just S3 or JDBC, and
something that should be built into Drill's core.

John






On Thu, Aug 2, 2018 at 7:24 PM, Bohdan Kazydub <bo...@gmail.com>
wrote:

> Hi all,
>
> Currently, to access S3A filesystem, `fs.s3a.secret.key` and
> `fs.s3a.access.key` properties should be configured either in S3 Storage
> Plugin or in core-site.xml in plaintext. This approach is considered
> unsecure. To eliminate a need to store passwords in plaintext,
> CredentialProvider API [1] may be used to extract secret keys from
> encrypted store.
>
> Here is a document with implementation details:
> https://docs.google.com/document/d/1ow4v5HOh0qJh-
> 5KsZHqSjohM2ukGSayEd9360tHZZvo/edit#
> .
> And here is an open issue for the improvement:
> https://issues.apache.org/jira/browse/DRILL-6662
>
> Any thoughts?
>
> [1]
> https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/
> CredentialProviderAPI.html
>
> Kind regards,
> Bohdan
>
