You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Daniel Standish <dp...@gmail.com> on 2020/03/08 06:40:58 UTC

Re: [DISCUSS] AIP-33 Creds Backend

Thanks Jarek for adding the PR link.  In addition to forgetting to include
that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the
correct AIP).

I like the name Secrets -- I will let the other ideas percolate and will
look into your suggestion about using the plugin system.


On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <Ja...@polidea.com>
wrote:

> +1. I like it. And I like the general direction of the name as well not to
> be tied with connections.
>
> I think it's also worth mentioning that there is PR in progress about it
> here: https://github.com/apache/airflow/pull/6376/files
>
> I have one observation though. We might not necessarily go in this
> direction or maybe defer it to later discussion, but I thought about
> extending the use of such CredentialsBackend and make it simply a
> SecretsBackend (that would be a nice solution to discussing various names
> for it :)).  Daniel - you might consider that as a "loose" comment.
>
> I do not want to hijack your topic here, so feel free to cut the discussion
> here if you think it's going too far from the original proposal.
>
> Why don't we change it to a bit more generic "SecretsBackend" and provide a
> more generic interface to use by the Hooks/Operators? I think the current
> implementation is not very far from it. Additionally to just providing a
> "get_connection" method it could provide much more generic "get_secret" and
> this would give the Hooks/Operators an opportunity to use it to retrieve
> all various kinds of secrets, not only Connections. We already have a
> half-baked solution for that with the _COMMAND variables and maybe we can
> incorporate this in backwards-compatible way in the "SecretBackend'
> similarly as you did with Metastore and Env Backends.
>
> I think this would be a very handy feature - some of the secrets are not
> necessary in the URI form, they might be much more complex and contain
> better structured data and might be needed or available at different times
> than when connection is retrieved. Often secrets are short-living and they
> should be retrieved "just" before they are used - limiting it only to when
> the Hook is created and only to URI is quite a limitation I think.
>
> Again - feel free to disregard that comment, I just thought it might be a
> handy feature and maybe it's a good time to add it.
>
> J.
>
>
> On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <dp...@gmail.com>
> wrote:
>
> > We can currently retrieve connections from environment variables or the
> > metastore database.
> >
> > AIP-33
> > <
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend>
> > provides
> > a way to retrieve them from other sources, for example AWS SSM parameter
> > store.
> >
> > There are many instances in airflow where we allow for user customization
> > like this, for example with auth backend and hostname callable -- it just
> > hasn't been done yet with creds.
> >
> > *How is it implemented?*
> > This adds a base class BaseCredsBackend which takes over BaseHook's
> > get_connections method.  Then EnvironmentVariablesCredsBackend and
> > MetastoreCredsBackend are added as subclasses.  New implementations can
> be
> > added and user can configure precedence.  E.g. instead of the default env
> > vars > metastore, a user could specify SSM > metastore > env vars in
> > airflow.cfg.
> >
> > *Does this break anything?*
> > No. This is a relatively simple refactor that results in no change in the
> > default behavior unless user modifies config to enable an alternative
> > backend.
> >
> > *Why is this worth adding?*
> > If implemented, you could store creds anywhere, as long as you can write
> a
> > `get_connections` method that retrieves them.
> > This can make it possible for a team of devs to share one creds source
> > rather than each dev storing them in text files, for example.
> > Can also make it easier to spin up a dev instance since you don't have to
> > worry about loading creds.
> > And if you don't have access to the airflow CLI (because you are on a
> cloud
> > platform perhaps) then this can provide an easy way to load and manage
> > creds.
> >
> > *Why do you call it creds?*
> > I don't need to call it creds.  I could call it BaseConnectionBackend or
> > BaseConnectionInfoBackend or any other thing -- I don't care too much.
> > There was something about calling it connection backend that was
> > unsatisfying.  Perhaps because connection is a bit of an ambiguous word.
> > So even though Connection is the name of the model that we instantiate
> when
> > retrieving these creds, creds seemed more specific and more
> > representative.  But again if there is consensus on something else, I am
> > happy to change.
> > This adds a `creds` package under `airflow`.  Alternatively would we
> create
> > a `connections` package?  Would this cause confusion relative to the
> > *model*
> > Connection?
> > Another thought: even though today this backend will produce a connection
> > object, maybe in the future it can produce a different model -- either
> way,
> > this backend is still about retrieving *creds*.
> >
> > *Outstanding questions*
> > As mentioned above, some don't like using the word Creds because this is
> > about producing connections.  I totally get this.  If there is general
> > consensus around another name, I am happy to change it.
> > I welcome any other suggestions or feedback on the structure of this
> > implementation.
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>

Re: [DISCUSS] AIP-33 Creds Backend

Posted by Kaxil Naik <ka...@gmail.com>.
I will mark this AIP
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Secrets+backend>
as accepted, given we now have 4 +1 binding votes and already 8 days have
passed since this AIP was created.

+1 bindings:

   - Kaxil
   - Jarek
   - Tomek
   - Fokko

I will now create a VOTE thread on the exiting PR.

Regards,
Kaxil


On Mon, Mar 16, 2020 at 10:26 AM Tomasz Urbaszek <
tomasz.urbaszek@polidea.com> wrote:

> I am also +1 for the idea of secrets backend. And agree with Fokko
> that we should support other backends as well.
>
> T.
>
>
> On Mon, Mar 16, 2020 at 11:19 AM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > Bumping this thread again. We have 2 "+1" votes on it currently (Me &
> > Jarek).
> >
> > Would love to hear other's opinion and thoughts. Personally I feel this
> is
> > a great addition and the one I was looking forward to for a long time.
> >
> > Regards,
> > Kaxil
> >
> > On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > > Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating
> > > external vaults and using them as connections. Was in the roadmap of
> items
> > > I wanted to work on so very happy that it is coming together and we are
> > > already discussing it.
> > >
> > > I have also looked at the PR and for the most part, it looks good to
> me.
> > > Following are my comments
> > >
> > >
> > >    1. "*SecretsBackend*" is a more intuitive name in my opinion (we can
> > >    have a vote in case we don't come to a conclusion :D)
> > >    2. I think we should have *creds_backend_kwargs* (either separate
> > >    section or a config allowing JSON-string) or something similar to
> avoid
> > >    hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*)
> > >    related to a specific provider in *airflow.cfg*.
> > >    If we don't do that we would need to hardcode these configs when I
> (or
> > >    someone) adds support for Google's KMS and Hashicorp Vault :)
> > >    3. How do we authenticate to the Secrets/Creds Backend?
> > >    For Google Cloud KMS or Hashicorp Vault, the main way of
> > >    authenticating I think is using Environment Variable.
> > >    Of course (2) can solve this, for example, we can pass
> > >    *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to
> *creds_backend_kwargs
> > >    *which would then authenticate with GCP
> > >    Or do we want to get that value from* Connection object stored in
> DB*?
> > >
> > >
> > > Regards,
> > > Kaxil
> > >
> > >
> > > On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <dp...@gmail.com>
> > > wrote:
> > >
> > >> Thanks Jarek for adding the PR link.  In addition to forgetting to
> include
> > >> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is
> the
> > >> correct AIP).
> > >>
> > >> I like the name Secrets -- I will let the other ideas percolate and
> will
> > >> look into your suggestion about using the plugin system.
> > >>
> > >>
> > >> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <Jarek.Potiuk@polidea.com
> >
> > >> wrote:
> > >>
> > >> > +1. I like it. And I like the general direction of the name as well
> not
> > >> to
> > >> > be tied with connections.
> > >> >
> > >> > I think it's also worth mentioning that there is PR in progress
> about it
> > >> > here: https://github.com/apache/airflow/pull/6376/files
> > >> >
> > >> > I have one observation though. We might not necessarily go in this
> > >> > direction or maybe defer it to later discussion, but I thought about
> > >> > extending the use of such CredentialsBackend and make it simply a
> > >> > SecretsBackend (that would be a nice solution to discussing various
> > >> names
> > >> > for it :)).  Daniel - you might consider that as a "loose" comment.
> > >> >
> > >> > I do not want to hijack your topic here, so feel free to cut the
> > >> discussion
> > >> > here if you think it's going too far from the original proposal.
> > >> >
> > >> > Why don't we change it to a bit more generic "SecretsBackend" and
> > >> provide a
> > >> > more generic interface to use by the Hooks/Operators? I think the
> > >> current
> > >> > implementation is not very far from it. Additionally to just
> providing a
> > >> > "get_connection" method it could provide much more generic
> "get_secret"
> > >> and
> > >> > this would give the Hooks/Operators an opportunity to use it to
> retrieve
> > >> > all various kinds of secrets, not only Connections. We already have
> a
> > >> > half-baked solution for that with the _COMMAND variables and maybe
> we
> > >> can
> > >> > incorporate this in backwards-compatible way in the "SecretBackend'
> > >> > similarly as you did with Metastore and Env Backends.
> > >> >
> > >> > I think this would be a very handy feature - some of the secrets
> are not
> > >> > necessary in the URI form, they might be much more complex and
> contain
> > >> > better structured data and might be needed or available at different
> > >> times
> > >> > than when connection is retrieved. Often secrets are short-living
> and
> > >> they
> > >> > should be retrieved "just" before they are used - limiting it only
> to
> > >> when
> > >> > the Hook is created and only to URI is quite a limitation I think.
> > >> >
> > >> > Again - feel free to disregard that comment, I just thought it
> might be
> > >> a
> > >> > handy feature and maybe it's a good time to add it.
> > >> >
> > >> > J.
> > >> >
> > >> >
> > >> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <
> dpstandish@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > We can currently retrieve connections from environment variables
> or
> > >> the
> > >> > > metastore database.
> > >> > >
> > >> > > AIP-33
> > >> > > <
> > >> >
> > >>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend>
> > >> > > provides
> > >> > > a way to retrieve them from other sources, for example AWS SSM
> > >> parameter
> > >> > > store.
> > >> > >
> > >> > > There are many instances in airflow where we allow for user
> > >> customization
> > >> > > like this, for example with auth backend and hostname callable --
> it
> > >> just
> > >> > > hasn't been done yet with creds.
> > >> > >
> > >> > > *How is it implemented?*
> > >> > > This adds a base class BaseCredsBackend which takes over
> BaseHook's
> > >> > > get_connections method.  Then EnvironmentVariablesCredsBackend and
> > >> > > MetastoreCredsBackend are added as subclasses.  New
> implementations
> > >> can
> > >> > be
> > >> > > added and user can configure precedence.  E.g. instead of the
> default
> > >> env
> > >> > > vars > metastore, a user could specify SSM > metastore > env vars
> in
> > >> > > airflow.cfg.
> > >> > >
> > >> > > *Does this break anything?*
> > >> > > No. This is a relatively simple refactor that results in no
> change in
> > >> the
> > >> > > default behavior unless user modifies config to enable an
> alternative
> > >> > > backend.
> > >> > >
> > >> > > *Why is this worth adding?*
> > >> > > If implemented, you could store creds anywhere, as long as you can
> > >> write
> > >> > a
> > >> > > `get_connections` method that retrieves them.
> > >> > > This can make it possible for a team of devs to share one creds
> source
> > >> > > rather than each dev storing them in text files, for example.
> > >> > > Can also make it easier to spin up a dev instance since you don't
> > >> have to
> > >> > > worry about loading creds.
> > >> > > And if you don't have access to the airflow CLI (because you are
> on a
> > >> > cloud
> > >> > > platform perhaps) then this can provide an easy way to load and
> manage
> > >> > > creds.
> > >> > >
> > >> > > *Why do you call it creds?*
> > >> > > I don't need to call it creds.  I could call it
> BaseConnectionBackend
> > >> or
> > >> > > BaseConnectionInfoBackend or any other thing -- I don't care too
> much.
> > >> > > There was something about calling it connection backend that was
> > >> > > unsatisfying.  Perhaps because connection is a bit of an ambiguous
> > >> word.
> > >> > > So even though Connection is the name of the model that we
> instantiate
> > >> > when
> > >> > > retrieving these creds, creds seemed more specific and more
> > >> > > representative.  But again if there is consensus on something
> else, I
> > >> am
> > >> > > happy to change.
> > >> > > This adds a `creds` package under `airflow`.  Alternatively would
> we
> > >> > create
> > >> > > a `connections` package?  Would this cause confusion relative to
> the
> > >> > > *model*
> > >> > > Connection?
> > >> > > Another thought: even though today this backend will produce a
> > >> connection
> > >> > > object, maybe in the future it can produce a different model --
> either
> > >> > way,
> > >> > > this backend is still about retrieving *creds*.
> > >> > >
> > >> > > *Outstanding questions*
> > >> > > As mentioned above, some don't like using the word Creds because
> this
> > >> is
> > >> > > about producing connections.  I totally get this.  If there is
> general
> > >> > > consensus around another name, I am happy to change it.
> > >> > > I welcome any other suggestions or feedback on the structure of
> this
> > >> > > implementation.
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Jarek Potiuk
> > >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >> >
> > >> > M: +48 660 796 129 <+48660796129>
> > >> > [image: Polidea] <https://www.polidea.com/>
> > >> >
> > >>
> > >
>
>
>
> --
>
> Tomasz Urbaszek
> Polidea | Software Engineer
>
> M: +48 505 628 493
> E: tomasz.urbaszek@polidea.com
>
> Unique Tech
> Check out our projects!
>

Re: [DISCUSS] AIP-33 Creds Backend

Posted by Tomasz Urbaszek <to...@polidea.com>.
I am also +1 for the idea of secrets backend. And agree with Fokko
that we should support other backends as well.

T.


On Mon, Mar 16, 2020 at 11:19 AM Kaxil Naik <ka...@gmail.com> wrote:
>
> Bumping this thread again. We have 2 "+1" votes on it currently (Me &
> Jarek).
>
> Would love to hear other's opinion and thoughts. Personally I feel this is
> a great addition and the one I was looking forward to for a long time.
>
> Regards,
> Kaxil
>
> On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> > Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating
> > external vaults and using them as connections. Was in the roadmap of items
> > I wanted to work on so very happy that it is coming together and we are
> > already discussing it.
> >
> > I have also looked at the PR and for the most part, it looks good to me.
> > Following are my comments
> >
> >
> >    1. "*SecretsBackend*" is a more intuitive name in my opinion (we can
> >    have a vote in case we don't come to a conclusion :D)
> >    2. I think we should have *creds_backend_kwargs* (either separate
> >    section or a config allowing JSON-string) or something similar to avoid
> >    hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*)
> >    related to a specific provider in *airflow.cfg*.
> >    If we don't do that we would need to hardcode these configs when I (or
> >    someone) adds support for Google's KMS and Hashicorp Vault :)
> >    3. How do we authenticate to the Secrets/Creds Backend?
> >    For Google Cloud KMS or Hashicorp Vault, the main way of
> >    authenticating I think is using Environment Variable.
> >    Of course (2) can solve this, for example, we can pass
> >    *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to *creds_backend_kwargs
> >    *which would then authenticate with GCP
> >    Or do we want to get that value from* Connection object stored in DB*?
> >
> >
> > Regards,
> > Kaxil
> >
> >
> > On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <dp...@gmail.com>
> > wrote:
> >
> >> Thanks Jarek for adding the PR link.  In addition to forgetting to include
> >> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the
> >> correct AIP).
> >>
> >> I like the name Secrets -- I will let the other ideas percolate and will
> >> look into your suggestion about using the plugin system.
> >>
> >>
> >> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <Ja...@polidea.com>
> >> wrote:
> >>
> >> > +1. I like it. And I like the general direction of the name as well not
> >> to
> >> > be tied with connections.
> >> >
> >> > I think it's also worth mentioning that there is PR in progress about it
> >> > here: https://github.com/apache/airflow/pull/6376/files
> >> >
> >> > I have one observation though. We might not necessarily go in this
> >> > direction or maybe defer it to later discussion, but I thought about
> >> > extending the use of such CredentialsBackend and make it simply a
> >> > SecretsBackend (that would be a nice solution to discussing various
> >> names
> >> > for it :)).  Daniel - you might consider that as a "loose" comment.
> >> >
> >> > I do not want to hijack your topic here, so feel free to cut the
> >> discussion
> >> > here if you think it's going too far from the original proposal.
> >> >
> >> > Why don't we change it to a bit more generic "SecretsBackend" and
> >> provide a
> >> > more generic interface to use by the Hooks/Operators? I think the
> >> current
> >> > implementation is not very far from it. Additionally to just providing a
> >> > "get_connection" method it could provide much more generic "get_secret"
> >> and
> >> > this would give the Hooks/Operators an opportunity to use it to retrieve
> >> > all various kinds of secrets, not only Connections. We already have a
> >> > half-baked solution for that with the _COMMAND variables and maybe we
> >> can
> >> > incorporate this in backwards-compatible way in the "SecretBackend'
> >> > similarly as you did with Metastore and Env Backends.
> >> >
> >> > I think this would be a very handy feature - some of the secrets are not
> >> > necessary in the URI form, they might be much more complex and contain
> >> > better structured data and might be needed or available at different
> >> times
> >> > than when connection is retrieved. Often secrets are short-living and
> >> they
> >> > should be retrieved "just" before they are used - limiting it only to
> >> when
> >> > the Hook is created and only to URI is quite a limitation I think.
> >> >
> >> > Again - feel free to disregard that comment, I just thought it might be
> >> a
> >> > handy feature and maybe it's a good time to add it.
> >> >
> >> > J.
> >> >
> >> >
> >> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <dp...@gmail.com>
> >> > wrote:
> >> >
> >> > > We can currently retrieve connections from environment variables or
> >> the
> >> > > metastore database.
> >> > >
> >> > > AIP-33
> >> > > <
> >> >
> >> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend>
> >> > > provides
> >> > > a way to retrieve them from other sources, for example AWS SSM
> >> parameter
> >> > > store.
> >> > >
> >> > > There are many instances in airflow where we allow for user
> >> customization
> >> > > like this, for example with auth backend and hostname callable -- it
> >> just
> >> > > hasn't been done yet with creds.
> >> > >
> >> > > *How is it implemented?*
> >> > > This adds a base class BaseCredsBackend which takes over BaseHook's
> >> > > get_connections method.  Then EnvironmentVariablesCredsBackend and
> >> > > MetastoreCredsBackend are added as subclasses.  New implementations
> >> can
> >> > be
> >> > > added and user can configure precedence.  E.g. instead of the default
> >> env
> >> > > vars > metastore, a user could specify SSM > metastore > env vars in
> >> > > airflow.cfg.
> >> > >
> >> > > *Does this break anything?*
> >> > > No. This is a relatively simple refactor that results in no change in
> >> the
> >> > > default behavior unless user modifies config to enable an alternative
> >> > > backend.
> >> > >
> >> > > *Why is this worth adding?*
> >> > > If implemented, you could store creds anywhere, as long as you can
> >> write
> >> > a
> >> > > `get_connections` method that retrieves them.
> >> > > This can make it possible for a team of devs to share one creds source
> >> > > rather than each dev storing them in text files, for example.
> >> > > Can also make it easier to spin up a dev instance since you don't
> >> have to
> >> > > worry about loading creds.
> >> > > And if you don't have access to the airflow CLI (because you are on a
> >> > cloud
> >> > > platform perhaps) then this can provide an easy way to load and manage
> >> > > creds.
> >> > >
> >> > > *Why do you call it creds?*
> >> > > I don't need to call it creds.  I could call it BaseConnectionBackend
> >> or
> >> > > BaseConnectionInfoBackend or any other thing -- I don't care too much.
> >> > > There was something about calling it connection backend that was
> >> > > unsatisfying.  Perhaps because connection is a bit of an ambiguous
> >> word.
> >> > > So even though Connection is the name of the model that we instantiate
> >> > when
> >> > > retrieving these creds, creds seemed more specific and more
> >> > > representative.  But again if there is consensus on something else, I
> >> am
> >> > > happy to change.
> >> > > This adds a `creds` package under `airflow`.  Alternatively would we
> >> > create
> >> > > a `connections` package?  Would this cause confusion relative to the
> >> > > *model*
> >> > > Connection?
> >> > > Another thought: even though today this backend will produce a
> >> connection
> >> > > object, maybe in the future it can produce a different model -- either
> >> > way,
> >> > > this backend is still about retrieving *creds*.
> >> > >
> >> > > *Outstanding questions*
> >> > > As mentioned above, some don't like using the word Creds because this
> >> is
> >> > > about producing connections.  I totally get this.  If there is general
> >> > > consensus around another name, I am happy to change it.
> >> > > I welcome any other suggestions or feedback on the structure of this
> >> > > implementation.
> >> > >
> >> >
> >> >
> >> > --
> >> >
> >> > Jarek Potiuk
> >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >> >
> >> > M: +48 660 796 129 <+48660796129>
> >> > [image: Polidea] <https://www.polidea.com/>
> >> >
> >>
> >



-- 

Tomasz Urbaszek
Polidea | Software Engineer

M: +48 505 628 493
E: tomasz.urbaszek@polidea.com

Unique Tech
Check out our projects!

Re: [DISCUSS] AIP-33 Creds Backend

Posted by Kaxil Naik <ka...@gmail.com>.
Here is the PR https://github.com/apache/airflow/pull/6376 that I have
merged already (Apologies for merging this before everyone voted on the AIP)

The backend is pluggable and I am planning to extend it Google KMS and
Hashicorp Vault over the weekend but if someone wants to go ahead please
feel free as I would be spending my time on DAG Serialization over the
weekdays.

Regards,
Kaxil

On Mon, Mar 16, 2020 at 10:22 AM Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> Yes, big +1 from my side as well. I think this is a requirement for any
> bigger company.
>
>
>    1. I think we should add support for KMS, Vault etc.
>    2. Make the current backend pluggable, and set the current as default
>    (Airflow metastore)
>
>
> Cheers, Fokko
>
> Op ma 16 mrt. 2020 om 11:19 schreef Kaxil Naik <ka...@gmail.com>:
>
> > Bumping this thread again. We have 2 "+1" votes on it currently (Me &
> > Jarek).
> >
> > Would love to hear other's opinion and thoughts. Personally I feel this
> is
> > a great addition and the one I was looking forward to for a long time.
> >
> > Regards,
> > Kaxil
> >
> > On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <ka...@gmail.com> wrote:
> >
> > > Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating
> > > external vaults and using them as connections. Was in the roadmap of
> > items
> > > I wanted to work on so very happy that it is coming together and we are
> > > already discussing it.
> > >
> > > I have also looked at the PR and for the most part, it looks good to
> me.
> > > Following are my comments
> > >
> > >
> > >    1. "*SecretsBackend*" is a more intuitive name in my opinion (we can
> > >    have a vote in case we don't come to a conclusion :D)
> > >    2. I think we should have *creds_backend_kwargs* (either separate
> > >    section or a config allowing JSON-string) or something similar to
> > avoid
> > >    hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*)
> > >    related to a specific provider in *airflow.cfg*.
> > >    If we don't do that we would need to hardcode these configs when I
> (or
> > >    someone) adds support for Google's KMS and Hashicorp Vault :)
> > >    3. How do we authenticate to the Secrets/Creds Backend?
> > >    For Google Cloud KMS or Hashicorp Vault, the main way of
> > >    authenticating I think is using Environment Variable.
> > >    Of course (2) can solve this, for example, we can pass
> > >    *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to
> > *creds_backend_kwargs
> > >    *which would then authenticate with GCP
> > >    Or do we want to get that value from* Connection object stored in
> DB*?
> > >
> > >
> > > Regards,
> > > Kaxil
> > >
> > >
> > > On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <dp...@gmail.com>
> > > wrote:
> > >
> > >> Thanks Jarek for adding the PR link.  In addition to forgetting to
> > include
> > >> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is
> the
> > >> correct AIP).
> > >>
> > >> I like the name Secrets -- I will let the other ideas percolate and
> will
> > >> look into your suggestion about using the plugin system.
> > >>
> > >>
> > >> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <Jarek.Potiuk@polidea.com
> >
> > >> wrote:
> > >>
> > >> > +1. I like it. And I like the general direction of the name as well
> > not
> > >> to
> > >> > be tied with connections.
> > >> >
> > >> > I think it's also worth mentioning that there is PR in progress
> about
> > it
> > >> > here: https://github.com/apache/airflow/pull/6376/files
> > >> >
> > >> > I have one observation though. We might not necessarily go in this
> > >> > direction or maybe defer it to later discussion, but I thought about
> > >> > extending the use of such CredentialsBackend and make it simply a
> > >> > SecretsBackend (that would be a nice solution to discussing various
> > >> names
> > >> > for it :)).  Daniel - you might consider that as a "loose" comment.
> > >> >
> > >> > I do not want to hijack your topic here, so feel free to cut the
> > >> discussion
> > >> > here if you think it's going too far from the original proposal.
> > >> >
> > >> > Why don't we change it to a bit more generic "SecretsBackend" and
> > >> provide a
> > >> > more generic interface to use by the Hooks/Operators? I think the
> > >> current
> > >> > implementation is not very far from it. Additionally to just
> > providing a
> > >> > "get_connection" method it could provide much more generic
> > "get_secret"
> > >> and
> > >> > this would give the Hooks/Operators an opportunity to use it to
> > retrieve
> > >> > all various kinds of secrets, not only Connections. We already have
> a
> > >> > half-baked solution for that with the _COMMAND variables and maybe
> we
> > >> can
> > >> > incorporate this in backwards-compatible way in the "SecretBackend'
> > >> > similarly as you did with Metastore and Env Backends.
> > >> >
> > >> > I think this would be a very handy feature - some of the secrets are
> > not
> > >> > necessary in the URI form, they might be much more complex and
> contain
> > >> > better structured data and might be needed or available at different
> > >> times
> > >> > than when connection is retrieved. Often secrets are short-living
> and
> > >> they
> > >> > should be retrieved "just" before they are used - limiting it only
> to
> > >> when
> > >> > the Hook is created and only to URI is quite a limitation I think.
> > >> >
> > >> > Again - feel free to disregard that comment, I just thought it might
> > be
> > >> a
> > >> > handy feature and maybe it's a good time to add it.
> > >> >
> > >> > J.
> > >> >
> > >> >
> > >> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <
> dpstandish@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > We can currently retrieve connections from environment variables
> or
> > >> the
> > >> > > metastore database.
> > >> > >
> > >> > > AIP-33
> > >> > > <
> > >> >
> > >>
> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend
> >
> > >> > > provides
> > >> > > a way to retrieve them from other sources, for example AWS SSM
> > >> parameter
> > >> > > store.
> > >> > >
> > >> > > There are many instances in airflow where we allow for user
> > >> customization
> > >> > > like this, for example with auth backend and hostname callable --
> it
> > >> just
> > >> > > hasn't been done yet with creds.
> > >> > >
> > >> > > *How is it implemented?*
> > >> > > This adds a base class BaseCredsBackend which takes over
> BaseHook's
> > >> > > get_connections method.  Then EnvironmentVariablesCredsBackend and
> > >> > > MetastoreCredsBackend are added as subclasses.  New
> implementations
> > >> can
> > >> > be
> > >> > > added and user can configure precedence.  E.g. instead of the
> > default
> > >> env
> > >> > > vars > metastore, a user could specify SSM > metastore > env vars
> in
> > >> > > airflow.cfg.
> > >> > >
> > >> > > *Does this break anything?*
> > >> > > No. This is a relatively simple refactor that results in no change
> > in
> > >> the
> > >> > > default behavior unless user modifies config to enable an
> > alternative
> > >> > > backend.
> > >> > >
> > >> > > *Why is this worth adding?*
> > >> > > If implemented, you could store creds anywhere, as long as you can
> > >> write
> > >> > a
> > >> > > `get_connections` method that retrieves them.
> > >> > > This can make it possible for a team of devs to share one creds
> > source
> > >> > > rather than each dev storing them in text files, for example.
> > >> > > Can also make it easier to spin up a dev instance since you don't
> > >> have to
> > >> > > worry about loading creds.
> > >> > > And if you don't have access to the airflow CLI (because you are
> on
> > a
> > >> > cloud
> > >> > > platform perhaps) then this can provide an easy way to load and
> > manage
> > >> > > creds.
> > >> > >
> > >> > > *Why do you call it creds?*
> > >> > > I don't need to call it creds.  I could call it
> > BaseConnectionBackend
> > >> or
> > >> > > BaseConnectionInfoBackend or any other thing -- I don't care too
> > much.
> > >> > > There was something about calling it connection backend that was
> > >> > > unsatisfying.  Perhaps because connection is a bit of an ambiguous
> > >> word.
> > >> > > So even though Connection is the name of the model that we
> > instantiate
> > >> > when
> > >> > > retrieving these creds, creds seemed more specific and more
> > >> > > representative.  But again if there is consensus on something
> else,
> > I
> > >> am
> > >> > > happy to change.
> > >> > > This adds a `creds` package under `airflow`.  Alternatively would
> we
> > >> > create
> > >> > > a `connections` package?  Would this cause confusion relative to
> the
> > >> > > *model*
> > >> > > Connection?
> > >> > > Another thought: even though today this backend will produce a
> > >> connection
> > >> > > object, maybe in the future it can produce a different model --
> > either
> > >> > way,
> > >> > > this backend is still about retrieving *creds*.
> > >> > >
> > >> > > *Outstanding questions*
> > >> > > As mentioned above, some don't like using the word Creds because
> > this
> > >> is
> > >> > > about producing connections.  I totally get this.  If there is
> > general
> > >> > > consensus around another name, I am happy to change it.
> > >> > > I welcome any other suggestions or feedback on the structure of
> this
> > >> > > implementation.
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > Jarek Potiuk
> > >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >> >
> > >> > M: +48 660 796 129 <+48660796129>
> > >> > [image: Polidea] <https://www.polidea.com/>
> > >> >
> > >>
> > >
> >
>

Re: [DISCUSS] AIP-33 Creds Backend

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Yes, big +1 from my side as well. I think this is a requirement for any
bigger company.


   1. I think we should add support for KMS, Vault etc.
   2. Make the current backend pluggable, and set the current as default
   (Airflow metastore)


Cheers, Fokko

Op ma 16 mrt. 2020 om 11:19 schreef Kaxil Naik <ka...@gmail.com>:

> Bumping this thread again. We have 2 "+1" votes on it currently (Me &
> Jarek).
>
> Would love to hear other's opinion and thoughts. Personally I feel this is
> a great addition and the one I was looking forward to for a long time.
>
> Regards,
> Kaxil
>
> On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> > Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating
> > external vaults and using them as connections. Was in the roadmap of
> items
> > I wanted to work on so very happy that it is coming together and we are
> > already discussing it.
> >
> > I have also looked at the PR and for the most part, it looks good to me.
> > Following are my comments
> >
> >
> >    1. "*SecretsBackend*" is a more intuitive name in my opinion (we can
> >    have a vote in case we don't come to a conclusion :D)
> >    2. I think we should have *creds_backend_kwargs* (either separate
> >    section or a config allowing JSON-string) or something similar to
> avoid
> >    hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*)
> >    related to a specific provider in *airflow.cfg*.
> >    If we don't do that we would need to hardcode these configs when I (or
> >    someone) adds support for Google's KMS and Hashicorp Vault :)
> >    3. How do we authenticate to the Secrets/Creds Backend?
> >    For Google Cloud KMS or Hashicorp Vault, the main way of
> >    authenticating I think is using Environment Variable.
> >    Of course (2) can solve this, for example, we can pass
> >    *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to
> *creds_backend_kwargs
> >    *which would then authenticate with GCP
> >    Or do we want to get that value from* Connection object stored in DB*?
> >
> >
> > Regards,
> > Kaxil
> >
> >
> > On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <dp...@gmail.com>
> > wrote:
> >
> >> Thanks Jarek for adding the PR link.  In addition to forgetting to
> include
> >> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the
> >> correct AIP).
> >>
> >> I like the name Secrets -- I will let the other ideas percolate and will
> >> look into your suggestion about using the plugin system.
> >>
> >>
> >> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <Ja...@polidea.com>
> >> wrote:
> >>
> >> > +1. I like it. And I like the general direction of the name as well
> not
> >> to
> >> > be tied with connections.
> >> >
> >> > I think it's also worth mentioning that there is PR in progress about
> it
> >> > here: https://github.com/apache/airflow/pull/6376/files
> >> >
> >> > I have one observation though. We might not necessarily go in this
> >> > direction or maybe defer it to later discussion, but I thought about
> >> > extending the use of such CredentialsBackend and make it simply a
> >> > SecretsBackend (that would be a nice solution to discussing various
> >> names
> >> > for it :)).  Daniel - you might consider that as a "loose" comment.
> >> >
> >> > I do not want to hijack your topic here, so feel free to cut the
> >> discussion
> >> > here if you think it's going too far from the original proposal.
> >> >
> >> > Why don't we change it to a bit more generic "SecretsBackend" and
> >> provide a
> >> > more generic interface to use by the Hooks/Operators? I think the
> >> current
> >> > implementation is not very far from it. Additionally to just
> providing a
> >> > "get_connection" method it could provide much more generic
> "get_secret"
> >> and
> >> > this would give the Hooks/Operators an opportunity to use it to
> retrieve
> >> > all various kinds of secrets, not only Connections. We already have a
> >> > half-baked solution for that with the _COMMAND variables and maybe we
> >> can
> >> > incorporate this in backwards-compatible way in the "SecretBackend'
> >> > similarly as you did with Metastore and Env Backends.
> >> >
> >> > I think this would be a very handy feature - some of the secrets are
> not
> >> > necessary in the URI form, they might be much more complex and contain
> >> > better structured data and might be needed or available at different
> >> times
> >> > than when connection is retrieved. Often secrets are short-living and
> >> they
> >> > should be retrieved "just" before they are used - limiting it only to
> >> when
> >> > the Hook is created and only to URI is quite a limitation I think.
> >> >
> >> > Again - feel free to disregard that comment, I just thought it might
> be
> >> a
> >> > handy feature and maybe it's a good time to add it.
> >> >
> >> > J.
> >> >
> >> >
> >> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <dp...@gmail.com>
> >> > wrote:
> >> >
> >> > > We can currently retrieve connections from environment variables or
> >> the
> >> > > metastore database.
> >> > >
> >> > > AIP-33
> >> > > <
> >> >
> >>
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend>
> >> > > provides
> >> > > a way to retrieve them from other sources, for example AWS SSM
> >> parameter
> >> > > store.
> >> > >
> >> > > There are many instances in airflow where we allow for user
> >> customization
> >> > > like this, for example with auth backend and hostname callable -- it
> >> just
> >> > > hasn't been done yet with creds.
> >> > >
> >> > > *How is it implemented?*
> >> > > This adds a base class BaseCredsBackend which takes over BaseHook's
> >> > > get_connections method.  Then EnvironmentVariablesCredsBackend and
> >> > > MetastoreCredsBackend are added as subclasses.  New implementations
> >> can
> >> > be
> >> > > added and user can configure precedence.  E.g. instead of the
> default
> >> env
> >> > > vars > metastore, a user could specify SSM > metastore > env vars in
> >> > > airflow.cfg.
> >> > >
> >> > > *Does this break anything?*
> >> > > No. This is a relatively simple refactor that results in no change
> in
> >> the
> >> > > default behavior unless user modifies config to enable an
> alternative
> >> > > backend.
> >> > >
> >> > > *Why is this worth adding?*
> >> > > If implemented, you could store creds anywhere, as long as you can
> >> write
> >> > a
> >> > > `get_connections` method that retrieves them.
> >> > > This can make it possible for a team of devs to share one creds
> source
> >> > > rather than each dev storing them in text files, for example.
> >> > > Can also make it easier to spin up a dev instance since you don't
> >> have to
> >> > > worry about loading creds.
> >> > > And if you don't have access to the airflow CLI (because you are on
> a
> >> > cloud
> >> > > platform perhaps) then this can provide an easy way to load and
> manage
> >> > > creds.
> >> > >
> >> > > *Why do you call it creds?*
> >> > > I don't need to call it creds.  I could call it
> BaseConnectionBackend
> >> or
> >> > > BaseConnectionInfoBackend or any other thing -- I don't care too
> much.
> >> > > There was something about calling it connection backend that was
> >> > > unsatisfying.  Perhaps because connection is a bit of an ambiguous
> >> word.
> >> > > So even though Connection is the name of the model that we
> instantiate
> >> > when
> >> > > retrieving these creds, creds seemed more specific and more
> >> > > representative.  But again if there is consensus on something else,
> I
> >> am
> >> > > happy to change.
> >> > > This adds a `creds` package under `airflow`.  Alternatively would we
> >> > create
> >> > > a `connections` package?  Would this cause confusion relative to the
> >> > > *model*
> >> > > Connection?
> >> > > Another thought: even though today this backend will produce a
> >> connection
> >> > > object, maybe in the future it can produce a different model --
> either
> >> > way,
> >> > > this backend is still about retrieving *creds*.
> >> > >
> >> > > *Outstanding questions*
> >> > > As mentioned above, some don't like using the word Creds because
> this
> >> is
> >> > > about producing connections.  I totally get this.  If there is
> general
> >> > > consensus around another name, I am happy to change it.
> >> > > I welcome any other suggestions or feedback on the structure of this
> >> > > implementation.
> >> > >
> >> >
> >> >
> >> > --
> >> >
> >> > Jarek Potiuk
> >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >> >
> >> > M: +48 660 796 129 <+48660796129>
> >> > [image: Polidea] <https://www.polidea.com/>
> >> >
> >>
> >
>

Re: [DISCUSS] AIP-33 Creds Backend

Posted by Kaxil Naik <ka...@gmail.com>.
Bumping this thread again. We have 2 "+1" votes on it currently (Me &
Jarek).

Would love to hear other's opinion and thoughts. Personally I feel this is
a great addition and the one I was looking forward to for a long time.

Regards,
Kaxil

On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <ka...@gmail.com> wrote:

> Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating
> external vaults and using them as connections. Was in the roadmap of items
> I wanted to work on so very happy that it is coming together and we are
> already discussing it.
>
> I have also looked at the PR and for the most part, it looks good to me.
> Following are my comments
>
>
>    1. "*SecretsBackend*" is a more intuitive name in my opinion (we can
>    have a vote in case we don't come to a conclusion :D)
>    2. I think we should have *creds_backend_kwargs* (either separate
>    section or a config allowing JSON-string) or something similar to avoid
>    hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*)
>    related to a specific provider in *airflow.cfg*.
>    If we don't do that we would need to hardcode these configs when I (or
>    someone) adds support for Google's KMS and Hashicorp Vault :)
>    3. How do we authenticate to the Secrets/Creds Backend?
>    For Google Cloud KMS or Hashicorp Vault, the main way of
>    authenticating I think is using Environment Variable.
>    Of course (2) can solve this, for example, we can pass
>    *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to *creds_backend_kwargs
>    *which would then authenticate with GCP
>    Or do we want to get that value from* Connection object stored in DB*?
>
>
> Regards,
> Kaxil
>
>
> On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <dp...@gmail.com>
> wrote:
>
>> Thanks Jarek for adding the PR link.  In addition to forgetting to include
>> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the
>> correct AIP).
>>
>> I like the name Secrets -- I will let the other ideas percolate and will
>> look into your suggestion about using the plugin system.
>>
>>
>> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <Ja...@polidea.com>
>> wrote:
>>
>> > +1. I like it. And I like the general direction of the name as well not
>> to
>> > be tied with connections.
>> >
>> > I think it's also worth mentioning that there is PR in progress about it
>> > here: https://github.com/apache/airflow/pull/6376/files
>> >
>> > I have one observation though. We might not necessarily go in this
>> > direction or maybe defer it to later discussion, but I thought about
>> > extending the use of such CredentialsBackend and make it simply a
>> > SecretsBackend (that would be a nice solution to discussing various
>> names
>> > for it :)).  Daniel - you might consider that as a "loose" comment.
>> >
>> > I do not want to hijack your topic here, so feel free to cut the
>> discussion
>> > here if you think it's going too far from the original proposal.
>> >
>> > Why don't we change it to a bit more generic "SecretsBackend" and
>> provide a
>> > more generic interface to use by the Hooks/Operators? I think the
>> current
>> > implementation is not very far from it. Additionally to just providing a
>> > "get_connection" method it could provide much more generic "get_secret"
>> and
>> > this would give the Hooks/Operators an opportunity to use it to retrieve
>> > all various kinds of secrets, not only Connections. We already have a
>> > half-baked solution for that with the _COMMAND variables and maybe we
>> can
>> > incorporate this in backwards-compatible way in the "SecretBackend'
>> > similarly as you did with Metastore and Env Backends.
>> >
>> > I think this would be a very handy feature - some of the secrets are not
>> > necessary in the URI form, they might be much more complex and contain
>> > better structured data and might be needed or available at different
>> times
>> > than when connection is retrieved. Often secrets are short-living and
>> they
>> > should be retrieved "just" before they are used - limiting it only to
>> when
>> > the Hook is created and only to URI is quite a limitation I think.
>> >
>> > Again - feel free to disregard that comment, I just thought it might be
>> a
>> > handy feature and maybe it's a good time to add it.
>> >
>> > J.
>> >
>> >
>> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <dp...@gmail.com>
>> > wrote:
>> >
>> > > We can currently retrieve connections from environment variables or
>> the
>> > > metastore database.
>> > >
>> > > AIP-33
>> > > <
>> >
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend>
>> > > provides
>> > > a way to retrieve them from other sources, for example AWS SSM
>> parameter
>> > > store.
>> > >
>> > > There are many instances in airflow where we allow for user
>> customization
>> > > like this, for example with auth backend and hostname callable -- it
>> just
>> > > hasn't been done yet with creds.
>> > >
>> > > *How is it implemented?*
>> > > This adds a base class BaseCredsBackend which takes over BaseHook's
>> > > get_connections method.  Then EnvironmentVariablesCredsBackend and
>> > > MetastoreCredsBackend are added as subclasses.  New implementations
>> can
>> > be
>> > > added and user can configure precedence.  E.g. instead of the default
>> env
>> > > vars > metastore, a user could specify SSM > metastore > env vars in
>> > > airflow.cfg.
>> > >
>> > > *Does this break anything?*
>> > > No. This is a relatively simple refactor that results in no change in
>> the
>> > > default behavior unless user modifies config to enable an alternative
>> > > backend.
>> > >
>> > > *Why is this worth adding?*
>> > > If implemented, you could store creds anywhere, as long as you can
>> write
>> > a
>> > > `get_connections` method that retrieves them.
>> > > This can make it possible for a team of devs to share one creds source
>> > > rather than each dev storing them in text files, for example.
>> > > Can also make it easier to spin up a dev instance since you don't
>> have to
>> > > worry about loading creds.
>> > > And if you don't have access to the airflow CLI (because you are on a
>> > cloud
>> > > platform perhaps) then this can provide an easy way to load and manage
>> > > creds.
>> > >
>> > > *Why do you call it creds?*
>> > > I don't need to call it creds.  I could call it BaseConnectionBackend
>> or
>> > > BaseConnectionInfoBackend or any other thing -- I don't care too much.
>> > > There was something about calling it connection backend that was
>> > > unsatisfying.  Perhaps because connection is a bit of an ambiguous
>> word.
>> > > So even though Connection is the name of the model that we instantiate
>> > when
>> > > retrieving these creds, creds seemed more specific and more
>> > > representative.  But again if there is consensus on something else, I
>> am
>> > > happy to change.
>> > > This adds a `creds` package under `airflow`.  Alternatively would we
>> > create
>> > > a `connections` package?  Would this cause confusion relative to the
>> > > *model*
>> > > Connection?
>> > > Another thought: even though today this backend will produce a
>> connection
>> > > object, maybe in the future it can produce a different model -- either
>> > way,
>> > > this backend is still about retrieving *creds*.
>> > >
>> > > *Outstanding questions*
>> > > As mentioned above, some don't like using the word Creds because this
>> is
>> > > about producing connections.  I totally get this.  If there is general
>> > > consensus around another name, I am happy to change it.
>> > > I welcome any other suggestions or feedback on the structure of this
>> > > implementation.
>> > >
>> >
>> >
>> > --
>> >
>> > Jarek Potiuk
>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> >
>> > M: +48 660 796 129 <+48660796129>
>> > [image: Polidea] <https://www.polidea.com/>
>> >
>>
>

Re: [DISCUSS] AIP-33 Creds Backend

Posted by Kaxil Naik <ka...@gmail.com>.
Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating
external vaults and using them as connections. Was in the roadmap of items
I wanted to work on so very happy that it is coming together and we are
already discussing it.

I have also looked at the PR and for the most part, it looks good to me.
Following are my comments


   1. "*SecretsBackend*" is a more intuitive name in my opinion (we can
   have a vote in case we don't come to a conclusion :D)
   2. I think we should have *creds_backend_kwargs* (either separate
   section or a config allowing JSON-string) or something similar to avoid
   hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*)
   related to a specific provider in *airflow.cfg*.
   If we don't do that we would need to hardcode these configs when I (or
   someone) adds support for Google's KMS and Hashicorp Vault :)
   3. How do we authenticate to the Secrets/Creds Backend?
   For Google Cloud KMS or Hashicorp Vault, the main way of authenticating
   I think is using Environment Variable.
   Of course (2) can solve this, for example, we can pass
   *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to
*creds_backend_kwargs *which
   would then authenticate with GCP
   Or do we want to get that value from* Connection object stored in DB*?


Regards,
Kaxil


On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <dp...@gmail.com> wrote:

> Thanks Jarek for adding the PR link.  In addition to forgetting to include
> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the
> correct AIP).
>
> I like the name Secrets -- I will let the other ideas percolate and will
> look into your suggestion about using the plugin system.
>
>
> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> > +1. I like it. And I like the general direction of the name as well not
> to
> > be tied with connections.
> >
> > I think it's also worth mentioning that there is PR in progress about it
> > here: https://github.com/apache/airflow/pull/6376/files
> >
> > I have one observation though. We might not necessarily go in this
> > direction or maybe defer it to later discussion, but I thought about
> > extending the use of such CredentialsBackend and make it simply a
> > SecretsBackend (that would be a nice solution to discussing various names
> > for it :)).  Daniel - you might consider that as a "loose" comment.
> >
> > I do not want to hijack your topic here, so feel free to cut the
> discussion
> > here if you think it's going too far from the original proposal.
> >
> > Why don't we change it to a bit more generic "SecretsBackend" and
> provide a
> > more generic interface to use by the Hooks/Operators? I think the current
> > implementation is not very far from it. Additionally to just providing a
> > "get_connection" method it could provide much more generic "get_secret"
> and
> > this would give the Hooks/Operators an opportunity to use it to retrieve
> > all various kinds of secrets, not only Connections. We already have a
> > half-baked solution for that with the _COMMAND variables and maybe we can
> > incorporate this in backwards-compatible way in the "SecretBackend'
> > similarly as you did with Metastore and Env Backends.
> >
> > I think this would be a very handy feature - some of the secrets are not
> > necessary in the URI form, they might be much more complex and contain
> > better structured data and might be needed or available at different
> times
> > than when connection is retrieved. Often secrets are short-living and
> they
> > should be retrieved "just" before they are used - limiting it only to
> when
> > the Hook is created and only to URI is quite a limitation I think.
> >
> > Again - feel free to disregard that comment, I just thought it might be a
> > handy feature and maybe it's a good time to add it.
> >
> > J.
> >
> >
> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <dp...@gmail.com>
> > wrote:
> >
> > > We can currently retrieve connections from environment variables or the
> > > metastore database.
> > >
> > > AIP-33
> > > <
> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend
> >
> > > provides
> > > a way to retrieve them from other sources, for example AWS SSM
> parameter
> > > store.
> > >
> > > There are many instances in airflow where we allow for user
> customization
> > > like this, for example with auth backend and hostname callable -- it
> just
> > > hasn't been done yet with creds.
> > >
> > > *How is it implemented?*
> > > This adds a base class BaseCredsBackend which takes over BaseHook's
> > > get_connections method.  Then EnvironmentVariablesCredsBackend and
> > > MetastoreCredsBackend are added as subclasses.  New implementations can
> > be
> > > added and user can configure precedence.  E.g. instead of the default
> env
> > > vars > metastore, a user could specify SSM > metastore > env vars in
> > > airflow.cfg.
> > >
> > > *Does this break anything?*
> > > No. This is a relatively simple refactor that results in no change in
> the
> > > default behavior unless user modifies config to enable an alternative
> > > backend.
> > >
> > > *Why is this worth adding?*
> > > If implemented, you could store creds anywhere, as long as you can
> write
> > a
> > > `get_connections` method that retrieves them.
> > > This can make it possible for a team of devs to share one creds source
> > > rather than each dev storing them in text files, for example.
> > > Can also make it easier to spin up a dev instance since you don't have
> to
> > > worry about loading creds.
> > > And if you don't have access to the airflow CLI (because you are on a
> > cloud
> > > platform perhaps) then this can provide an easy way to load and manage
> > > creds.
> > >
> > > *Why do you call it creds?*
> > > I don't need to call it creds.  I could call it BaseConnectionBackend
> or
> > > BaseConnectionInfoBackend or any other thing -- I don't care too much.
> > > There was something about calling it connection backend that was
> > > unsatisfying.  Perhaps because connection is a bit of an ambiguous
> word.
> > > So even though Connection is the name of the model that we instantiate
> > when
> > > retrieving these creds, creds seemed more specific and more
> > > representative.  But again if there is consensus on something else, I
> am
> > > happy to change.
> > > This adds a `creds` package under `airflow`.  Alternatively would we
> > create
> > > a `connections` package?  Would this cause confusion relative to the
> > > *model*
> > > Connection?
> > > Another thought: even though today this backend will produce a
> connection
> > > object, maybe in the future it can produce a different model -- either
> > way,
> > > this backend is still about retrieving *creds*.
> > >
> > > *Outstanding questions*
> > > As mentioned above, some don't like using the word Creds because this
> is
> > > about producing connections.  I totally get this.  If there is general
> > > consensus around another name, I am happy to change it.
> > > I welcome any other suggestions or feedback on the structure of this
> > > implementation.
> > >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>