Posted to dev@airflow.apache.org by Jarek Potiuk <Ja...@polidea.com> on 2020/05/18 13:45:55 UTC

[PROPOSAL] Secret Backend Hooks

Hello Everyone,

TL;DR: I was about to start work on a small set of Hooks
dedicated to retrieving secrets from the Secret Backend. I discussed it with Ash
and Kamil
<https://apache-airflow.slack.com/archives/C0145R4NPS5/p1589805908013700> on
Slack today. So far I had planned to treat them as usual providers, but Ash
raised some valid concerns, so I wanted to raise the proposal before I
start working on it.

*Context:*

Currently we have "Secret Backend" support built into 2.0 and 1.10.10+. It
includes retrieving variables and connections (via a Secret Manager class)
for:

   -  Hashicorp Vault
   -  Secret Manager
   -  KMS
   -  AWS secret manager

Those secret managers are configured in:

[secrets]
backend=<SecretManagerClass>
backend_kwargs={}

Those are available for use in a nice way (via Jinja templates and the
like), but they need support in the core of Airflow (so they require 1.10.10+).
This means that if you are on pre-1.10.10 you cannot use those secrets.
Currently you can only use one secret backend per whole Airflow installation,
so if your secrets are split between several secret managers (or if secrets for
a particular service require different credentials), you cannot use the
mechanism to access such distributed secrets. It's not a common case, but I
can very well imagine that there are different sets of
credentials to access different secrets: some services might need
different scopes/levels of access.

*Proposal*

We have an idea that we might also want (on top of the above SecretManager
implementation) to define generic Hooks for accessing secrets from those
services (just generic secrets, not connections or variables). Simply treat
each of the backends above as another "provider" and create a Hook to
access the service. Such a Hook could have just one method:

def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]

It would use a connection defined (as usual) in ENV variables or database
of Airflow to authenticate with the secret service and retrieve the
secrets.
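To make the shape of such a Hook concrete, here is a minimal sketch. It is only an illustration under stated assumptions: `ExampleSecretHook` and `FakeSecretClient` are hypothetical names (neither exists in Airflow), and plain dicts stand in for the Airflow connection lookup and for the remote secret service so that the snippet runs on its own.

```python
from typing import Dict, Optional


class FakeSecretClient:
    """Hypothetical stand-in for a real service client (Vault, Secret Manager, ...)."""

    def __init__(self, token: str, store: Dict[str, str]) -> None:
        self.token = token
        self._store = store

    def read(self, path: str) -> Optional[str]:
        # A real client would call the remote service here.
        return self._store.get(path)


class ExampleSecretHook:
    """Hypothetical per-service hook; not an existing Airflow class."""

    def __init__(self, conn_id: str, connections: Dict[str, str],
                 store: Dict[str, str]) -> None:
        # In Airflow the credentials would come from the connection
        # (ENV variable or metadata DB); a plain dict stands in for that.
        self.conn_id = conn_id
        self._client = FakeSecretClient(token=connections[conn_id], store=store)

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        """Return the secret stored at <path_prefix>/<secret_id>, or None."""
        return self._client.read(f"{path_prefix}/{secret_id}")


connections = {"vault_default": "s3cr3t-token"}
store = {"airflow/secrets/api_key": "abc123"}

hook = ExampleSecretHook("vault_default", connections, store)
found = hook.get_secret("airflow/secrets", "api_key")
missing = hook.get_secret("airflow/secrets", "does_not_exist")
```

The point of the sketch is only that authentication is resolved once from the connection id, while `get_secret` stays a thin lookup that returns `None` for unknown secrets.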

The good thing about it is that you could easily have multiple secret
backends configured to retrieve secrets for a specific "service" (so that you
could keep "generic Airflow secrets" in one backend but still have
the possibility for custom operators to use other backends, with different
authentication, scopes etc.). And it does not touch any of the "core" of
Airflow. It's just a set of hooks with corresponding connections that work
the same way as accessing any other provider in Airflow.
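As a rough illustration of the multi-backend case, the sketch below has one custom operator body read from two differently authenticated services. Everything in it is an assumption made for the example: `InMemorySecretHook`, the `SERVICES` table, the connection ids and the credentials are hypothetical, and the "services" are in-memory dicts rather than real secret managers.

```python
from typing import Dict, Optional

# Hypothetical in-memory "services": each connection id maps to the
# credential it expects and the secrets it holds.
SERVICES: Dict[str, Dict] = {
    "vault_conn": {"credential": "vault-token",
                   "secrets": {"team-a/db_password": "pw-a"}},
    "aws_conn": {"credential": "aws-key",
                 "secrets": {"team-b/api_token": "tok-b"}},
}


class InMemorySecretHook:
    """Hypothetical hook: one instance per connection / secret service."""

    def __init__(self, conn_id: str, credential: str) -> None:
        service = SERVICES[conn_id]
        # Each service checks its own credential, independently of the others.
        if credential != service["credential"]:
            raise PermissionError(f"bad credential for {conn_id}")
        self._secrets = service["secrets"]

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        return self._secrets.get(f"{path_prefix}/{secret_id}")


def run_custom_operator() -> Dict[str, Optional[str]]:
    """Body of a hypothetical custom operator's execute() method."""
    vault = InMemorySecretHook("vault_conn", credential="vault-token")
    aws = InMemorySecretHook("aws_conn", credential="aws-key")
    return {
        "db_password": vault.get_secret("team-a", "db_password"),
        "api_token": aws.get_secret("team-b", "api_token"),
    }


result = run_custom_operator()
```

Because each hook carries its own connection, nothing in the Airflow-wide secrets configuration has to change to reach the second service.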

*Pros/Cons*

*Con:*

I do realise it is a bit of a duplication of functionality. We already have a
way to connect to a secret backend via the Airflow configuration and we should
likely promote it rather than introduce an additional mechanism.

*Pros:*

* Most of all, it adds the flexibility of accessing several secret backends
for different use cases. So far I have looked at those hooks as
merely another set of "provider hooks". For me this is no different
from the "providers" for any other services we have. For example, the "cloudant"
provider has only a "CloudantHook" that other custom operators can use. And I
can well imagine it might actually be even more convenient to configure
connections in the DB and access secrets this way rather than having to
configure Secret Backends in the Airflow configuration.

* The duplication there is very, very limited (basically a method call to
the secret backend).

* Another benefit is that it would allow people still stuck on pre-1.10.10
to write custom operators that use secret backends
(via backport operators), and to continue doing so in the future
(possibly migrating to 2.0/1.10.10+ in cases where there is only one secret
backend, but continuing to use connections/hooks where some specific
secrets should be kept in a different secret backend).

I would like to hear your opinion on that.

J.

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [PROPOSAL] Secret Backend Hooks

Posted by Kaxil Naik <ka...@gmail.com>.
Hi Jarek,

I would like us to keep "Secrets" as a separate concept and not mix it with
"hooks".

While introducing and reviewing the initial PRs and AIP about it, it made
sense to have "Secrets" as a separate concept altogether.

"Secrets" is what the Hook would use to interact with external services.

As soon as we try to mix those concepts it starts getting
unnecessarily complicated.

Example: Hashicorp Vault is what I use to get my Airflow Connections that
are used in my hooks and operators. But if I need to define a Connection for
the Secrets that I use to get Connections, it gets "tricky".

I would treat getting "Secrets" the same way as we connect to our "SQL
Backend" (sql_alchemy_conn in airflow.cfg).

One of the main benefits of the Secrets Backend was that we do not need to
define any Connections at all in the Metadata DB. And automation tools
can handle airflow.cfg as a single point for these configs.

We should also not try to backport Core Services to older versions. I feel
very strongly about it and feel the "Secrets" concept is part of the core
functionality rather than a hook. For users on 1.10.10+, it is already very
easy to write their own secrets backend.
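A minimal sketch of what such a custom backend could look like, assuming the 1.10.10 interface in which a backend subclasses BaseSecretsBackend and overrides get_conn_uri and get_variable. To keep the snippet runnable without Airflow installed, `BaseSecretsBackendStandIn` is a hypothetical stand-in for that base class, and `PrefixedEnvBackend` with its prefixes is invented purely for illustration.

```python
import os
from typing import Optional


class BaseSecretsBackendStandIn:
    """Stand-in for Airflow's base class so the sketch runs without Airflow."""

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        raise NotImplementedError

    def get_variable(self, key: str) -> Optional[str]:
        return None


class PrefixedEnvBackend(BaseSecretsBackendStandIn):
    """Hypothetical backend reading connections/variables from prefixed env vars."""

    def __init__(self, connections_prefix: str = "MY_CONN_",
                 variables_prefix: str = "MY_VAR_") -> None:
        self.connections_prefix = connections_prefix
        self.variables_prefix = variables_prefix

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        # Returns None when the connection is unknown to this backend.
        return os.environ.get(self.connections_prefix + conn_id.upper())

    def get_variable(self, key: str) -> Optional[str]:
        return os.environ.get(self.variables_prefix + key.upper())


os.environ["MY_CONN_POSTGRES_DEFAULT"] = "postgresql://user:pass@host:5432/db"
os.environ.pop("MY_VAR_NOT_SET", None)

backend = PrefixedEnvBackend()
uri = backend.get_conn_uri("postgres_default")
missing = backend.get_variable("not_set")
```

In a real deployment the class would be referenced from the secrets section of airflow.cfg rather than instantiated by hand.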

I think we should try to get Airflow 2.0 out soon-ish with all the features we
want.

Note: I am not at my 100% today and if I have misunderstood something I
will respond in the coming days.

Regards,
Kaxil



Re: [PROPOSAL] Secret Backend Hooks

Posted by Jarek Potiuk <Ja...@polidea.com>.
Cool. I thought it was a misunderstanding :). Great that it is clear now!

J.

On Tue, May 19, 2020 at 11:17 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> Yes, sorry I got completely the wrong idea somehow. This makes sense,
> and thank you for patiently explaining it to me until I got it!
>
> My main reason for questioning was not this specific feature, but the
> gradual "scope creep" of Airflow operators.
>
> One of the hardest things we as project "stewards" have to do is say no
> to features. There have been a few examples of merged PRs that I've seen
> recently where my immediate reaction was "just use Terraform".
>
> My main worry is that "when all you have is a hammer, everything looks
> like a nail" syndrome, where we end up re-inventing
> Ansible/Terraform/CloudFormation (to give an example), but badly, and in
> operators.
>
> -ash
>
> On May 19 2020, at 7:26 am, Jarek Potiuk <Ja...@polidea.com> wrote:
>
> > Let me start again from scratch and use KMS as an example. Maybe -
> > again -
> > we understand things differently:
> >
> > Just to start: KMSHook has two methods, "encrypt" and "decrypt". I
> > would continue to use that as a base.
> >
> > Again, let me repeat that: I do not want to implement a generic
> > SecretHook! I also do not want to implement SecretOperator. I never
> > wanted to. I wanted to implement VaultHook, GCPSecretManagerHook,
> > AWSSecretManagerHook.
> >
> > *Assumptions for the use case*
> >
> > * Let's assume all secrets of Airflow (Connections and Variables) are
> > kept in HashiCorp Vault (using SecretsBackend). Airflow is configured to
> > read them as Variables/Connections.
> > * For security reasons those secrets are read-only for Airflow. The Vault
> > is very secure: only high-security admins have access there. Airflow can
> > only read from it.
> > * Additionally the company uses KMS to keep encrypted data which is more
> > "temporary" in nature but still should be kept secret and never stored
> > in a traditional database. It keeps the history of those secrets and an
> > audit log so that in case of any breach we can track the origin of the
> > breach.
> > * Access to the KMS service from within Airflow is both READ and WRITE.
> > * One of the Connections we have in the Airflow Connections (in the
> > Vault) holds GCP credentials to both read from and write to KMS. It is
> > rotated frequently in the Vault so that its unintended use in case it
> > leaks is limited to, say, 24 hours.
> >
> > *The Use Case:*
> >
> > 1) We need to generate a random SEED to start our calculations from. We
> > need the same SEED in every job as a parameter. However we never want to
> > store the SEED in the Airflow database (so it cannot be passed via
> > XCom). In the job we have Custom Operators that do this (note that the
> > complexity of handling authentication to KMS is handled, as usual, by
> > the KMSHook. KMSHook derives from GcpBaseHook and has all the complexity
> > of handling the authentication implemented):
> >
> > hook = KMSHook(conn_id="service_account_for_accessing_kms_hook")
> > seed = rand()
> > hook.encrypt(key="seed<dag_id><run_id>", plaintext=seed)
> >
> > 2) Then we run each of the jobs. Those jobs use custom operators that do:
> >
> > hook = KMSHook(conn_id="service_account_for_accessing_kms_hook")
> > seed = hook.decrypt(key="seed<dag_id><run_id>")
> >
> > In this case we treat KMS as a database of secrets that are temporary and
> > can be used across the jobs, but never stored in a "traditional"
> > database. They are stored encrypted and the job has full control over
> > the key names used.
> >
> > Surely we could use GCS or any other storage for that, but KMS gives us:
> >
> > * audit logs
> > * encryption at rest
> > * history of seeds
> > * potential to destroy the secure data safely without the option of
> > recovery
> >
> > 3) At the end we could even invalidate such a secret if we add a "delete"
> > method (which I have not thought about, but I think it makes sense).
> >
> > *Proposal*
> >
> > I want the same capabilities as we have now with KMSHook to be
> > available in new hooks: VaultHook, GCPSecretManagerHook,
> > AWSSecretManagerHook, so that they can also be used as Hooks by Airflow
> > to access (both read and write) any kind of secrets there.
> >
> > I really, really do not see why this is a problem of any sort. I
> > wonder if others see it as a problem? Ash, maybe you misunderstood the
> > intention?
> >
> > J.
> >
> > On Tue, May 19, 2020 at 12:05 AM Ash Berlin-Taylor <as...@apache.org>
> wrote:
> >
> >> This is why I was asking for a concrete example of where you'd want to
> >> use this, right now I still can't see what problem you are aiming to
> >> solve with this proposal.
> >>
> >> So I'll ask again Jarek: Do you have a concrete use case that is driving
> >> you wanting to create SecretManagerHook?
> >>
> >> > Straightforward API to be used by the operators.
> >>
> >> We already have that, don't we? It's the SecretsBackend API.
> >>
> >>
> > >> Would your ask be solved by being able to configure multiple secrets
> > >> backends rather than just a single one?
> >>
> >>
> >> > hook = SecretManagerHook(conn_id = conn_id)
> >> > hook.decrypt(key="KEY")
> >>
> >> I don't think any of the  secrets backends support encrypting/decrypting
> >> values, did you mean `hook.get_secret` here?
> >>
> > >> A counter proposal: Just use Variables within the operator. There is
> > >> essentially no difference between "a secret" and "a variable", and it
> > >> doesn't introduce a whole new concept to Airflow.
> >>
> >>
> > >> The advantage of Hooks is that they know how to ask for a connection ID
> > >> to find their credentials.
> > >>
> > >> But for a secrets backend/hook that all gets very self-referential. So
> > >> to use KMSHook does that mean I need to configure the secrets backend
> > >> (to look up connections) _and_ create a connection for itself in KMS so
> > >> the KMSHook can connect to it?
> >>
> >> That is my major complaint I think. It strikes me as a very messy API
> >> that is prone to user confusion and hard to document.
> >>
> >>
> >> I do have one ask though if we do go down this route: that we don't end
> >> up duplicating code to speak to the Secrets providers (i.e. in hooks and
> >> in secrets backends) - it should live in one place only. (I'm sure you'd
> >> do this anyway, I just wanted to state it)
> >>
> >> -ash
> >>
> >>
> >> On May 18 2020, at 10:42 pm, Jarek Potiuk <Ja...@polidea.com>
> >> wrote:
> >>
> >> > On Mon, May 18, 2020 at 11:17 PM Ash Berlin-Taylor <as...@apache.org>
> >> wrote:
> >> >
> >> >> > GCP Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook
> >> >>
> > >> >> Why do we even need Hooks for those? Why can't we use the existing
> > >> >> GCP Secret Manager class inside a custom operator? What does
> > >> >> creating a hook give us?
> >> >>
> >> >
> > >> > The same as all the other hooks: a common way to authenticate to the
> > >> > service (using the Airflow Connection mechanism) and a straightforward
> > >> > API to be used by the operators.
> > >> >
> > >> > Now that Kaxil mentioned it: this is exactly what the KMS hook gives.
> > >> > It uses an already defined connection id in the Airflow DB to
> > >> > authenticate to KMS and encrypt/decrypt a secret. Please take a look
> > >> > there:
> >> >
> >> >
> >>
> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py
> >> >
> >> > Then in operator I'd use it like:
> >> >
> >> > hook = KMSHook(conn_id = conn_id)
> >> > hook.decrypt(key="KEY", cipher ="") (the cipher part is
> KMS-specific)..
> >> > or
> >> >
> >> > hook = SecretManagerHook(conn_id = conn_id)
> >> > hook.decrypt(key="KEY")
> >> >
> > >> > Which is substantially easier than handling all the
> > >> > authentication/credential options (for example, in the GCP case it
> > >> > handles all the different forms of authentication, like variables,
> > >> > json file, connection-encoded credentials) out of the box. This is
> > >> > the very same reason pretty much any hook exists in Airflow. The only
> > >> > difference for secrets is that it makes no sense to write operators
> > >> > for them because of having to pass decrypted secrets via XCom.
> >> >
> > >> > From the very beginning of this conversation, I was surprised it is
> > >> > a problem for anyone at all.
> > >> >
> > >> > I never intended to make any shared service. I just wanted to
> > >> > implement what Kaxil described: separate hooks for all the secrets,
> > >> > same as any other service.
> > >> > I am quite surprised it is a problem for anyone (now knowing that
> > >> > the KMS Hook already exists makes it even more surprising).
> >> > J.
> >> >
> >> >
> >> >
> >> >>
> >> >> -a
> >> >>
> >> >> On May 18 2020, at 9:50 pm, Jarek Potiuk <Ja...@polidea.com>
> >> wrote:
> >> >>
> >> >> >> Are we all talking about different things ?
> >> >> >
> >> >> > Good point. I think that's the main source of confusion here and we
> >> >> > think about different things.
> >> >> >
> > >> >> >> So what I feel is that the use case that Nathan defined can just
> > >> >> >> be solved by a VaultHook & VaultOperator, for example.
> >> >> >
> > >> >> > That's what I was talking about from the beginning (maybe it was
> > >> >> > not clear): separate hooks for each service. Not a shared one. GCP
> > >> >> > Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook: all of
> > >> >> > them separate, in different providers, and simple hooks to be used
> > >> >> > by whoever wants to use them in their custom operators.
> > >> >> >
> > >> >> > We also talked about implementing operators, but there is very
> > >> >> > little use for generic Operators for secrets. Mainly because the
> > >> >> > only way operators can pass anything to other operators (tasks) is
> > >> >> > via XCom, which would store the secrets in plain text in the
> > >> >> > database. That is rather bad, I am afraid. Hooks can be
> > >> >> > instantiated in the context of a running task, use Fernet to
> > >> >> > decrypt credentials from the Connection DB, request the secret
> > >> >> > from the backend and pass the unencrypted secret to the other
> > >> >> > parts of the operator, all in the context of a single worker/task.
> >> >> >
> >> >> >>
> >> >> >> This should not be confused with "Secrets" at all. Why do we
> >> need to
> >> >> create
> >> >> >> a generic Hooks for all Secrets Backend?
> >> >> >
> > >> >> > No generic hooks :). I never meant it to be generic. Maybe that's
> > >> >> > the confusion there: I wanted to implement a separate hook for
> > >> >> > every type of backend.
> >> >> >
> >> >> >> Consider we use PostgreSQL for backend and the connection is
> >> >> defined in
> >> >> >> airflow.cfg. Now you can still use the MySQLHook and PostgresHook
> >> >> >> independently to connect to those Databases, correct.
> >> >> >>
> >> >> >> But they both should not be confused to be using anything
> "shared".
> >> >> >
> >> >> > No plans for that whatsoever.
> >> >> >
> >> >> >> The proposal if I interpret correctly talks about the following:
> >> >> >>
> >> >> >> We have an idea that we might want also (on top of the above
> >> >> SecretManager
> >> >> >> > implementation) define generic Hooks for accessing secrets from
> >> those
> >> >> >> > services (just generic secrets, not connection, variables).
> Simply
> >> >> treat
> >> >> >> > each of the backends above as another "provider" and create a
> >> >> Hook to
> >> >> >> > access the service. Such Hook could have just one method:
> >> >> >> > def get_secret(self, path_prefix: str, secret_id: str) ->
> >> >> Optional[str]
> >> >> >> > It would use a connection defined (as usual) in ENV variables or
> >> >> database
> >> >> >> > of Airflow to authenticate with the secret service and
> >> retrieve the
> >> >> >> > secrets.
> >> >> >
> > >> >> > OK, maybe the confusion is about "generic". My "generic" meant
> > >> >> > "no connections, no variables": just retrieve a "generic" secret.
> > >> >> > Separate implementation for Hashicorp Vault, separate for Secret
> > >> >> > Manager, etc.
> >> >> >
> >> >> >> The connection can be defined in The Secrets backend. To make it
> >> >> clearer,
> >> >> >> "Vault" in Nathan's case is a "Service" and has nothing to do with
> >> >> >> SecretsBackend similar to how PostgresHook or MySQLHook has
> nothing
> >> >> >> to do
> >> >> >> with using Postgres as Airflow MetadataDB Backend.
> >> >> >>
> >> >> >> Another example is Google KMS, there is already Hook for Google
> >> >> KMS (
> >> >> >>
> >> >>
> >>
> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py
> >> >> )
> >> >> >> and an Operator can be created. Same can be done for Google
> Secrets
> >> >> Manager
> >> >> >> and Hashicorp Vault, in which cases all of these are "Services".
> >> >> >
> > >> >> > That's exactly what I plan to implement. As explained above, an
> > >> >> > Operator for secrets makes no sense because it would have to pass
> > >> >> > the secrets via XCom :(. I did not even check that we already have
> > >> >> > a KMS hook. I was mostly thinking about Vault, Secret Manager and
> > >> >> > AWS Secret Manager. Knowing that we have KMS makes it even easier :).
> >> >> >
> >> >> >> We could create SecretsHook similar to DbApiHook (
> >> >> >>
> >> >>
> >>
> https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
> >> >> >> if we want to just define the single *get_secret* method you
> talked
> >> >> about.
> >> >> >
> > >> >> > I don't even plan that, in fact. I thought about implementing
> > >> >> > several totally independent Hooks, one for each of the Backend
> > >> >> > Secrets.
> >> >> >
> >> >> >> The concept of "Secrets Backend" is to allow Managing of "Secrets
> >> >> >> used in
> >> >> >> Airflow" (Either to connect to an external system or Variables) in
> >> >> actual
> >> >> >> Secret Management Tools.
> >> >> >>
> >> >> >
> >> >> > Yeah. I do not - at all - want to mess with that :)
> >> >> >
> >> >> >>
> >> >> >> *Pros:*
> >> >> >> >  And I
> >> >> >> > well imagine this might be actually even more convenient to
> >> configure
> >> >> >> > connections in the DB and access secrets this way rather than
> >> >> >> having to
> >> >> >> > configure Secret Backends in Airflow configuration.
> >> >> >>
> >> >> >> This is exactly where both "Secrets" and the "Service" terms are
> >> >> >> mixed I
> >> >> >> think. Again echoing what I said above : The concept of "Secrets
> >> >> Backend"
> >> >> >> is to allow Managing of "Secrets used in Airflow".
> > >> >> >> The Secrets Backend is so that you don't need to store secrets in
> > >> >> >> the Airflow Metadata DB, whether they are encrypted or not, as
> > >> >> >> there are tools that are specifically designed to handle "Secrets,
> > >> >> >> rotation of secrets etc". Having the Hook and Operator to talk to
> > >> >> >> the Service should be separate.
> >> >> >
> >> >> > Full agreement - I do not want to intermix those. It was always
> >> >> > thought as per-provider implementation of traditional "Hook".
> >> >> >
> >> >> >>
> >> >> >> * Another benefit of it is that it would allow people still stuck
> >> >> on pre
> >> >> >> > 1.10.10 to  write custom operators that would like to use secret
> >> >> backends
> >> >> >> > (via backport operators). And still continue doing it in the
> future
> >> >> >> > (possibly migrating to 2.0/1.10.10+ in cases when there is one
> >> secret
> >> >> >> > backed only - but continue ot use connections/hooks where some
> >> >> specific
> >> >> >> > secrets shoudl be kept in different secret backend.
> >> >> >>
> >> >> >>
> >> >> >> What is the objective here: (1) is it to interact with those
> Services
> >> >> >> (Vault or Secrets Manager etc) or (2) Get Airflow Connections and
> >> >> Variables
> >> >> >> from different Secrets Backend
> >> >> >
> >> >> > Just to interact with it - no plans at all to get Airflow
> Connections
> >> >> > nor Variables.
> >> >> >
> >> >> >>
> >> >> >> Regards,
> >> >> >> Kaxil
> >> >> >>
> >> >> >
> >> >>
> >> >
> >> >
> >> > --
> >> >
> >> > Jarek Potiuk
> >> >
> >>
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>



Re: [PROPOSAL] Secret Backend Hooks

Posted by Ash Berlin-Taylor <as...@apache.org>.
Yes, sorry I got completely the wrong idea somehow. This makes sense,
and thank you for patiently explaining it to me until I got it!

My main reason for questioning was not this specific feature, but the
gradual "scope creep" of Airflow operators.

One of the hardest things we as project "stewards" have to do is say no
to features. There have been a few examples of merged PRs that I've seen
recently where my immediate reaction was "just use Terraform".

My main worry is that "when all you have is a hammer, everything looks
like a nail" syndrome, where we end up re-inventing
Ansible/Terraform/CloudFormation (to give an example), but badly, and in operators.

-ash

On May 19 2020, at 7:26 am, Jarek Potiuk <Ja...@polidea.com> wrote:

> Let me start again from scratch and use KMS as an example. Maybe -
> again -
> we understand things differently:
> 
> Just to start: KMSHook has two methods, "encrypt" and "decrypt". I would
> continue to use that as a base.
> 
> Again, let me repeat that: I do not want to implement a generic SecretHook!
> I also do not want to implement SecretOperator. I never wanted to. I wanted
> to implement VaultHook, GCPSecretManagerHook, AWSSecretManagerHook.
> 
> *Assumptions for the use case*
> 
> * Let's assume all secrets of Airflow (Connections and Variables) are kept
> in HashiCorp Vault (using SecretsBackend). Airflow is configured to read them
> as Variables/Connections.
> * For security reasons those secrets are read-only for Airflow. The Vault
> is very secure: only high-security admins have access there. Airflow can
> only read from it.
> * Additionally the company uses KMS to keep encrypted data which is more
> "temporary" in nature but still should be kept secret and never stored in a
> traditional database. It keeps the history of those secrets and an audit log
> so that in case of any breach we can track the origin of the breach.
> * Access to the KMS service from within Airflow is both READ and WRITE.
> * One of the Connections we have in the Airflow Connections (in the Vault)
> holds GCP credentials to both read from and write to KMS. It is rotated
> frequently in the Vault so that its unintended use in case it leaks is
> limited to, say, 24 hours.
> 
> *The Use Case:*
> 
> 1) We need to generate a random SEED to start our calculations from. We
> need the same SEED in every job as a parameter. However we never want to
> store the SEED in the Airflow database (so it cannot be passed via XCom).
> In the job we have Custom Operators that do this (note that the complexity
> of handling authentication to KMS is handled, as usual, by the KMSHook.
> KMSHook derives from GcpBaseHook and has all the complexity of handling the
> authentication implemented):
> 
> hook = KMSHook(conn_id="service_account_for_accessing_kms_hook")
> seed = rand()
> hook.encrypt(key="seed<dag_id><run_id>", plaintext=seed)
> 
> 2) Then we run each of the jobs. Those jobs use custom operators that do:
> 
> hook = KMSHook(conn_id="service_account_for_accessing_kms_hook")
> seed = hook.decrypt(key="seed<dag_id><run_id>")
> 
> In this case we treat KMS as a database of secrets that are temporary and
> can be used across the jobs, but never stored in a "traditional" database.
> They are stored encrypted and the job has full control over the key names
> used.
> 
> Surely we could use GCS or any other storage for that, but KMS gives us:
> 
> * audit logs
> * encryption at rest
> * history of seeds
> * potential to destroy the secure data safely without the option of recovery
> 
> 3) At the end we could even invalidate such a secret if we add a "delete"
> method (which I have not thought about, but I think it makes sense).
> 
> *Proposal*
> 
> I want the same capabilities as we have now with KMSHook to be available in
> new hooks: VaultHook, GCPSecretManagerHook, AWSSecretManagerHook, so that
> they can also be used as Hooks by Airflow to access (both read and write)
> any kind of secrets there.
> 
> I really, really do not see why this is a problem of any sort. I wonder if
> others see it as a problem? Ash, maybe you misunderstood the intention?
> 
> J.
> 
> On Tue, May 19, 2020 at 12:05 AM Ash Berlin-Taylor <as...@apache.org> wrote:
> 
>> This is why I was asking for a concrete example of where you'd want to
>> use this, right now I still can't see what problem you are aiming to
>> solve with this proposal.
>> 
>> So I'll ask again Jarek: Do you have a concrete use case that is driving
>> you wanting to create SecretManagerHook?
>> 
>> > Straightforward API to be used by the operators.
>> 
>> We already have that, don't we? It's the SecretsBackend API.
>> 
>> 
>> Would your ask be solved by being able to configure multiple secrets
>> backends rather than just a single one?
>> 
>> 
>> > hook = SecretManagerHook(conn_id = conn_id)
>> > hook.decrypt(key="KEY")
>> 
>> I don't think any of the  secrets backends support encrypting/decrypting
>> values, did you mean `hook.get_secret` here?
>> 
>> A counter proposal: Just use Variables within the operator. There is
>> essentially no difference between "a secret" and "a variable", and it
>> doesn't introduce a whole new concept to Airflow.
>> 
>> 
>> The advantage of Hooks is that they know how to ask for a connection ID
>> to find their credentials.
>> 
>> But for a secrets backend/hook that all gets very self-referential. So
>> to use KMSHook does that mean I need to configure the secrets backend
>> (to look up connections) _and_ create a connection for itself in KMS so
>> the KMSHook can connect to it?
>> 
>> That is my major complaint I think. It strikes me as a very messy API
>> that is prone to user confusion and hard to document.
>> 
>> 
>> I do have one ask though if we do go down this route: that we don't end
>> up duplicating code to speak to the Secrets providers (i.e. in hooks and
>> in secrets backends) - it should live in one place only. (I'm sure you'd
>> do this anyway, I just wanted to state it)
>> 
>> -ash
>> 
>> 
>> On May 18 2020, at 10:42 pm, Jarek Potiuk <Ja...@polidea.com>
>> wrote:
>> 
>> > On Mon, May 18, 2020 at 11:17 PM Ash Berlin-Taylor <as...@apache.org>
>> wrote:
>> >
>> >> > GCP Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook
>> >>
>> >> Why do we even need Hooks for those? Why can't we use the existing GCP
>> >> Secret Manager class inside a custom operator? What does creating
>> a hook
>> >> give us?
>> >>
>> >
>> > The same as all the other hooks. Common way to authenticate to the
>> service
>> > (using Airflow Connection mechanism). Straightforward API to be used
>> > by the
>> > operators.
>> >
>> > Now that Kaxil mentioned it - This is exactly what KMS hooks gives
>> - it
>> > cause already defined
>> > connection id in Airflow DB to authenticate to KMS and encrypt/decrypt
>> > secret. Please take a look there:
>> >
>> >
>> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py
>> >
>> > Then in operator I'd use it like:
>> >
>> > hook = KMSHook(conn_id = conn_id)
>> > hook.decrypt(key="KEY", cipher ="") (the cipher part is KMS-specific)..
>> > or
>> >
>> > hook = SecretManagerHook(conn_id = conn_id)
>> > hook.decrypt(key="KEY")
>> >
>> > Which is substantially easier than handling all the
>> > authentication/credential options (for example in GCP case it
>> handles all
>> > the different forms of authentication - like variables, json file,
>> > connection-encoded-credentials) out of the box. This is the very same
>> > reason pretty much any hook exists in Airflow. The only difference for
>> > secrets is that it makes no sense to write operators for them
>> because of
>> > having to pass decrypted secrets via XCom.
>> >
>> > From the very beginning of this conversation, I was surprised it is at
>> all
>> > a problem for anyone.
>> >
>> > I never intended to make any shared service, I just wanted to implement
>> > what Kaxil described - separate hooks for all the secrets - same as any
>> > other service.
>> > I am quite surprised it is a problem for anyone (now knowing that KMS
>> Hook
>> > already exist at all makes it even more surprising).
>> > J.
>> >
>> >
>> >
>> >>
>> >> -a
>> >>
>> >> On May 18 2020, at 9:50 pm, Jarek Potiuk <Ja...@polidea.com>
>> wrote:
>> >>
>> >> >> Are we all talking about different things ?
>> >> >
>> >> > Good point. I think that's the main source of confusion here and we
>> >> > think about different things.
>> >> >
>> >> >> So what I feel that the use case that Nathan defined can just be
>> >> >> solved a
>> >> >> VaultHook & VaultOperator for example.
>> >> >
>> >> > That's what I was talking (from the beginning - maybe it was not
>> >> > clear) about separate hooks for each service. Not a shared one. GCP
>> >> > Secret Manager Hook, Vault Hook,  KMS Hook, AWS Secret Hook -
>> all of
>> >> > them separate, in different providers, and simple hooks to be
>> used by
>> >> > whoever wants to use them in their custom operators.
>> >> >
>> >> > We also talked about implementing operators, but there is very little
>> >> > use of generic Operators for secrets. Mainly because the only way
>> >> > operators can pass anything to other operators (tasks) is via xcom
>> >> > which would make the secrets stored plain text in the database. That
>> >> > is rather bad I am afraid. Having Hooks make them instantiatable in
>> >> > the context of running tasks, use Fernet to decrypt credentials from
>> >> > the Connection DB, request to retrieve secret from the backend and
>> >> > pass the unencrypted secret to the other parts of the operator - all
>> >> > in the context of a single worker/task.
>> >> >
>> >> >>
>> >> >> This should not be confused with "Secrets" at all. Why do we
>> need to
>> >> create
>> >> >> a generic Hooks for all Secrets Backend?
>> >> >
>> >> > No generic hooks :). I never meant it to be generic.Maybe that's a
>> >> > confusion there - I wanted to implement a separate hook for
>> every type
>> >> > of backend.
>> >> >
>> >> >> Consider we use PostgreSQL for backend and the connection is
>> >> defined in
>> >> >> airflow.cfg. Now you can still use the MySQLHook and PostgresHook
>> >> >> independently to connect to those Databases, correct.
>> >> >>
>> >> >> But they both should not be confused to be using anything "shared".
>> >> >
>> >> > No plans for that whatsoever.
>> >> >
>> >> >> The proposal if I interpret correctly talks about the following:
>> >> >>
>> >> >> We have an idea that we might want also (on top of the above
>> >> SecretManager
>> >> >> > implementation) define generic Hooks for accessing secrets from
>> those
>> >> >> > services (just generic secrets, not connection, variables). Simply
>> >> treat
>> >> >> > each of the backends above as another "provider" and create a
>> >> Hook to
>> >> >> > access the service. Such Hook could have just one method:
>> >> >> > def get_secret(self, path_prefix: str, secret_id: str) ->
>> >> Optional[str]
>> >> >> > It would use a connection defined (as usual) in ENV variables or
>> >> database
>> >> >> > of Airflow to authenticate with the secret service and
>> retrieve the
>> >> >> > secrets.
>> >> >
>> >> > OK. maybe confusion is about 'generic' . My "generic" was ("no
>> >> > connections, no variables") - just retrieve "generic" secret. Separate
>> >> > implementation for Hashicorp Vault, Separate for Secret Manager, etc.
>> >> >
>> >> >> The connection can be defined in The Secrets backend. To make it
>> >> clearer,
>> >> >> "Vault" in Nathan's case is a "Service" and has nothing to do with
>> >> >> SecretsBackend similar to how PostgresHook or MySQLHook has nothing
>> >> >> to do
>> >> >> with using Postgres as Airflow MetadataDB Backend.
>> >> >>
>> >> >> Another example is Google KMS, there is already Hook for Google
>> >> KMS (
>> >> >>
>> >>
>> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py
>> >> )
>> >> >> and an Operator can be created. Same can be done for Google Secrets
>> >> Manager
>> >> >> and Hashicorp Vault, in which cases all of these are "Services".
>> >> >
>> >> > That's exactly what I plan to implement. As explained above - Operator
>> >> > for secrets makes no sense because it would have to pass the secrets
>> >> > via xcom :(. I did not even check that we already have KMS hook.
>> I was
>> >> > mostly about Vault and Secret Manager and AWS Secret Manager. Knowing
>> >> > that we have KMS makes it even easier :).
>> >> >
>> >> >> We could create SecretsHook similar to DbApiHook (
>> >> >>
>> >>
>> https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
>> >> >> if we want to just define the single *get_secret* method you talked
>> >> about.
>> >> >
>> >> > I don't even plan that in fact, I thought about implementing several
>> >> > totally independent Hooks for each of the Backend Secrets.
>> >> >
>> >> >> The concept of "Secrets Backend" is to allow Managing of "Secrets
>> >> >> used in
>> >> >> Airflow" (Either to connect to an external system or Variables) in
>> >> actual
>> >> >> Secret Management Tools.
>> >> >>
>> >> >
>> >> > Yeah. I do not - at all - want to mess with that :)
>> >> >
>> >> >>
>> >> >> *Pros:*
>> >> >> >  And I
>> >> >> > well imagine this might be actually even more convenient to
>> configure
>> >> >> > connections in the DB and access secrets this way rather than
>> >> >> having to
>> >> >> > configure Secret Backends in Airflow configuration.
>> >> >>
>> >> >> This is exactly where both "Secrets" and the "Service" terms are
>> >> >> mixed I
>> >> >> think. Again echoing what I said above : The concept of "Secrets
>> >> Backend"
>> >> >> is to allow Managing of "Secrets used in Airflow".
>> >> >> The Secrets Backend is so that you don't need to store secrets in
>> >> Airflow
>> >> >> Metadata DB whether they can encrypted or not as there are
>> tools that
>> >> are
>> >> >> specifically designed to handle "Secrets, rotation of secrets etc".
>> >> Having
>> >> >> the Hook and Operator to talk to the Service should be separate.
>> >> >
>> >> > Full agreement - I do not want to intermix those. It was always
>> >> > thought as per-provider implementation of traditional "Hook".
>> >> >
>> >> >>
>> >> >> * Another benefit of it is that it would allow people still stuck
>> >> on pre
>> >> >> > 1.10.10 to  write custom operators that would like to use secret
>> >> backends
>> >> >> > (via backport operators). And still continue doing it in the future
>> >> >> > (possibly migrating to 2.0/1.10.10+ in cases when there is one
>> secret
>> >> >> > backed only - but continue ot use connections/hooks where some
>> >> specific
>> >> >> > secrets shoudl be kept in different secret backend.
>> >> >>
>> >> >>
>> >> >> What is the objective here: (1) is it to interact with those Services
>> >> >> (Vault or Secrets Manager etc) or (2) Get Airflow Connections and
>> >> Variables
>> >> >> from different Secrets Backend
>> >> >
>> >> > Just to interact with it - no plans at all to get Airflow Connections
>> >> > nor Variables.
>> >> >
>> >> >>
>> >> >> Regards,
>> >> >> Kaxil
>> >> >>
>> >> >
>> >>
>> >
>> >
>> > --
>> >
>> > Jarek Potiuk
>> >
>> 
> 
> 
> -- 
> 
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
> 
> M: +48 660 796 129 <+48660796129>
> 

Re: [PROPOSAL] Secret Backend Hooks

Posted by Jarek Potiuk <Ja...@polidea.com>.
Let me start again from scratch and use KMS as an example. Maybe - again -
we understand things differently.

To start with: KMSHook has two methods, "encrypt" and "decrypt". I would
continue to use that as a base.

Again let me repeat that. I do not want to implement a generic SecretHook!
I also do not want to implement SecretOperator. I never wanted to. I wanted
to implement VaultHook, GCPSecretManagerHook, AWSSecretManagerHook.

*Assumptions for the use case*

* Let's assume all secrets of Airflow (Connections and Variables) are kept
in HashiCorp Vault (using SecretsBackend). Airflow is configured to read them
as Variables/Connections.
* For security reasons those secrets are read-only for Airflow. The Vault
is very secure - only high-security admins have access there. Airflow can
only read from it.
* Additionally the company uses KMS to keep encrypted data which is more
"temporary" in nature but still should be kept secret and never stored in a
traditional database. It keeps the history of those secrets and an audit log
so that in case of any breach we can track the origin of the breach.
* Access to the KMS service from within Airflow is both READ and WRITE.
* One of the Connections we have in the Airflow Connections (in the Vault)
holds GCP credentials to both read from and write to KMS. It is rotated
frequently in the Vault so that its unintended use in case it leaks is
limited to - say - 24 hours.

*The Use Case:*

1) We need to generate a random SEED to start our calculations from. We
need the same SEED in every job as a parameter. However, we never want to
store the SEED in the Airflow database (so it cannot be passed via XCom).
In the job we have Custom Operators that do this (note that the complexity
of handling authentication to KMS is handled - as usual - by the KMSHook.
KMSHook derives from GcpBaseHook and has all the complexity of handling the
authentication implemented):

hook = KMSHook(conn_id="service_account_for_accessing_kms_hook")
seed = rand()
hook.encrypt(key="seed<dag_id><run_id>")

2) Then we run each of the jobs. Those jobs use custom operators that do:

hook = KMSHook(conn_id="service_account_for_accessing_kms_hook")
seed = hook.decrypt("seed<dag_id><run_id>")

In this case we treat KMS as a database of secrets that are temporary and
can be used across the jobs - but never stored in a "traditional" database.
They are stored encrypted and the job has full control over the key names
used.
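
As a rough sketch of steps 1) and 2), assuming a hook that exposes simple
write/read methods: InMemorySecretHook, write_secret, read_secret, seed_key
and the "kms_default" connection id are all hypothetical names for
illustration, not an existing Airflow API. A real hook would authenticate
through its Airflow connection and call the remote service; here a dict
plays the role of that service so the snippet runs anywhere.

```python
import secrets
from typing import Dict, Optional


class InMemorySecretHook:
    """Hypothetical stand-in for a per-service secret hook.

    A real hook would resolve credentials from the Airflow connection
    named by conn_id. The class-level dict below only simulates the
    remote secret service and is shared by every hook instance.
    """

    _store: Dict[str, str] = {}

    def __init__(self, conn_id: str) -> None:
        self.conn_id = conn_id

    def write_secret(self, key: str, value: str) -> None:
        self._store[key] = value

    def read_secret(self, key: str) -> Optional[str]:
        return self._store.get(key)


def seed_key(dag_id: str, run_id: str) -> str:
    # The job controls the key name, so every task of a run finds the same seed.
    return f"seed-{dag_id}-{run_id}"


def generate_seed(dag_id: str, run_id: str, conn_id: str = "kms_default") -> None:
    """Step 1: create a random seed and store it in the secret service only."""
    hook = InMemorySecretHook(conn_id=conn_id)
    hook.write_secret(seed_key(dag_id, run_id), secrets.token_hex(16))


def use_seed(dag_id: str, run_id: str, conn_id: str = "kms_default") -> str:
    """Step 2: each job reads the seed back; nothing passes through XCom."""
    hook = InMemorySecretHook(conn_id=conn_id)
    seed = hook.read_secret(seed_key(dag_id, run_id))
    if seed is None:
        raise KeyError(f"no seed stored for {dag_id}/{run_id}")
    return seed
```

The point of the sketch is only that the seed travels between tasks through
the secret service, keyed by dag_id and run_id, and never through the
Airflow database.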

Surely we could use GCS or any other storage for that, but KMS gives us:

* audit logs
* encryption at rest
* history of seeds
* potential to destroy the secure data safely without the option of recovery

3) At the end we could even invalidate such a secret if we add a "delete" method
(which I have not thought about, but I think it makes sense).

*Proposal*

I want the same capabilities as we have now with KMSHook to be available in
new hooks: VaultHook, GCPSecretManagerHook, AWSSecretManagerHook, so that
they can also be used as Hooks by Airflow to access (both read and write)
any kind of secrets there.

I really, really do not see why this is a problem of any sort. I wonder if
others see it as a problem? Ash, maybe you misunderstood the intention?

J.

On Tue, May 19, 2020 at 12:05 AM Ash Berlin-Taylor <as...@apache.org> wrote:

> This is why I was asking for a concrete example of where you'd want to
> use this, right now I still can't see what problem you are aiming to
> solve with this proposal.
>
> So I'll ask again Jarek: Do you have a concrete use case that is driving
> you wanting to create SecretManagerHook?
>
> > Straightforward API to be used by the operators.
>
> We already have that, don't we? It's the SecretsBackend API.
>
>
> Would your ask be solved by being able to configure multiple secrets
> backends rather than a just a single one?
>
>
> > hook = SecretManagerHook(conn_id = conn_id)
> > hook.decrypt(key="KEY")
>
> I don't think any of the  secrets backends support encrypting/decrypting
> values, did you mean `hook.get_secret` here?
>
> A counter proposal: Just use Variables within the operator. There is
> essentially no difference between "a secret" and "a variable", and
> doesn't introduce a whole new concept to Airflow.
>
>
> The advantage of Hooks is they are know how to ask for a connection ID
> to find their credentials.
>
> But for a secrets backend/hook that all gets very self-referential. So
> to use KMSHook does that mean I need to configure the secrets backend
> (to look up connections) _and_ create a connection for itself in KMS so
> the KMSHook and connect to it?
>
> That is my major complaint I think. It strikes me as a very messy API
> that is prone to user confusion and hard to document.
>
>
> I do have one ask though if we do go down this route: that we don't end
> up duplicating code to speak to the Secrets providers (i.e. in hooks and
> in secrets backends) - it should live in one place only. (I'm sure you'd
> do this anyway, I just wanted to state it)
>
> -ash
>
>
> On May 18 2020, at 10:42 pm, Jarek Potiuk <Ja...@polidea.com>
> wrote:
>
> > On Mon, May 18, 2020 at 11:17 PM Ash Berlin-Taylor <as...@apache.org>
> wrote:
> >
> >> > GCP Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook
> >>
> >> Why do we even need Hooks for those? Why can't we use the existing GCP
> >> Secret Manager class inside a custom operator? What does creating a hook
> >> give us?
> >>
> >
> > The same as all the other hooks. Common way to authenticate to the
> service
> > (using Airflow Connection mechanism). Straightforward API to be used
> > by the
> > operators.
> >
> > Now that Kaxil mentioned it - This is exactly what KMS hooks gives - it
> > cause already defined
> > connection id in Airflow DB to authenticate to KMS and encrypt/decrypt
> > secret. Please take a look there:
> >
> >
> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py
> >
> > Then in operator I'd use it like:
> >
> > hook = KMSHook(conn_id = conn_id)
> > hook.decrypt(key="KEY", cipher ="") (the cipher part is KMS-specific)..
> > or
> >
> > hook = SecretManagerHook(conn_id = conn_id)
> > hook.decrypt(key="KEY")
> >
> > Which is substantially easier than handling all the
> > authentication/credential options (for example in GCP case it handles all
> > the different forms of authentication - like variables, json file,
> > connection-encoded-credentials) out of the box. This is the very same
> > reason pretty much any hook exists in Airflow. The only difference for
> > secrets is that it makes no sense to write operators for them because of
> > having to pass decrypted secrets via XCom.
> >
> > From the very beginning of this conversation, I was surprised it is at
> all
> > a problem for anyone.
> >
> > I never intended to make any shared service, I just wanted to implement
> > what Kaxil described - separate hooks for all the secrets - same as any
> > other service.
> > I am quite surprised it is a problem for anyone (now knowing that KMS
> Hook
> > already exist at all makes it even more surprising).
> > J.
> >
> >
> >
> >>
> >> -a
> >>
> >> On May 18 2020, at 9:50 pm, Jarek Potiuk <Ja...@polidea.com>
> wrote:
> >>
> >> >> Are we all talking about different things ?
> >> >
> >> > Good point. I think that's the main source of confusion here and we
> >> > think about different things.
> >> >
> >> >> So what I feel that the use case that Nathan defined can just be
> >> >> solved a
> >> >> VaultHook & VaultOperator for example.
> >> >
> >> > That's what I was talking (from the beginning - maybe it was not
> >> > clear) about separate hooks for each service. Not a shared one. GCP
> >> > Secret Manager Hook, Vault Hook,  KMS Hook, AWS Secret Hook - all of
> >> > them separate, in different providers, and simple hooks to be used by
> >> > whoever wants to use them in their custom operators.
> >> >
> >> > We also talked about implementing operators, but there is very little
> >> > use of generic Operators for secrets. Mainly because the only way
> >> > operators can pass anything to other operators (tasks) is via xcom
> >> > which would make the secrets stored plain text in the database. That
> >> > is rather bad I am afraid. Having Hooks make them instantiatable in
> >> > the context of running tasks, use Fernet to decrypt credentials from
> >> > the Connection DB, request to retrieve secret from the backend and
> >> > pass the unencrypted secret to the other parts of the operator - all
> >> > in the context of a single worker/task.
> >> >
> >> >>
> >> >> This should not be confused with "Secrets" at all. Why do we need to
> >> create
> >> >> a generic Hooks for all Secrets Backend?
> >> >
> >> > No generic hooks :). I never meant it to be generic.Maybe that's a
> >> > confusion there - I wanted to implement a separate hook for every type
> >> > of backend.
> >> >
> >> >> Consider we use PostgreSQL for backend and the connection is
> >> defined in
> >> >> airflow.cfg. Now you can still use the MySQLHook and PostgresHook
> >> >> independently to connect to those Databases, correct.
> >> >>
> >> >> But they both should not be confused to be using anything "shared".
> >> >
> >> > No plans for that whatsoever.
> >> >
> >> >> The proposal if I interpret correctly talks about the following:
> >> >>
> >> >> We have an idea that we might want also (on top of the above
> >> SecretManager
> >> >> > implementation) define generic Hooks for accessing secrets from
> those
> >> >> > services (just generic secrets, not connection, variables). Simply
> >> treat
> >> >> > each of the backends above as another "provider" and create a
> >> Hook to
> >> >> > access the service. Such Hook could have just one method:
> >> >> > def get_secret(self, path_prefix: str, secret_id: str) ->
> >> Optional[str]
> >> >> > It would use a connection defined (as usual) in ENV variables or
> >> database
> >> >> > of Airflow to authenticate with the secret service and retrieve the
> >> >> > secrets.
> >> >
> >> > OK. maybe confusion is about 'generic' . My "generic" was ("no
> >> > connections, no variables") - just retrieve "generic" secret. Separate
> >> > implementation for Hashicorp Vault, Separate for Secret Manager, etc.
> >> >
> >> >> The connection can be defined in The Secrets backend. To make it
> >> clearer,
> >> >> "Vault" in Nathan's case is a "Service" and has nothing to do with
> >> >> SecretsBackend similar to how PostgresHook or MySQLHook has nothing
> >> >> to do
> >> >> with using Postgres as Airflow MetadataDB Backend.
> >> >>
> >> >> Another example is Google KMS, there is already Hook for Google
> >> KMS (
> >> >>
> >>
> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py
> >> )
> >> >> and an Operator can be created. Same can be done for Google Secrets
> >> Manager
> >> >> and Hashicorp Vault, in which cases all of these are "Services".
> >> >
> >> > That's exactly what I plan to implement. As explained above - Operator
> >> > for secrets makes no sense because it would have to pass the secrets
> >> > via xcom :(. I did not even check that we already have KMS hook. I was
> >> > mostly about Vault and Secret Manager and AWS Secret Manager. Knowing
> >> > that we have KMS makes it even easier :).
> >> >
> >> >> We could create SecretsHook similar to DbApiHook (
> >> >>
> >>
> https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
> >> >> if we want to just define the single *get_secret* method you talked
> >> about.
> >> >
> >> > I don't even plan that in fact, I thought about implementing several
> >> > totally independent Hooks for each of the Backend Secrets.
> >> >
> >> >> The concept of "Secrets Backend" is to allow Managing of "Secrets
> >> >> used in
> >> >> Airflow" (Either to connect to an external system or Variables) in
> >> actual
> >> >> Secret Management Tools.
> >> >>
> >> >
> >> > Yeah. I do not - at all - want to mess with that :)
> >> >
> >> >>
> >> >> *Pros:*
> >> >> >  And I
> >> >> > well imagine this might be actually even more convenient to
> configure
> >> >> > connections in the DB and access secrets this way rather than
> >> >> having to
> >> >> > configure Secret Backends in Airflow configuration.
> >> >>
> >> >> This is exactly where both "Secrets" and the "Service" terms are
> >> >> mixed I
> >> >> think. Again echoing what I said above : The concept of "Secrets
> >> Backend"
> >> >> is to allow Managing of "Secrets used in Airflow".
> >> >> The Secrets Backend is so that you don't need to store secrets in
> >> Airflow
> >> >> Metadata DB whether they can encrypted or not as there are tools that
> >> are
> >> >> specifically designed to handle "Secrets, rotation of secrets etc".
> >> Having
> >> >> the Hook and Operator to talk to the Service should be separate.
> >> >
> >> > Full agreement - I do not want to intermix those. It was always
> >> > thought as per-provider implementation of traditional "Hook".
> >> >
> >> >>
> >> >> * Another benefit of it is that it would allow people still stuck
> >> on pre
> >> >> > 1.10.10 to  write custom operators that would like to use secret
> >> backends
> >> >> > (via backport operators). And still continue doing it in the future
> >> >> > (possibly migrating to 2.0/1.10.10+ in cases when there is one
> secret
> >> >> > backed only - but continue ot use connections/hooks where some
> >> specific
> >> >> > secrets shoudl be kept in different secret backend.
> >> >>
> >> >>
> >> >> What is the objective here: (1) is it to interact with those Services
> >> >> (Vault or Secrets Manager etc) or (2) Get Airflow Connections and
> >> Variables
> >> >> from different Secrets Backend
> >> >
> >> > Just to interact with it - no plans at all to get Airflow Connections
> >> > nor Variables.
> >> >
> >> >>
> >> >> Regards,
> >> >> Kaxil
> >> >>
> >> >
> >>
> >
> >
> > --
> >
> > Jarek Potiuk
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>

Re: [PROPOSAL] Secret Backend Hooks

Posted by Ash Berlin-Taylor <as...@apache.org>.
This is why I was asking for a concrete example of where you'd want to
use this. Right now I still can't see what problem you are aiming to
solve with this proposal.

So I'll ask again Jarek: Do you have a concrete use case that is driving
you to want to create SecretManagerHook?

> Straightforward API to be used by the operators.

We already have that, don't we? It's the SecretsBackend API.


Would your ask be solved by being able to configure multiple secrets
backends rather than just a single one?


> hook = SecretManagerHook(conn_id = conn_id)
> hook.decrypt(key="KEY")

I don't think any of the secrets backends support encrypting/decrypting
values. Did you mean `hook.get_secret` here?

A counter proposal: Just use Variables within the operator. There is
essentially no difference between "a secret" and "a variable", and this
doesn't introduce a whole new concept to Airflow.
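
A minimal sketch of that counter proposal. To stay self-contained it does
not import Airflow: get_variable is a hypothetical stand-in that only mimics
the AIRFLOW_VAR_<KEY> environment-variable lookup, whereas a real operator
would call Variable.get(...) and receive the value from whichever secrets
backend is configured. SeedConsumer is likewise an illustrative name.

```python
import os
from typing import Optional


def get_variable(key: str, default: Optional[str] = None) -> Optional[str]:
    """Stand-in for Variable.get: reads AIRFLOW_VAR_<KEY> from the environment."""
    return os.environ.get(f"AIRFLOW_VAR_{key.upper()}", default)


class SeedConsumer:
    """Hypothetical operator-like class; execute() mirrors an operator's entry point."""

    def __init__(self, variable_key: str) -> None:
        self.variable_key = variable_key

    def execute(self) -> int:
        seed = get_variable(self.variable_key)
        if seed is None:
            raise KeyError(f"variable {self.variable_key!r} is not defined")
        # The secret is consumed inside the task; only its length is returned,
        # so the value itself never ends up in XCom.
        return len(seed)
```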


The advantage of Hooks is that they know how to ask for a connection ID
to find their credentials.

But for a secrets backend/hook that all gets very self-referential. So
to use KMSHook, does that mean I need to configure the secrets backend
(to look up connections) _and_ create a connection for itself in KMS so
the KMSHook can connect to it?

That is my major complaint I think. It strikes me as a very messy API
that is prone to user confusion and hard to document.


I do have one ask though if we do go down this route: that we don't end
up duplicating code to speak to the Secrets providers (i.e. in hooks and
in secrets backends) - it should live in one place only. (I'm sure you'd
do this anyway, I just wanted to state it.)
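
To illustrate what I mean by "one place only", here is a rough sketch in
which a single client class owns the code that talks to the service, and
both the secrets backend and the hook delegate to it. All class names and
the "connections" prefix are hypothetical, and a dict stands in for the
remote service.

```python
from typing import Dict, Optional


class SecretServiceClient:
    """The only class that knows how to talk to the secret service (hypothetical)."""

    def __init__(self, data: Dict[str, str]) -> None:
        # A plain dict stands in for the remote service in this sketch.
        self._data = data

    def read(self, path: str) -> Optional[str]:
        return self._data.get(path)


class ExampleSecretsBackend:
    """Backend side: maps an Airflow connection id onto a path in the service."""

    def __init__(self, client: SecretServiceClient, prefix: str = "connections") -> None:
        self._client = client
        self._prefix = prefix

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        return self._client.read(f"{self._prefix}/{conn_id}")


class ExampleSecretHook:
    """Hook side: reads an arbitrary secret by path, reusing the same client."""

    def __init__(self, client: SecretServiceClient) -> None:
        self._client = client

    def get_secret(self, path: str) -> Optional[str]:
        return self._client.read(path)
```

Neither the backend nor the hook contains any service-specific code of its
own, so a fix to the client applies to both.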

-ash


On May 18 2020, at 10:42 pm, Jarek Potiuk <Ja...@polidea.com> wrote:

> On Mon, May 18, 2020 at 11:17 PM Ash Berlin-Taylor <as...@apache.org> wrote:
> 
>> > GCP Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook
>> 
>> Why do we even need Hooks for those? Why can't we use the existing GCP
>> Secret Manager class inside a custom operator? What does creating a hook
>> give us?
>> 
> 
> The same as all the other hooks. Common way to authenticate to the service
> (using Airflow Connection mechanism). Straightforward API to be used by the
> operators.
> 
> Now that Kaxil mentioned it - this is exactly what the KMS hook gives - it
> uses an already defined connection id in Airflow DB to authenticate to KMS
> and encrypt/decrypt a secret. Please take a look there:
> 
> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py
> 
> Then in operator I'd use it like:
> 
> hook = KMSHook(conn_id = conn_id)
> hook.decrypt(key="KEY", cipher ="") (the cipher part is KMS-specific)..
> or
> 
> hook = SecretManagerHook(conn_id = conn_id)
> hook.decrypt(key="KEY")
> 
> Which is substantially easier than handling all the
> authentication/credential options (for example in GCP case it handles all
> the different forms of authentication - like variables, json file,
> connection-encoded-credentials) out of the box. This is the very same
> reason pretty much any hook exists in Airflow. The only difference for
> secrets is that it makes no sense to write operators for them because of
> having to pass decrypted secrets via XCom.
> 
> From the very beginning of this conversation, I was surprised it is at all
> a problem for anyone.
> 
> I never intended to make any shared service, I just wanted to implement
> what Kaxil described - separate hooks for all the secrets - same as any
> other service.
> I am quite surprised it is a problem for anyone (now knowing that KMS Hook
> already exists at all makes it even more surprising).
> J.
> 
> 
> 
>> 
>> -a
>> 
>> On May 18 2020, at 9:50 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
>> 
>> >> Are we all talking about different things ?
>> >
>> > Good point. I think that's the main source of confusion here and we
>> > think about different things.
>> >
>> >> So what I feel that the use case that Nathan defined can just be
>> >> solved a
>> >> VaultHook & VaultOperator for example.
>> >
>> > That's what I was talking (from the beginning - maybe it was not
>> > clear) about separate hooks for each service. Not a shared one. GCP
>> > Secret Manager Hook, Vault Hook,  KMS Hook, AWS Secret Hook - all of
>> > them separate, in different providers, and simple hooks to be used by
>> > whoever wants to use them in their custom operators.
>> >
>> > We also talked about implementing operators, but there is very little
>> > use of generic Operators for secrets. Mainly because the only way
>> > operators can pass anything to other operators (tasks) is via xcom
>> > which would make the secrets stored plain text in the database. That
>> > is rather bad I am afraid. Having Hooks make them instantiatable in
>> > the context of running tasks, use Fernet to decrypt credentials from
>> > the Connection DB, request to retrieve secret from the backend and
>> > pass the unencrypted secret to the other parts of the operator - all
>> > in the context of a single worker/task.
>> >
>> >>
>> >> This should not be confused with "Secrets" at all. Why do we need to
>> create
>> >> a generic Hooks for all Secrets Backend?
>> >
>> > No generic hooks :). I never meant it to be generic. Maybe that's a
>> > confusion there - I wanted to implement a separate hook for every type
>> > of backend.
>> >
>> >> Consider we use PostgreSQL for backend and the connection is
>> defined in
>> >> airflow.cfg. Now you can still use the MySQLHook and PostgresHook
>> >> independently to connect to those Databases, correct.
>> >>
>> >> But they both should not be confused to be using anything "shared".
>> >
>> > No plans for that whatsoever.
>> >
>> >> The proposal if I interpret correctly talks about the following:
>> >>
>> >> We have an idea that we might want also (on top of the above
>> SecretManager
>> >> > implementation) define generic Hooks for accessing secrets from those
>> >> > services (just generic secrets, not connection, variables). Simply
>> treat
>> >> > each of the backends above as another "provider" and create a
>> Hook to
>> >> > access the service. Such Hook could have just one method:
>> >> > def get_secret(self, path_prefix: str, secret_id: str) ->
>> Optional[str]
>> >> > It would use a connection defined (as usual) in ENV variables or
>> database
>> >> > of Airflow to authenticate with the secret service and retrieve the
>> >> > secrets.
>> >
>> > OK. maybe confusion is about 'generic' . My "generic" was ("no
>> > connections, no variables") - just retrieve "generic" secret. Separate
>> > implementation for Hashicorp Vault, Separate for Secret Manager, etc.
>> >
>> >> The connection can be defined in The Secrets backend. To make it
>> clearer,
>> >> "Vault" in Nathan's case is a "Service" and has nothing to do with
>> >> SecretsBackend similar to how PostgresHook or MySQLHook has nothing
>> >> to do
>> >> with using Postgres as Airflow MetadataDB Backend.
>> >>
>> >> Another example is Google KMS, there is already Hook for Google
>> KMS (
>> >>
>> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py
>> )
>> >> and an Operator can be created. Same can be done for Google Secrets
>> Manager
>> >> and Hashicorp Vault, in which cases all of these are "Services".
>> >
>> > That's exactly what I plan to implement. As explained above - Operator
>> > for secrets makes no sense because it would have to pass the secrets
>> > via xcom :(. I did not even check that we already have KMS hook. I was
>> > mostly about Vault and Secret Manager and AWS Secret Manager. Knowing
>> > that we have KMS makes it even easier :).
>> >
>> >> We could create SecretsHook similar to DbApiHook (
>> >>
>> https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
>> >> if we want to just define the single *get_secret* method you talked
>> about.
>> >
>> > I don't even plan that in fact, I thought about implementing several
>> > totally independent Hooks for each of the Backend Secrets.
>> >
>> >> The concept of "Secrets Backend" is to allow Managing of "Secrets
>> >> used in
>> >> Airflow" (Either to connect to an external system or Variables) in
>> actual
>> >> Secret Management Tools.
>> >>
>> >
>> > Yeah. I do not - at all - want to mess with that :)
>> >
>> >>
>> >> *Pros:*
>> >> >  And I
>> >> > well imagine this might be actually even more convenient to configure
>> >> > connections in the DB and access secrets this way rather than
>> >> having to
>> >> > configure Secret Backends in Airflow configuration.
>> >>
>> >> This is exactly where both "Secrets" and the "Service" terms are
>> >> mixed I
>> >> think. Again echoing what I said above : The concept of "Secrets
>> Backend"
>> >> is to allow Managing of "Secrets used in Airflow".
>> >> The Secrets Backend is so that you don't need to store secrets in
>> Airflow
>> >> Metadata DB whether they can encrypted or not as there are tools that
>> are
>> >> specifically designed to handle "Secrets, rotation of secrets etc".
>> Having
>> >> the Hook and Operator to talk to the Service should be separate.
>> >
>> > Full agreement - I do not want to intermix those. It was always
>> > thought as per-provider implementation of traditional "Hook".
>> >
>> >>
>> >> * Another benefit of it is that it would allow people still stuck
>> on pre
>> >> > 1.10.10 to  write custom operators that would like to use secret
>> backends
>> >> > (via backport operators). And still continue doing it in the future
>> >> > (possibly migrating to 2.0/1.10.10+ in cases when there is one secret
>> >> > backend only - but continue to use connections/hooks where some
>> specific
>> >> > secrets should be kept in a different secret backend).
>> >>
>> >>
>> >> What is the objective here: (1) is it to interact with those Services
>> >> (Vault or Secrets Manager etc) or (2) Get Airflow Connections and
>> Variables
>> >> from different Secrets Backend
>> >
>> > Just to interact with it - no plans at all to get Airflow Connections
>> > nor Variables.
>> >
>> >>
>> >> Regards,
>> >> Kaxil
>> >>
>> >
>> 
> 
> 
> -- 
> 
> Jarek Potiuk
> 

Re: [PROPOSAL] Secret Backend Hooks

Posted by Jarek Potiuk <Ja...@polidea.com>.
On Mon, May 18, 2020 at 11:17 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> > GCP Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook
>
> Why do we even need Hooks for those? Why can't we use the existing GCP
> Secret Manager class inside a custom operator? What does creating a hook
> give us?
>

The same as all the other hooks: a common way to authenticate to the service
(using the Airflow Connection mechanism) and a straightforward API to be used
by the operators.

Now that Kaxil mentioned it - this is exactly what the KMS hook gives - it
uses an already defined connection id in the Airflow DB to authenticate to KMS
and encrypt/decrypt a secret. Please take a look there:

https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py

Then in an operator I'd use it like:

hook = KMSHook(conn_id=conn_id)
hook.decrypt(key="KEY", cipher="")  # the cipher part is KMS-specific

or

hook = SecretManagerHook(conn_id=conn_id)
hook.decrypt(key="KEY")

Which is substantially easier than handling all the
authentication/credential options yourself (for example, in the GCP case the
hook handles all the different forms of authentication - like variables, json
file, connection-encoded credentials - out of the box). This is the very same
reason pretty much any hook exists in Airflow. The only difference for
secrets is that it makes no sense to write operators for them because of
having to pass decrypted secrets via XCom.
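
To make the point about hooks hiding authentication concrete, here is a minimal sketch. It does not use the real Airflow classes: `Connection`, `CONNECTIONS`, `InMemorySecretClient` and `SecretServiceHook` are all hypothetical stand-ins, and the only thing taken from this thread is the `get_secret(path_prefix, secret_id)` signature.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Connection:
    """Stand-in for an Airflow Connection (hypothetical, simplified)."""
    conn_id: str
    login: str
    password: str


# Stand-in for the Airflow connection store (DB or ENV variables).
CONNECTIONS: Dict[str, Connection] = {
    "vault_default": Connection("vault_default", "airflow", "s3cr3t-token"),
}


class InMemorySecretClient:
    """Hypothetical client; a real one would call the secret service API."""

    def __init__(self, token: str, store: Dict[str, str]) -> None:
        self._token = token
        self._store = store

    def read(self, path: str) -> Optional[str]:
        return self._store.get(path)


class SecretServiceHook:
    """Hypothetical hook: resolves credentials from a connection id."""

    def __init__(self, conn_id: str, store: Dict[str, str]) -> None:
        self.conn_id = conn_id
        self._store = store
        self._client: Optional[InMemorySecretClient] = None

    def get_conn(self) -> InMemorySecretClient:
        # Authentication details live in the connection, not in the operator.
        if self._client is None:
            conn = CONNECTIONS[self.conn_id]
            self._client = InMemorySecretClient(conn.password, self._store)
        return self._client

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        return self.get_conn().read(f"{path_prefix}/{secret_id}")
```

An operator would then only need the connection id, e.g. `SecretServiceHook("vault_default", store).get_secret("team", "db")`, and would never touch the token itself.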

From the very beginning of this conversation, I was surprised it is at all
a problem for anyone.

I never intended to make any shared service, I just wanted to implement
what Kaxil described - separate hooks for all the secret services - same as
for any other service.
I am quite surprised it is a problem for anyone (and knowing now that a KMS
Hook already exists makes it even more surprising).
J.



>
> -a
>
> On May 18 2020, at 9:50 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
>
> >> Are we all talking about different things ?
> >
> > Good point. I think that's the main source of confusion here and we
> > think about different things.
> >
> >> So what I feel that the use case that Nathan defined can just be
> >> solved a
> >> VaultHook & VaultOperator for example.
> >
> > That's what I was talking (from the beginning - maybe it was not
> > clear) about separate hooks for each service. Not a shared one. GCP
> > Secret Manager Hook, Vault Hook,  KMS Hook, AWS Secret Hook - all of
> > them separate, in different providers, and simple hooks to be used by
> > whoever wants to use them in their custom operators.
> >
> > We also talked about implementing operators, but there is very little
> > use of generic Operators for secrets. Mainly because the only way
> > operators can pass anything to other operators (tasks) is via xcom
> > which would make the secrets stored plain text in the database. That
> > is rather bad I am afraid. Having Hooks make them instantiatable in
> > the context of running tasks, use Fernet to decrypt credentials from
> > the Connection DB, request to retrieve secret from the backend and
> > pass the unencrypted secret to the other parts of the operator - all
> > in the context of a single worker/task.
> >
> >>
> >> This should not be confused with "Secrets" at all. Why do we need to
> create
> >> a generic Hooks for all Secrets Backend?
> >
> > No generic hooks :). I never meant it to be generic.Maybe that's a
> > confusion there - I wanted to implement a separate hook for every type
> > of backend.
> >
> >> Consider we use PostgreSQL for backend and the connection is defined in
> >> airflow.cfg. Now you can still use the MySQLHook and PostgresHook
> >> independently to connect to those Databases, correct.
> >>
> >> But they both should not be confused to be using anything "shared".
> >
> > No plans for that whatsoever.
> >
> >> The proposal if I interpret correctly talks about the following:
> >>
> >> We have an idea that we might want also (on top of the above
> SecretManager
> >> > implementation) define generic Hooks for accessing secrets from those
> >> > services (just generic secrets, not connection, variables). Simply
> treat
> >> > each of the backends above as another "provider" and create a Hook to
> >> > access the service. Such Hook could have just one method:
> >> > def get_secret(self, path_prefix: str, secret_id: str) ->
> Optional[str]
> >> > It would use a connection defined (as usual) in ENV variables or
> database
> >> > of Airflow to authenticate with the secret service and retrieve the
> >> > secrets.
> >
> > OK. maybe confusion is about 'generic' . My "generic" was ("no
> > connections, no variables") - just retrieve "generic" secret. Separate
> > implementation for Hashicorp Vault, Separate for Secret Manager, etc.
> >
> >> The connection can be defined in The Secrets backend. To make it
> clearer,
> >> "Vault" in Nathan's case is a "Service" and has nothing to do with
> >> SecretsBackend similar to how PostgresHook or MySQLHook has nothing
> >> to do
> >> with using Postgres as Airflow MetadataDB Backend.
> >>
> >> Another example is Google KMS, there is already Hook for Google KMS (
> >>
> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py
> )
> >> and an Operator can be created. Same can be done for Google Secrets
> Manager
> >> and Hashicorp Vault, in which cases all of these are "Services".
> >
> > That's exactly what I plan to implement. As explained above - Operator
> > for secrets makes no sense because it would have to pass the secrets
> > via xcom :(. I did not even check that we already have KMS hook. I was
> > mostly about Vault and Secret Manager and AWS Secret Manager. Knowing
> > that we have KMS makes it even easier :).
> >
> >> We could create SecretsHook similar to DbApiHook (
> >>
> https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
> >> if we want to just define the single *get_secret* method you talked
> about.
> >
> > I don't even plan that in fact, I thought about implementing several
> > totally independent Hooks for each of the Backend Secrets.
> >
> >> The concept of "Secrets Backend" is to allow Managing of "Secrets
> >> used in
> >> Airflow" (Either to connect to an external system or Variables) in
> actual
> >> Secret Management Tools.
> >>
> >
> > Yeah. I do not - at all - want to mess with that :)
> >
> >>
> >> *Pros:*
> >> >  And I
> >> > well imagine this might be actually even more convenient to configure
> >> > connections in the DB and access secrets this way rather than
> >> having to
> >> > configure Secret Backends in Airflow configuration.
> >>
> >> This is exactly where both "Secrets" and the "Service" terms are
> >> mixed I
> >> think. Again echoing what I said above : The concept of "Secrets
> Backend"
> >> is to allow Managing of "Secrets used in Airflow".
> >> The Secrets Backend is so that you don't need to store secrets in
> Airflow
> >> Metadata DB whether they can encrypted or not as there are tools that
> are
> >> specifically designed to handle "Secrets, rotation of secrets etc".
> Having
> >> the Hook and Operator to talk to the Service should be separate.
> >
> > Full agreement - I do not want to intermix those. It was always
> > thought as per-provider implementation of traditional "Hook".
> >
> >>
> >> * Another benefit of it is that it would allow people still stuck on pre
> >> > 1.10.10 to  write custom operators that would like to use secret
> backends
> >> > (via backport operators). And still continue doing it in the future
> >> > (possibly migrating to 2.0/1.10.10+ in cases when there is one secret
> >> > backend only - but continue to use connections/hooks where some
> specific
> >> > secrets should be kept in a different secret backend).
> >>
> >>
> >> What is the objective here: (1) is it to interact with those Services
> >> (Vault or Secrets Manager etc) or (2) Get Airflow Connections and
> Variables
> >> from different Secrets Backend
> >
> > Just to interact with it - no plans at all to get Airflow Connections
> > nor Variables.
> >
> >>
> >> Regards,
> >> Kaxil
> >>
> >
>


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>

Re: [PROPOSAL] Secret Backend Hooks

Posted by Ash Berlin-Taylor <as...@apache.org>.
> GCP Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook

Why do we even need Hooks for those? Why can't we use the existing GCP
Secret Manager class inside a custom operator? What does creating a hook
give us?

-a

On May 18 2020, at 9:50 pm, Jarek Potiuk <Ja...@polidea.com> wrote:

>> Are we all talking about different things ?
> 
> Good point. I think that's the main source of confusion here and we
> think about different things.
> 
>> So what I feel that the use case that Nathan defined can just be
>> solved a
>> VaultHook & VaultOperator for example.
> 
> That's what I was talking (from the beginning - maybe it was not
> clear) about separate hooks for each service. Not a shared one. GCP
> Secret Manager Hook, Vault Hook,  KMS Hook, AWS Secret Hook - all of
> them separate, in different providers, and simple hooks to be used by
> whoever wants to use them in their custom operators.
> 
> We also talked about implementing operators, but there is very little
> use of generic Operators for secrets. Mainly because the only way
> operators can pass anything to other operators (tasks) is via xcom
> which would make the secrets stored plain text in the database. That
> is rather bad I am afraid. Having Hooks make them instantiatable in
> the context of running tasks, use Fernet to decrypt credentials from
> the Connection DB, request to retrieve secret from the backend and
> pass the unencrypted secret to the other parts of the operator - all
> in the context of a single worker/task.
> 
>> 
>> This should not be confused with "Secrets" at all. Why do we need to create
>> a generic Hooks for all Secrets Backend?
> 
> No generic hooks :). I never meant it to be generic.Maybe that's a
> confusion there - I wanted to implement a separate hook for every type
> of backend.
> 
>> Consider we use PostgreSQL for backend and the connection is defined in
>> airflow.cfg. Now you can still use the MySQLHook and PostgresHook
>> independently to connect to those Databases, correct.
>> 
>> But they both should not be confused to be using anything "shared".
> 
> No plans for that whatsoever.
> 
>> The proposal if I interpret correctly talks about the following:
>> 
>> We have an idea that we might want also (on top of the above SecretManager
>> > implementation) define generic Hooks for accessing secrets from those
>> > services (just generic secrets, not connection, variables). Simply treat
>> > each of the backends above as another "provider" and create a Hook to
>> > access the service. Such Hook could have just one method:
>> > def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]
>> > It would use a connection defined (as usual) in ENV variables or database
>> > of Airflow to authenticate with the secret service and retrieve the
>> > secrets.
> 
> OK. maybe confusion is about 'generic' . My "generic" was ("no
> connections, no variables") - just retrieve "generic" secret. Separate
> implementation for Hashicorp Vault, Separate for Secret Manager, etc.
> 
>> The connection can be defined in The Secrets backend. To make it clearer,
>> "Vault" in Nathan's case is a "Service" and has nothing to do with
>> SecretsBackend similar to how PostgresHook or MySQLHook has nothing
>> to do
>> with using Postgres as Airflow MetadataDB Backend.
>> 
>> Another example is Google KMS, there is already Hook for Google KMS (
>> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py)
>> and an Operator can be created. Same can be done for Google Secrets Manager
>> and Hashicorp Vault, in which cases all of these are "Services".
> 
> That's exactly what I plan to implement. As explained above - Operator
> for secrets makes no sense because it would have to pass the secrets
> via xcom :(. I did not even check that we already have KMS hook. I was
> mostly about Vault and Secret Manager and AWS Secret Manager. Knowing
> that we have KMS makes it even easier :).
> 
>> We could create SecretsHook similar to DbApiHook (
>> https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
>> if we want to just define the single *get_secret* method you talked about.
> 
> I don't even plan that in fact, I thought about implementing several
> totally independent Hooks for each of the Backend Secrets.
> 
>> The concept of "Secrets Backend" is to allow Managing of "Secrets
>> used in
>> Airflow" (Either to connect to an external system or Variables) in actual
>> Secret Management Tools.
>> 
> 
> Yeah. I do not - at all - want to mess with that :)
> 
>> 
>> *Pros:*
>> >  And I
>> > well imagine this might be actually even more convenient to configure
>> > connections in the DB and access secrets this way rather than
>> having to
>> > configure Secret Backends in Airflow configuration.
>> 
>> This is exactly where both "Secrets" and the "Service" terms are
>> mixed I
>> think. Again echoing what I said above : The concept of "Secrets Backend"
>> is to allow Managing of "Secrets used in Airflow".
>> The Secrets Backend is so that you don't need to store secrets in Airflow
>> Metadata DB whether they can encrypted or not as there are tools that are
>> specifically designed to handle "Secrets, rotation of secrets etc". Having
>> the Hook and Operator to talk to the Service should be separate.
> 
> Full agreement - I do not want to intermix those. It was always
> thought as per-provider implementation of traditional "Hook".
> 
>> 
>> * Another benefit of it is that it would allow people still stuck on pre
>> > 1.10.10 to  write custom operators that would like to use secret backends
>> > (via backport operators). And still continue doing it in the future
>> > (possibly migrating to 2.0/1.10.10+ in cases when there is one secret
>> > backend only - but continue to use connections/hooks where some specific
>> > secrets should be kept in a different secret backend).
>> 
>> 
>> What is the objective here: (1) is it to interact with those Services
>> (Vault or Secrets Manager etc) or (2) Get Airflow Connections and Variables
>> from different Secrets Backend
> 
> Just to interact with it - no plans at all to get Airflow Connections
> nor Variables.
> 
>> 
>> Regards,
>> Kaxil
>> 
> 

Re: [PROPOSAL] Secret Backend Hooks

Posted by Jarek Potiuk <Ja...@polidea.com>.
> Are we all talking about different things ?

Good point. I think that's the main source of confusion here and we
think about different things.

> So what I feel that the use case that Nathan defined can just be solved a
> VaultHook & VaultOperator for example.

That's what I was talking about (from the beginning - maybe it was not
clear): separate hooks for each service. Not a shared one. GCP
Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook - all of
them separate, in different providers, and simple hooks to be used by
whoever wants to use them in their custom operators.

We also talked about implementing operators, but there is very little
use for generic Operators for secrets, mainly because the only way
operators can pass anything to other operators (tasks) is via XCom,
which would store the secrets as plain text in the database. That
is rather bad, I am afraid. Hooks, on the other hand, can be instantiated
in the context of a running task: they use Fernet to decrypt credentials
from the Connection DB, retrieve the secret from the backend and
pass the unencrypted secret to the other parts of the operator - all
in the context of a single worker/task.
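
A rough sketch of that single-task pattern, under stated assumptions: `StubSecretHook` and `UseSecretOperator` are hypothetical names that only mimic the shape of an Airflow hook and operator, and the `consumer` callable stands in for whatever external system needs the secret.

```python
from typing import Callable, Dict, Optional


class StubSecretHook:
    """Hypothetical hook; a real one would authenticate via a connection."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        self._secrets = secrets

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        return self._secrets.get(f"{path_prefix}/{secret_id}")


class UseSecretOperator:
    """Hypothetical custom operator that consumes a secret inside one task."""

    def __init__(
        self,
        hook: StubSecretHook,
        path_prefix: str,
        secret_id: str,
        consumer: Callable[[str], None],
    ) -> None:
        self.hook = hook
        self.path_prefix = path_prefix
        self.secret_id = secret_id
        self.consumer = consumer

    def execute(self, context: dict) -> None:
        secret = self.hook.get_secret(self.path_prefix, self.secret_id)
        if secret is None:
            raise KeyError(f"secret {self.path_prefix}/{self.secret_id} not found")
        # The secret is used here, in the same process that retrieved it.
        self.consumer(secret)
        # Nothing is returned on purpose: a value returned from execute() is
        # typically pushed to XCom and would end up in the database.
        return None
```

The design choice is simply that the secret never leaves the task: it is fetched, used, and dropped within one `execute()` call.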

>
> This should not be confused with "Secrets" at all. Why do we need to create
> a generic Hooks for all Secrets Backend?

No generic hooks :). I never meant it to be generic. Maybe that's the
confusion here - I wanted to implement a separate hook for every type
of backend.

> Consider we use PostgreSQL for backend and the connection is defined in
> airflow.cfg. Now you can still use the MySQLHook and PostgresHook
> independently to connect to those Databases, correct.
>
> But they both should not be confused to be using anything "shared".

No plans for that whatsoever.

> The proposal if I interpret correctly talks about the following:
>
> We have an idea that we might want also (on top of the above SecretManager
> > implementation) define generic Hooks for accessing secrets from those
> > services (just generic secrets, not connection, variables). Simply treat
> > each of the backends above as another "provider" and create a Hook to
> > access the service. Such Hook could have just one method:
> > def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]
> > It would use a connection defined (as usual) in ENV variables or database
> > of Airflow to authenticate with the secret service and retrieve the
> > secrets.

OK. Maybe the confusion is about 'generic'. My "generic" meant "no
connections, no variables" - just retrieve a "generic" secret. A separate
implementation for Hashicorp Vault, a separate one for Secret Manager, etc.
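
As an illustration of "separate implementation per service", the sketch below defines two unrelated hooks with no shared base class. Everything in it is an assumption made for the example: the class names, the constructor parameters, the injected `reader` callable (standing in for an authenticated client), and the way each hook turns `path_prefix` and `secret_id` into a service-specific name.

```python
from typing import Callable, Optional

# A reader takes a fully qualified secret name and returns its value (or None).
Reader = Callable[[str], Optional[str]]


class VaultHook:
    """Hypothetical Vault hook; `reader` stands in for a Vault client."""

    def __init__(self, vault_conn_id: str, reader: Reader) -> None:
        self.vault_conn_id = vault_conn_id
        self._reader = reader

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        # Illustrative layout only; real paths depend on the mount and engine.
        return self._reader(f"{path_prefix}/{secret_id}")


class SecretManagerHook:
    """Hypothetical GCP Secret Manager hook, implemented independently."""

    def __init__(self, gcp_conn_id: str, project_id: str, reader: Reader) -> None:
        self.gcp_conn_id = gcp_conn_id
        self.project_id = project_id
        self._reader = reader

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        # Illustrative naming: prefix and id are joined into one secret name.
        name = (
            f"projects/{self.project_id}/secrets/"
            f"{path_prefix}-{secret_id}/versions/latest"
        )
        return self._reader(name)
```

Both hooks happen to expose the same `get_secret` method, but nothing forces them to; each one can grow service-specific methods without affecting the other.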

> The connection can be defined in The Secrets backend. To make it clearer,
> "Vault" in Nathan's case is a "Service" and has nothing to do with
> SecretsBackend similar to how PostgresHook or MySQLHook has nothing to do
> with using Postgres as Airflow MetadataDB Backend.
>
> Another example is Google KMS, there is already Hook for Google KMS (
> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py)
> and an Operator can be created. Same can be done for Google Secrets Manager
> and Hashicorp Vault, in which cases all of these are "Services".

That's exactly what I plan to implement. As explained above - an Operator
for secrets makes no sense because it would have to pass the secrets
via XCom :(. I did not even check that we already have a KMS hook. I was
mostly thinking about Vault, Secret Manager and AWS Secret Manager. Knowing
that we have KMS makes it even easier :).

> We could create SecretsHook similar to DbApiHook (
> https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
> if we want to just define the single *get_secret* method you talked about.

I don't even plan that, in fact. I thought about implementing several
totally independent Hooks, one for each of the secret backends.

> The concept of "Secrets Backend" is to allow Managing of "Secrets used in
> Airflow" (Either to connect to an external system or Variables) in actual
> Secret Management Tools.
>

Yeah. I do not - at all - want to mess with that :)

>
> *Pros:*
> >  And I
> > well imagine this might be actually even more convenient to configure
> > connections in the DB and access secrets this way rather than having to
> > configure Secret Backends in Airflow configuration.
>
> This is exactly where both "Secrets" and the "Service" terms are mixed I
> think. Again echoing what I said above : The concept of "Secrets Backend"
> is to allow Managing of "Secrets used in Airflow".
> The Secrets Backend is so that you don't need to store secrets in Airflow
> Metadata DB whether they can encrypted or not as there are tools that are
> specifically designed to handle "Secrets, rotation of secrets etc". Having
> the Hook and Operator to talk to the Service should be separate.

Full agreement - I do not want to intermix those. It was always
thought of as a per-provider implementation of a traditional "Hook".

>
> * Another benefit of it is that it would allow people still stuck on pre
> > 1.10.10 to  write custom operators that would like to use secret backends
> > (via backport operators). And still continue doing it in the future
> > (possibly migrating to 2.0/1.10.10+ in cases when there is one secret
> > backend only - but continue to use connections/hooks where some specific
> > secrets should be kept in a different secret backend).
>
>
> What is the objective here: (1) is it to interact with those Services
> (Vault or Secrets Manager etc) or (2) Get Airflow Connections and Variables
> from different Secrets Backend

Just to interact with it - no plans at all to get Airflow Connections
nor Variables.

>
> Regards,
> Kaxil
>

Re: [PROPOSAL] Secret Backend Hooks

Posted by Kaxil Naik <ka...@gmail.com>.
Are we all talking about different things 😁 ?

So what I feel is that the use case that Nathan defined can just be solved by
a VaultHook & VaultOperator, for example.

This should not be confused with "Secrets" at all. Why do we need to create
generic Hooks for all Secrets Backends?

Consider we use PostgreSQL for the backend and the connection is defined in
airflow.cfg. Now you can still use the MySQLHook and PostgresHook
independently to connect to those databases, correct?

But they both should not be confused to be using anything "shared".

The proposal if I interpret correctly talks about the following:

We have an idea that we might want also (on top of the above SecretManager
> implementation) define generic Hooks for accessing secrets from those
> services (just generic secrets, not connection, variables). Simply treat
> each of the backends above as another "provider" and create a Hook to
> access the service. Such Hook could have just one method:
> def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]
> It would use a connection defined (as usual) in ENV variables or database
> of Airflow to authenticate with the secret service and retrieve the
> secrets.


The connection can be defined in The Secrets backend. To make it clearer,
"Vault" in Nathan's case is a "Service" and has nothing to do with
SecretsBackend similar to how PostgresHook or MySQLHook has nothing to do
with using Postgres as Airflow MetadataDB Backend.

Another example is Google KMS: there is already a Hook for Google KMS (
https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py)
and an Operator can be created. The same can be done for Google Secrets Manager
and Hashicorp Vault, in which case all of these are "Services".

We could create SecretsHook similar to DbApiHook (
https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
if we want to just define the single *get_secret* method you talked about.

The concept of "Secrets Backend" is to allow Managing of "Secrets used in
Airflow" (Either to connect to an external system or Variables) in actual
Secret Management Tools.


*Pros:*
>  And I
> well imagine this might be actually even more convenient to configure
> connections in the DB and access secrets this way rather than having to
> configure Secret Backends in Airflow configuration.


This is exactly where both "Secrets" and the "Service" terms are mixed I
think. Again echoing what I said above : The concept of "Secrets Backend"
is to allow Managing of "Secrets used in Airflow".
The Secrets Backend is there so that you don't need to store secrets in the
Airflow Metadata DB, whether they can be encrypted or not, as there are tools
that are specifically designed to handle "Secrets, rotation of secrets etc".
Having the Hook and Operator to talk to the Service should be separate.


* Another benefit of it is that it would allow people still stuck on pre
> 1.10.10 to  write custom operators that would like to use secret backends
> (via backport operators). And still continue doing it in the future
> (possibly migrating to 2.0/1.10.10+ in cases when there is one secret
> backend only - but continue to use connections/hooks where some specific
> secrets should be kept in a different secret backend).


What is the objective here: (1) is it to interact with those Services
(Vault or Secrets Manager etc) or (2) to get Airflow Connections and Variables
from different Secrets Backends?


Regards,
Kaxil


On Mon, May 18, 2020 at 8:07 PM Jarek Potiuk <Ja...@polidea.com>
wrote:

> Thanks Nathan,
>
> I think your case is really good example where the Hook might be really
> useful (and apparently somebody did it already via Hooks).
>
> I wonder, Nathan, if you (in the future) switch to a secret backend - would you
> use the same secret backend for Airflow connections/variables? Or do you
> foresee that you will have another backend/credentials to access it?
>
> Maybe others had similar experiences - and would like to share it here?
>
> I still think there is a valid point in having separate hooks. Those are my
> points:
>
> 1) Seems that the use pattern is close to what I described - a separate secret
> backend that contains more "dynamic" secrets. And I think still being able
> to use different connections is a nice way of accessing multiple backend
> credentials within Airflow core. I think there was a good reason why only
> one backend is considered for "core" and it is really ill-suited to support
> multiple credential backends. I can hardly imagine reading connections or
> variables from multiple secret backends. How would you choose which backend
> to use for different variables? Fallback mechanisms? I think it's hardly
> useful.  Hooks on the other hand (via connections) have a built-in way to
> choose different backends and their use pattern for custom operators is
> the really standard "airflow" way.
>
> 2) Python operator is not the best idea, because you need to provide
> credentials to access the secret backend. It can be done - of course - via
> environment variables, but using a connection from Airflow has the additional
> advantage of being encrypted at rest in the database. And with Hooks being
> the common denominator of accessing external services (secret backend being
> one of them) - it can hide all the authorisation and communication details
> from the operators using the hook (this is basically what a hook is for).
>
> 3) I have a good parallel here I think.  I would compare my proposal to
> the current way we use Postgres and MySQL hooks vs. using SQLAlchemy for
> Airflow itself. While Airflow uses Postgres and MySQL to provide its
> internal database, it also has the "postgres" and "MySQL" providers that
> provide hooks that access the database in a "generic" way (and those hooks
> are used by a number of operators). We can still choose various databases
> to connect to via hooks - even if "Airflow core" uses that single database.
>
> J.
>
> On Mon, May 18, 2020 at 5:57 PM Nathan Hadfield <Na...@king.com>
> wrote:
>
> > Yep, I understand.  I wasn't necessarily advocating for a Vault hook;
> just
> > wanted to give some real world colour to the conversation and what we did
> > to solve our needs prior to the secrets backend.
> >
> > I'm sure that extending the class would also enable the same
> functionality.
> >
> > Cheers,
> >
> > Nathan
> >
> > On 18/05/2020, 16:46, "Ash Berlin-Taylor" <as...@apache.org> wrote:
> >
> >     Accessing things that aren't connections or variables is, essentially
> >     creating a third class of thing that Secrets store.
> >
> >     But that is a separate issue to what Jarek is proposing, which is
> > Hooks.
> >
> >     For your use case a Python operator sounds like the best fit. A hook
> is
> >     going to have to target the lowest common denominator, which means
> >     vault-specific things are just a needless layer over the top.
> >
> >     Extending the existing Secrets Backend interface to support that is
> >     doable, but I don't see the need for a Hook. Not everything needs to
> be
> >     a hook :)
> >
> >     -ash
> >
> >
> >     On May 18 2020, at 4:41 pm, Nathan Hadfield <
> Nathan.Hadfield@king.com>
> > wrote:
> >
> >     > Hey,
> >     >
> >     >
> >     >
> >     > My quick two cents are that it would be good to access secrets that
> >     > are not explicitly either connections or variables
> >     >
> >     >
> >     >
> >     > We have a need for DAGs that feature more complex interactions with
> >     > Vault - which typically end up being custom operators - that I
> think
> >     > would be helped by more generic capabilities.
> >     >
> >     >
> >     >
> >     > For example, we have an automated system that regularly rotates GCP
> >     > service accounts across the whole company and stores them in Vault.
> >     > We then have to ensure that our different Looker environments
> always
> >     > have these SAs before the old ones expire every 48 hours.  To do
> > this,
> >     > we wrote a Vault Hook and a Looker Hook and them combine them in an
> >     > operator which would read every SA from a specific Vault path and
> > then
> >     > update the connection inside Looker.
> >     >
> >     >
> >     >
> >     > I don’t know if this will influence your thinking in any way but
> just
> >     > wanted to briefly share our experiences.  If anyone would like to
> >     > learn more then please reach out and I’d be happy to share more.
> >     >
> >     >
> >     >
> >     > Cheers,
> >     >
> >     > Nathan
> >     >
> >     >
> >     >
> >     > On 18/05/2020, 15:21, "Ash Berlin-Taylor" <as...@apache.org> wrote:
> >     >
> >     >
> >     >
> >     >
> >     >
> >     >    > The good thing with it is that you could easily have multiple secret
> >     >    > backends configured to retrieve secrets for specific "service" (so
> >     >    > that you
> >     >    > could keep "generic airflow's secrets" in one backend but still have
> >     >    > possibility of custom operators to use other backends (with different
> >     >    > authentication, scopes etc.).
> >     >
> >     >    Having the ability to configure multiple secrets backends is independent
> >     >    of this feature. The original PR/AIP to add Secrets Backends decided to
> >     >    leave this ability out as it was more complex to configure. We could add
> >     >    that back in.
> >     >
> >     >    I still don't quite get from your example where you are proposing this
> >     >    would be used? Can you give a fuller example please? Do you have a
> >     >    concrete use case where you need this?
> >     >
> >     >    Not everything in Airflow needs to be a hook; just access the secrets
> >     >    backend directly. I'm not sure what wrapping an extra layer around these
> >     >    classes gives us?
> >     >
> >     >    Without a concrete example I can't see anything other than this adds a
> >     >    lot of complexity.
> >     >
> >     >    -ash
> >     >
> >     >
> >     >
> >     >
> >     >
> >     >    On May 18 2020, at 2:45 pm, Jarek Potiuk <
> > Jarek.Potiuk@polidea.com> wrote:
> >     >
> >     >
> >     >
> >     >    > Hello Everyone,
> >     >
> >     >    >
> >     >
> >     >    > TL;DR; I was just about to start to work on a small set of
> > Hooks -
> >     >
> >     >    > dedicated to retrieving screts from the Secret Backend. I
> >     > discussed it
> >     >
> >     >    > with Ash
> >     >
> >     >    > and Kamil
> >     >
> >     >    >
> >     >
> >     >
> >     > <
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__apache-2Dairflow.slack.com_archives_C0145R4NPS5_p1589805908013700&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=NBBItsFcPZR-C26VepQEehBPNPEWUsxar_DatX5ulco&e=
> >     > > on
> >     >
> >     >    > Slack today. So far I thought I treat them as usual providers,
> >     > but Ash
> >     >
> >     >    > raised some valid concenrs. so I wanted to raise teh proposal
> >     > before I
> >     >
> >     >    > start working on it/
> >     >
> >     >    >
> >     >
> >     >    > *Context:*
> >     >
> >     >    >
> >     >
> >     >    > Currently we have "Secret Backend" support built in in 2.0 and
> >     >
> >     >    > 1.10.10+. It
> >     >
> >     >    > includes retrieving the variable and connections (via Secret
> >     > Manager class)
> >     >
> >     >    > for:
> >     >
> >     >    >
> >     >
> >     >    >   -  Hashicorp Vault
> >     >
> >     >    >   -  Secret Manager
> >     >
> >     >    >   -  KMS
> >     >
> >     >    >   -  AWS secret manager
> >     >
> >     >    >
> >     >
> >     >    > Those secret managers are configured in:
> >     >
> >     >    >
> >     >
> >     >    > [secret]
> >     >
> >     >    > backend=<SecretManagerClass>
> >     >
> >     >    > backend_kwargs={}
> >     >
> >     >    >
> >     >
> >     >    > Those are available for use in a nice way (via Jinja templates
> >     > and the
> >     >
> >     >    > like), but they need support in the Core of Airlfow (so
> require
> > 1.10.10+).
> >     >
> >     >    > This means that if you are on pre 1.10.10 you cannot use those
> > secrets.
> >     >
> >     >    > Currently you can only use one secret per whole Airflow
> > installation
> >     >
> >     >    > so if
> >     >
> >     >    > your secrets are split between several secret managers (or if
> >     > secrets for
> >     >
> >     >    > particular service require different credentials) - you cannot
> >     > use the
> >     >
> >     >    > mechanism to access such distributed secrets. It's not often
> >     > case, but I
> >     >
> >     >    > very well imagine it might happen that there are different
> sets
> > of
> >     >
> >     >    > credentials to access different secrets - some services might
> > have
> >     >
> >     >    > different scopes/level of access needed. .
> >     >
> >     >    >
> >     >
> >     >    > *Proposal*
> >     >
> >     >    >
> >     >
> >     >    > We have an idea that we might want also (on top of the above
> > SecretManager
> >     >
> >     >    > implementation) define generic Hooks for accessing secrets
> from
> > those
> >     >
> >     >    > services (just generic secrets, not connection, variables).
> >     > Simply treat
> >     >
> >     >    > each of the backends above as another "provider" and create a
> >     > Hook to
> >     >
> >     >    > access the service. Such Hook could have just one method:
> >     >
> >     >    >
> >     >
> >     >    > def get_secret(self, path_prefix: str, secret_id: str) ->
> > Optional[str]
> >     >
> >     >    >
> >     >
> >     >    > It would use a connection defined (as usual) in ENV variables
> > or database
> >     >
> >     >    > of Airflow to authenticate with the secret service and
> retrieve
> > the
> >     >
> >     >    > secrets.
> >     >
> >     >    >
> >     >
> >     >    > The good thing with it is that you could have easily multiple
> > secret
> >     >
> >     >    > backends configured to retrieve secrets for specific "service"
> > (so
> >     >
> >     >    > that you
> >     >
> >     >    > could keep "generic airflow's secerts" in one backend but
> still
> > have
> >     >
> >     >    > possibility of custom operators to use other backends (with
> > different
> >     >
> >     >    > authentication,  scopes etc.). And it is not touching any of
> the
> >     >
> >     >    > "core" of
> >     >
> >     >    > Airflow. It's just a set of hooks with corresponding
> connections
> >     > that work
> >     >
> >     >    > the same way as accessing any other provider in Airflow. No
> core
> >     > of Airflow
> >     >
> >     >    > will be touched with this change.
> >     >
> >     >    >
> >     >
> >     >    > *Pros/Cons*
> >     >
> >     >    >
> >     >
> >     >    > *Con:*
> >     >
> >     >    >
> >     >
> >     >    > I do realise it is a bit of duplication in functionality. We
> > already
> >     >
> >     >    > have a
> >     >
> >     >    > way to connect to a secret backend via airflow configuration
> and
> >     > we should
> >     >
> >     >    > likely promote it rather than introduce additional mechanism.
> >     >
> >     >    >
> >     >
> >     >    > *Pros:*
> >     >
> >     >    >
> >     >
> >     >    > * Most of all -> it adds flexibility of accessing several
> > secret backends
> >     >
> >     >    > for different use-cases. I looked at it so far in the way
> those
> >     > hooks are
> >     >
> >     >    > merely another set of "provider hooks". For me this is nothing
> > different
> >     >
> >     >    > than "providers" for any other services we have.  fFr example
> > "cloudant"
> >     >
> >     >    > provider has only "CloudantHook" that other custom operators
> > can use.
> >     >
> >     >    > And I
> >     >
> >     >    > well imagine this might be actually even more convenient to
> > configure
> >     >
> >     >    > connections in the DB and access secrets this way rather than
> >     > having to
> >     >
> >     >    > configure Secret Backends in Airflow configuration.
> >     >
> >     >    >
> >     >
> >     >    > * The dupication there it is very, very limited (basically a
> > method
> >     >
> >     >    > call to
> >     >
> >     >    > secret backend).
> >     >
> >     >    >
> >     >
> >     >    > * Another benefit of it is that it would allow people still
> > stuck
> >     > on pre
> >     >
> >     >    > 1.10.10 to  write custom operators that would like to use
> > secret backends
> >     >
> >     >    > (via backport operators). And still continue doing it in the
> > future
> >     >
> >     >    > (possibly migrating to 2.0/1.10.10+ in cases when there is one
> > secret
> >     >
> >     >    > backed only - but continue ot use connections/hooks where some
> > specific
> >     >
> >     >    > secrets shoudl be kept in different secret backend.
> >     >
> >     >    >
> >     >
> >     >    > I would like to hear your opinion on that.
> >     >
> >     >    >
> >     >
> >     >    > J.
> >     >
> >     >    >
> >     >
> >     >    > --
> >     >
> >     >    >
> >     >
> >     >    > Jarek Potiuk
> >     >
> >     >    > Polidea
> >     > <
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=r-sJroKu0X4XYmnboaHwbpbEIgk5TLwTErkDtwEgvog&e=
> >     > > | Principal Software Engineer
> >     >
> >     >    >
> >     >
> >     >    > M: +48 660 796 129 <+48660796129>
> >     >
> >     >    > [image: Polidea]
> >     > <
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=r-sJroKu0X4XYmnboaHwbpbEIgk5TLwTErkDtwEgvog&e=
> > >
> >     >
> >     >    >
> >     >
> >
> >
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>

Re: [PROPOSAL] Secret Backend Hooks

Posted by Jarek Potiuk <Ja...@polidea.com>.
Yep. That is exactly why I think we need the hooks. Would it be possible for
you to donate your code for the Vault Hook implementation?

I would love to use it for my implementation (and credit you, or whoever the
author is, as co-author :)

J.


On Tue, 19 May 2020, 09:41 Nathan Hadfield <Na...@king.com> wrote:

> Jarek,
>
> We are already using the secret backend for Airflow variables.  But
> because of the example I explained, and also a programmatic need to update
> our GCP Airflow connections every day, we still have to maintain a
> secondary, custom method for Vault authentication and manipulation of other
> secrets.
>
> Cheers,
>
> Nathan
>
> On 18/05/2020, 20:07, "Jarek Potiuk" <Ja...@polidea.com> wrote:
>
>     Thanks Nathan,
>
>     I think your case is a really good example of where the Hook might be
>     really useful (and apparently somebody has already done it via Hooks).
>
>     I wonder, Nathan: if you switch to the secret backend in the future,
>     would you use the same secret backend for Airflow connections/variables?
>     Or do you foresee that you will have another backend/credentials to
>     access it?
>
>     Maybe others have had similar experiences and would like to share them
>     here?
>
>     I still think there is a valid point in having separate hooks. These are
>     my points:
>
>     1) It seems that the usage pattern is close to what I described: a
>     separate secret backend that contains more "dynamic" secrets. And I think
>     that still being able to use different connections is a nice way of
>     accessing multiple backend credentials within Airflow core. I think there
>     was a good reason why only one backend is considered for "core", and it
>     is really ill-suited to supporting multiple credential backends. I can
>     hardly imagine reading connections or variables from multiple secret
>     backends. How would you choose which backend to use for different
>     variables? Fallback mechanisms? I think that is hardly useful.  Hooks, on
>     the other hand, have (via connections) a built-in way to choose different
>     backends, and their usage pattern for custom operators is the standard
>     "airflow" way.
>
>     2) A Python operator is not the best idea, because you need to provide
>     credentials to access the secret backend. It can be done - of course -
>     via environment variables, but using a connection from Airflow has the
>     additional advantage of being encrypted at rest in the database. And with
>     Hooks being the common denominator for accessing external services (a
>     secret backend being one of them), a hook can hide all the authorisation
>     and communication details from the operators using it (this is basically
>     what a hook is for).
>
>     3) I have a good parallel here, I think.  I would compare my proposal to
>     the current way we use Postgres and MySQL hooks vs. using SQLAlchemy for
>     Airflow itself. While Airflow uses Postgres and MySQL to provide its
>     internal database, it also has the "postgres" and "MySQL" providers that
>     provide hooks that access the database in a "generic" way (and those
>     hooks are used by a number of operators). We can still choose various
>     databases to connect to via hooks - even if "Airflow core" uses that
>     single database.
>
>     J.

Re: [PROPOSAL] Secret Backend Hooks

Posted by Nathan Hadfield <Na...@king.com>.
Jarek,

We are already using the secret backend for Airflow variables.  But because of the example I explained, and also a programmatic need to update our GCP Airflow connections every day, we still have to maintain a secondary, custom method for Vault authentication and manipulation of other secrets.

Cheers,
 
Nathan
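
The use case referred to above (read every service account stored under one Vault path, then push each one into Looker) can be sketched roughly as follows. This is only an illustration under stated assumptions: `FakeVaultHook`, `FakeLookerHook`, their method names and `sync_service_accounts` are hypothetical in-memory stand-ins, not the actual hooks mentioned in this thread and not real Vault or Looker client APIs.

```python
from typing import Dict, List


class FakeVaultHook:
    """Hypothetical stand-in for a custom Vault hook (in-memory only)."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        self._secrets = secrets

    def list_secrets(self, path: str) -> List[str]:
        # Return the names of all secrets stored directly under `path`.
        prefix = path.rstrip("/") + "/"
        return sorted(k[len(prefix):] for k in self._secrets if k.startswith(prefix))

    def read_secret(self, path: str, name: str) -> str:
        return self._secrets[f"{path.rstrip('/')}/{name}"]


class FakeLookerHook:
    """Hypothetical stand-in for a custom Looker hook (in-memory only)."""

    def __init__(self) -> None:
        self.connections: Dict[str, str] = {}

    def update_connection(self, name: str, service_account_key: str) -> None:
        self.connections[name] = service_account_key


def sync_service_accounts(vault: FakeVaultHook, looker: FakeLookerHook, path: str) -> List[str]:
    """Copy every service account under `path` into Looker; return the synced names."""
    synced = []
    for name in vault.list_secrets(path):
        looker.update_connection(name, vault.read_secret(path, name))
        synced.append(name)
    return synced
```

In a custom operator, the body of `execute` would be little more than the call to `sync_service_accounts`, with each hook built from its own connection.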

On 18/05/2020, 20:07, "Jarek Potiuk" <Ja...@polidea.com> wrote:

    Thanks Nathan,

    I think your case is a really good example of where the Hook might be
    really useful (and apparently somebody has already done it via Hooks).

    I wonder, Nathan: if you switch to the secret backend in the future, would
    you use the same secret backend for Airflow connections/variables? Or do
    you foresee that you will have another backend/credentials to access it?

    Maybe others have had similar experiences and would like to share them
    here?

    I still think there is a valid point in having separate hooks. These are
    my points:

    1) It seems that the usage pattern is close to what I described: a separate
    secret backend that contains more "dynamic" secrets. And I think that
    still being able to use different connections is a nice way of accessing
    multiple backend credentials within Airflow core. I think there was a good
    reason why only one backend is considered for "core", and it is really
    ill-suited to supporting multiple credential backends. I can hardly
    imagine reading connections or variables from multiple secret backends.
    How would you choose which backend to use for different variables?
    Fallback mechanisms? I think that is hardly useful.  Hooks, on the other
    hand, have (via connections) a built-in way to choose different backends,
    and their usage pattern for custom operators is the standard "airflow"
    way.

    2) A Python operator is not the best idea, because you need to provide
    credentials to access the secret backend. It can be done - of course - via
    environment variables, but using a connection from Airflow has the
    additional advantage of being encrypted at rest in the database. And with
    Hooks being the common denominator for accessing external services (a
    secret backend being one of them), a hook can hide all the authorisation
    and communication details from the operators using it (this is basically
    what a hook is for).

    3) I have a good parallel here, I think.  I would compare my proposal to
    the current way we use Postgres and MySQL hooks vs. using SQLAlchemy for
    Airflow itself. While Airflow uses Postgres and MySQL to provide its
    internal database, it also has the "postgres" and "MySQL" providers that
    provide hooks that access the database in a "generic" way (and those hooks
    are used by a number of operators). We can still choose various databases
    to connect to via hooks - even if "Airflow core" uses that single database.
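
As a rough illustration of points 1) and 2), here is a minimal sketch of a hook with the single `get_secret(path_prefix, secret_id)` method from the original proposal, where the connection id decides which secret service is used. Everything besides that method signature is an assumption made for the example: the `Connection` dataclass, the `CONNECTIONS` and `FAKE_SERVICES` dictionaries, the connection ids and the host names are hypothetical in-memory stand-ins, not Airflow or Vault APIs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Connection:
    """Simplified, hypothetical stand-in for an Airflow connection."""
    conn_id: str
    host: str
    token: str  # a real hook would send this to the service to authenticate


# Hypothetical connection store: two connections pointing at two different
# secret services, each with its own credentials.
CONNECTIONS: Dict[str, Connection] = {
    "vault_core": Connection("vault_core", "vault-core.example.com", "token-a"),
    "vault_rotation": Connection("vault_rotation", "vault-rot.example.com", "token-b"),
}

# Hypothetical in-memory replacement for the remote secret services.
FAKE_SERVICES: Dict[str, Dict[str, str]] = {
    "vault-core.example.com": {"airflow/variables/env": "prod"},
    "vault-rot.example.com": {"gcp/service-accounts/looker": "sa-key-placeholder"},
}


class SecretHook:
    """Sketch of a generic secret hook; the backend is chosen by conn_id."""

    def __init__(self, conn_id: str) -> None:
        self.conn = CONNECTIONS[conn_id]

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        # Look the secret up in the service this connection points at;
        # return None when it does not exist there.
        store = FAKE_SERVICES[self.conn.host]
        return store.get(f"{path_prefix}/{secret_id}")
```

A custom operator could then create `SecretHook("vault_rotation")` for the rotated service accounts while other code keeps using `SecretHook("vault_core")`, without the operator knowing how either service authenticates.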

    J.

    On Mon, May 18, 2020 at 5:57 PM Nathan Hadfield <Na...@king.com>
    wrote:

    > Yep, I understand.  I wasn't necessarily advocating for a Vault hook; just
    > wanted to give some real world colour to the conversation and what we did
    > to solve our needs prior to the secrets backend.
    >
    > I'm sure that extending the class would also enable the same functionality.
    >
    > Cheers,
    >
    > Nathan
    >
    > On 18/05/2020, 16:46, "Ash Berlin-Taylor" <as...@apache.org> wrote:
    >
    >     Accessing things that aren't connections or variables is essentially
    >     creating a third class of thing that Secrets store.
    >
    >     But that is a separate issue to what Jarek is proposing, which is
    >     Hooks.
    >
    >     For your use case a Python operator sounds like the best fit. A hook is
    >     going to have to target the lowest common denominator, which means
    >     vault-specific things are just a needless layer over the top.
    >
    >     Extending the existing Secrets Backend interface to support that is
    >     doable, but I don't see the need for a Hook. Not everything needs to be
    >     a hook :)
    >
    >     -ash
    >
    >
    >     On May 18 2020, at 4:41 pm, Nathan Hadfield <Na...@king.com> wrote:
    >
    >     > Hey,
    >     >
    >     >
    >     >
    >     > My quick two cents are that it would be good to access secrets that
    >     > are not explicitly either connections or variables
    >     >
    >     >
    >     >
    >     > We have a need for DAGs that feature more complex interactions with
    >     > Vault - which typically end up being custom operators - that I think
    >     > would be helped by more generic capabilities.
    >     >
    >     >
    >     >
    >     > For example, we have an automated system that regularly rotates GCP
    >     > service accounts across the whole company and stores them in Vault.
    >     > We then have to ensure that our different Looker environments always
    >     > have these SAs before the old ones expire every 48 hours.  To do
    > this,
    >     > we wrote a Vault Hook and a Looker Hook and them combine them in an
    >     > operator which would read every SA from a specific Vault path and
    > then
    >     > update the connection inside Looker.
    >     >
    >     >
    >     >
    >     > I don’t know if this will influence your thinking in any way but just
    >     > wanted to briefly share our experiences.  If anyone would like to
    >     > learn more then please reach out and I’d be happy to share more.
    >     >
    >     >
    >     >
    >     > Cheers,
    >     >
    >     > Nathan
    >     >
    >     >
    >     >
    >     > On 18/05/2020, 15:21, "Ash Berlin-Taylor" <as...@apache.org> wrote:
    >     >
    >     >
    >     >
    >     >
    >     >
    >     >    > The good thing with it is that you could have easily multiple
    > secret
    >     >
    >     >    > backends configured to retrieve secrets for specific "service"
    > (so
    >     >
    >     >    > that you
    >     >
    >     >    > could keep "generic airflow's secerts" in one backend but still
    > have
    >     >
    >     >    > possibility of custom operators to use other backends (with
    > different
    >     >
    >     >    > authentication, scopes etc.).
    >     >
    >     >
    >     >
    >     >    Having the ability to configure multiple secrets backends is
    > independent
    >     >
    >     >    of this feature. The original PR/AIP to add Secrets Backends
    >     > decided to
    >     >
    >     >    leave this ability out as it was more complex to configure. We
    >     > could add
    >     >
    >     >    that back in.
    >     >
    >     >
    >     >
    >     >    I still don't quite get from your example where you are proposing
    > this
    >     >
    >     >    would be used? Can you give a fuller example please? Do you have a
    >     >
    >     >    concrete use case where you need this?
    >     >
    >     >
    >     >
    >     >    Not everything in Airflow needs to be a hook; just access the
    > secrets
    >     >
    >     >    backend directly. I'm not sure what wrapping an extra layer
    > around these
    >     >
    >     >    classes gives us?
    >     >
    >     >
    >     >
    >     >    Without a concrete example I can't see anything other than this
    >     > adds a
    >     >
    >     >    lot of complexity.
    >     >
    >     >
    >     >
    >     >    -ash
    >     >
    >     >
    >     >
    >     >
    >     >
    >     >    On May 18 2020, at 2:45 pm, Jarek Potiuk <
    > Jarek.Potiuk@polidea.com> wrote:
    >     >
    >     >
    >     >
    >     >    > Hello Everyone,
    >     >
    >     >    >
    >     >
    >     >    > TL;DR; I was just about to start to work on a small set of
    > Hooks -
    >     >
    >     >    > dedicated to retrieving screts from the Secret Backend. I
    >     > discussed it
    >     >
    >     >    > with Ash
    >     >
    >     >    > and Kamil
    >     >
    >     >    >
    >     >
    >     >
    >     > <
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__apache-2Dairflow.slack.com_archives_C0145R4NPS5_p1589805908013700&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=NBBItsFcPZR-C26VepQEehBPNPEWUsxar_DatX5ulco&e=
    >     > > on
    >     >
    >     >    > Slack today. So far I thought I treat them as usual providers,
    >     > but Ash
    >     >
    >     >    > raised some valid concenrs. so I wanted to raise teh proposal
    >     > before I
    >     >
    >     >    > start working on it/
    >     >
    >     >    >
    >     >
    >     >    > *Context:*
    >     >
    >     >    >
    >     >
    >     >    > Currently we have "Secret Backend" support built in in 2.0 and
    >     >
    >     >    > 1.10.10+. It
    >     >
    >     >    > includes retrieving the variable and connections (via Secret
    >     > Manager class)
    >     >
    >     >    > for:
    >     >
    >     >    >
    >     >
    >     >    >   -  Hashicorp Vault
    >     >
    >     >    >   -  Secret Manager
    >     >
    >     >    >   -  KMS
    >     >
    >     >    >   -  AWS secret manager
    >     >
    >     >    >
    >     >
    >     >    > Those secret managers are configured in:
    >     >
    >     >    >
    >     >
    >     >    > [secret]
    >     >
    >     >    > backend=<SecretManagerClass>
    >     >
    >     >    > backend_kwargs={}
    >     >
    >     >    >
    >     >
    >     >    > Those are available for use in a nice way (via Jinja templates
    >     > and the
    >     >
    >     >    > like), but they need support in the Core of Airlfow (so require
    > 1.10.10+).
    >     >
    >     >    > This means that if you are on pre 1.10.10 you cannot use those
    > secrets.
    >     >
    >     >    > Currently you can only use one secret per whole Airflow
    > installation
    >     >
    >     >    > so if
    >     >
    >     >    > your secrets are split between several secret managers (or if
    >     > secrets for
    >     >
    >     >    > particular service require different credentials) - you cannot
    >     > use the
    >     >
    >     >    > mechanism to access such distributed secrets. It's not often
    >     > case, but I
    >     >
    >     >    > very well imagine it might happen that there are different sets
    > of
    >     >
    >     >    > credentials to access different secrets - some services might
    > have
    >     >
    >     >    > different scopes/level of access needed. .
    >     >
    >     >    >
    >     >
    >     >    > *Proposal*
    >     >
    >     >    >
    >     >
    >     >    > We have an idea that we might want also (on top of the above
    > SecretManager
    >     >
    >     >    > implementation) define generic Hooks for accessing secrets from
    > those
    >     >
    >     >    > services (just generic secrets, not connection, variables).
    >     > Simply treat
    >     >
    >     >    > each of the backends above as another "provider" and create a
    >     > Hook to
    >     >
    >     >    > access the service. Such Hook could have just one method:
    >     >
    >     >    >
    >     >
    >     >    > def get_secret(self, path_prefix: str, secret_id: str) ->
    > Optional[str]
    >     >
    >     >    >
    >     >
    >     >    > It would use a connection defined (as usual) in ENV variables
    > or database
    >     >
    >     >    > of Airflow to authenticate with the secret service and retrieve
    > the
    >     >
    >     >    > secrets.
    >     >
    >     >    >
    >     >
    >     >    > The good thing with it is that you could have easily multiple
    > secret
    >     >
    >     >    > backends configured to retrieve secrets for specific "service"
    > (so
    >     >
    >     >    > that you
    >     >
    >     >    > could keep "generic airflow's secerts" in one backend but still
    > have
    >     >
    >     >    > possibility of custom operators to use other backends (with
    > different
    >     >
    >     >    > authentication,  scopes etc.). And it is not touching any of the
    >     >
    >     >    > "core" of
    >     >
    >     >    > Airflow. It's just a set of hooks with corresponding connections
    >     > that work
    >     >
    >     >    > the same way as accessing any other provider in Airflow. No core
    >     > of Airflow
    >     >
    >     >    > will be touched with this change.
    >     >
    >     >    >
    >     >
    >     >    > *Pros/Cons*
    >     >
    >     >    >
    >     >
    >     >    > *Con:*
    >     >
    >     >    >
    >     >
    >     >    > I do realise it is a bit of duplication in functionality. We
    > already
    >     >
    >     >    > have a
    >     >
    >     >    > way to connect to a secret backend via airflow configuration and
    >     > we should
    >     >
    >     >    > likely promote it rather than introduce additional mechanism.
    >     >
    >     >    >
    >     >
    >     >    > *Pros:*
    >     >
    >     >    >
    >     >
    >     >    > * Most of all -> it adds flexibility of accessing several
    > secret backends
    >     >
    >     >    > for different use-cases. I looked at it so far in the way those
    >     > hooks are
    >     >
    >     >    > merely another set of "provider hooks". For me this is nothing
    > different
    >     >
    >     >    > than "providers" for any other services we have.  fFr example
    > "cloudant"
    >     >
    >     >    > provider has only "CloudantHook" that other custom operators
    > can use.
    >     >
    >     >    > And I
    >     >
    >     >    > well imagine this might be actually even more convenient to
    > configure
    >     >
    >     >    > connections in the DB and access secrets this way rather than
    >     > having to
    >     >
    >     >    > configure Secret Backends in Airflow configuration.
    >     >
    >     >    >
    >     >
    >     >    > * The dupication there it is very, very limited (basically a
    > method
    >     >
    >     >    > call to
    >     >
    >     >    > secret backend).
    >     >
    >     >    >
    >     >
    >     >    > * Another benefit of it is that it would allow people still
    > stuck
    >     > on pre
    >     >
    >     >    > 1.10.10 to  write custom operators that would like to use
    > secret backends
    >     >
    >     >    > (via backport operators). And still continue doing it in the
    > future
    >     >
    >     >    > (possibly migrating to 2.0/1.10.10+ in cases when there is one
    > secret
    >     >
    >     >    > backed only - but continue ot use connections/hooks where some
    > specific
    >     >
    >     >    > secrets shoudl be kept in different secret backend.
    >     >
    >     >    >
    >     >
    >     >    > I would like to hear your opinion on that.
    >     >
    >     >    >
    >     >
    >     >    > J.
    >     >
    >     >    >
    >     >
    >     >    > --
    >     >
    >     >    >
    >     >
    >     >    > Jarek Potiuk
    >     >
    >     >    > Polidea
    >     > <
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=r-sJroKu0X4XYmnboaHwbpbEIgk5TLwTErkDtwEgvog&e=
    >     > > | Principal Software Engineer
    >     >
    >     >    >
    >     >
    >     >    > M: +48 660 796 129 <+48660796129>
    >     >
    >     >    > [image: Polidea]
    >     > <
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=r-sJroKu0X4XYmnboaHwbpbEIgk5TLwTErkDtwEgvog&e=
    > >
    >     >
    >     >    >
    >     >
    >
    >

    -- 

    Jarek Potiuk
    Polidea <https://www.polidea.com/> | Principal Software Engineer

    M: +48 660 796 129 <+48660796129>
    [image: Polidea] <https://www.polidea.com/>


Re: [PROPOSAL] Secret Backend Hooks

Posted by Jarek Potiuk <Ja...@polidea.com>.
Thanks Nathan,

I think your case is a really good example of where a Hook might be really
useful (and apparently somebody has already done it via Hooks).

I wonder, Nathan: if you switch to a secret backend in the future, would you
use the same secret backend for Airflow connections/variables? Or do you
foresee that you will have another backend/credentials to access it?

Maybe others have had similar experiences and would like to share them here?

I still think there is a valid point in having separate hooks. These are my
points:

1) It seems that the use pattern is close to what I described - a separate
secret backend that contains more "dynamic" secrets. And I still think that
being able to use different connections is a nice way of accessing multiple
backend credentials within Airflow core. I think there was a good reason why
only one backend is considered for "core": it is really ill-suited to
supporting multiple credential backends. I can hardly imagine reading
connections or variables from multiple secret backends. How would you choose
which backend to use for different variables? Fallback mechanisms? I think
that is hardly useful. Hooks, on the other hand, have (via connections) a
built-in way to choose different backends, and their use pattern for custom
operators is the really standard "Airflow" way.

2) A Python operator is not the best idea, because you need to provide
credentials to access the secret backend. It can be done - of course - via
environment variables, but using a connection from Airflow has the additional
advantage of being encrypted at rest in the database. And with Hooks being
the common denominator for accessing external services (a secret backend
being one of them), a hook can hide all the authorisation and communication
details from the operators using it (this is basically what a hook is for).

3) I think I have a good parallel here. I would compare my proposal to
the current way we use Postgres and MySQL hooks vs. using SQLAlchemy for
Airflow itself. While Airflow uses Postgres and MySQL to provide its
internal database, it also has the "postgres" and "MySQL" providers that
provide hooks that access the database in a "generic" way (and those hooks
are used by a number of operators). We can still choose various databases
to connect to via hooks - even if "Airflow core" uses that single database.
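
To make points 1) and 2) a bit more concrete, here is a minimal sketch of
what such a hook could look like. It is only an illustration under stated
assumptions, not a proposed implementation: the Connection class, the
CONNECTIONS and FAKE_SERVICES dictionaries, the connection ids and the host
names are all hypothetical in-memory stand-ins (no Airflow classes and no
real Vault/Secret Manager client are used). Only the get_secret signature
comes from the proposal itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Connection:
    """Hypothetical stand-in for an Airflow connection record."""
    conn_id: str
    host: str
    token: str


# Stand-in for connections kept in the Airflow DB or in ENV variables.
CONNECTIONS: Dict[str, Connection] = {
    "secrets_core": Connection("secrets_core", "vault-core.example", "token-a"),
    "secrets_looker": Connection("secrets_looker", "vault-looker.example", "token-b"),
}

# Stand-in for two separate secret services, each with its own credentials.
FAKE_SERVICES: Dict[str, dict] = {
    "vault-core.example": {
        "token": "token-a",
        "secrets": {"airflow/variables/env": "prod"},
    },
    "vault-looker.example": {
        "token": "token-b",
        "secrets": {"gcp/service-accounts/sa-1": "key-1"},
    },
}


class SecretBackendHook:
    """Sketch of a hook that picks its secret service via a connection id."""

    def __init__(self, conn_id: str) -> None:
        self.conn_id = conn_id

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        # The hook resolves credentials from the connection, so the operator
        # calling it never handles authentication details itself.
        conn = CONNECTIONS[self.conn_id]
        service = FAKE_SERVICES[conn.host]
        if service["token"] != conn.token:
            raise PermissionError(f"connection {self.conn_id!r} was rejected")
        # Returns None when the secret does not exist in this service.
        return service["secrets"].get(f"{path_prefix}/{secret_id}")


if __name__ == "__main__":
    # A custom operator would pick the service simply by connection id.
    looker_secrets = SecretBackendHook(conn_id="secrets_looker")
    print(looker_secrets.get_secret("gcp/service-accounts", "sa-1"))
```

The point of the sketch is only that the choice of service (and of its
credentials) is made per hook instance through the connection id, while the
single configured "core" secret backend stays untouched.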

J.

-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: [PROPOSAL] Secret Backend Hooks

Posted by Nathan Hadfield <Na...@king.com>.
Yep, I understand.  I wasn't necessarily advocating for a Vault hook; just wanted to give some real world colour to the conversation and what we did to solve our needs prior to the secrets backend.

I'm sure that extending the class would also enable the same functionality.

Cheers,
 
Nathan

On 18/05/2020, 16:46, "Ash Berlin-Taylor" <as...@apache.org> wrote:

    Accessing things that aren't connections or variables is, essentially
    creating a third class of thing that Secrets store.

    But that is a separate issue to what Jarek is proposing, which is Hooks.

    For your use case a Python operator sounds like the best fit. A hook is
    going to have to target the lowest common denominator, which means
    vault-specific things are just a needless layer over the top.

    Extending the existing Secrets Backend interface to support that is
    doable, but I don't see the need for a Hook. Not everything needs to be
    a hook :)

    -ash


    On May 18 2020, at 4:41 pm, Nathan Hadfield <Na...@king.com> wrote:

    > Hey,
    >
    > My quick two cents are that it would be good to access secrets that
    > are not explicitly either connections or variables.
    >
    > We have a need for DAGs that feature more complex interactions with
    > Vault - which typically end up being custom operators - that I think
    > would be helped by more generic capabilities.
    >
    > For example, we have an automated system that regularly rotates GCP
    > service accounts across the whole company and stores them in Vault.
    > We then have to ensure that our different Looker environments always
    > have these SAs before the old ones expire every 48 hours.  To do this,
    > we wrote a Vault Hook and a Looker Hook and then combine them in an
    > operator which would read every SA from a specific Vault path and then
    > update the connection inside Looker.
    >
    > I don’t know if this will influence your thinking in any way but just
    > wanted to briefly share our experiences.  If anyone would like to
    > learn more then please reach out and I’d be happy to share more.
    >
    > Cheers,
    >
    > Nathan
    >
    > On 18/05/2020, 15:21, "Ash Berlin-Taylor" <as...@apache.org> wrote:
    >
    >    > The good thing with it is that you could have easily multiple secret
    >    > backends configured to retrieve secrets for specific "service" (so
    >    > that you
    >    > could keep "generic airflow's secrets" in one backend but still have
    >    > possibility of custom operators to use other backends (with different
    >    > authentication, scopes etc.).
    >
    >    Having the ability to configure multiple secrets backends is independent
    >    of this feature. The original PR/AIP to add Secrets Backends decided to
    >    leave this ability out as it was more complex to configure. We could add
    >    that back in.
    >
    >    I still don't quite get from your example where you are proposing this
    >    would be used? Can you give a fuller example please? Do you have a
    >    concrete use case where you need this?
    >
    >    Not everything in Airflow needs to be a hook; just access the secrets
    >    backend directly. I'm not sure what wrapping an extra layer around these
    >    classes gives us?
    >
    >    Without a concrete example I can't see anything other than this adds a
    >    lot of complexity.
    >
    >    -ash
    >
    >    On May 18 2020, at 2:45 pm, Jarek Potiuk <Ja...@polidea.com> wrote:
    >
    >    > Hello Everyone,
    >    >
    >    > TL;DR; I was just about to start to work on a small set of Hooks -
    >    > dedicated to retrieving secrets from the Secret Backend. I discussed it
    >    > with Ash and Kamil
    >    > <https://apache-airflow.slack.com/archives/C0145R4NPS5/p1589805908013700> on
    >    > Slack today. So far I thought I treat them as usual providers, but Ash
    >    > raised some valid concerns, so I wanted to raise the proposal before I
    >    > start working on it.
    >    >
    >    > *Context:*
    >    >
    >    > Currently we have "Secret Backend" support built in in 2.0 and 1.10.10+. It
    >    > includes retrieving the variable and connections (via Secret Manager class)
    >    > for:
    >    >
    >    >   -  Hashicorp Vault
    >    >   -  Secret Manager
    >    >   -  KMS
    >    >   -  AWS secret manager
    >    >
    >    > Those secret managers are configured in:
    >    >
    >    > [secret]
    >    > backend=<SecretManagerClass>
    >    > backend_kwargs={}
    >    >
    >    > Those are available for use in a nice way (via Jinja templates and the
    >    > like), but they need support in the Core of Airflow (so require 1.10.10+).
    >    > This means that if you are on pre 1.10.10 you cannot use those secrets.
    >    > Currently you can only use one secret per whole Airflow installation so if
    >    > your secrets are split between several secret managers (or if secrets for
    >    > particular service require different credentials) - you cannot use the
    >    > mechanism to access such distributed secrets. It's not often case, but I
    >    > very well imagine it might happen that there are different sets of
    >    > credentials to access different secrets - some services might have
    >    > different scopes/level of access needed.
    >    >
    >    > *Proposal*
    >    >
    >    > We have an idea that we might want also (on top of the above SecretManager
    >    > implementation) define generic Hooks for accessing secrets from those
    >    > services (just generic secrets, not connection, variables). Simply treat
    >    > each of the backends above as another "provider" and create a Hook to
    >    > access the service. Such Hook could have just one method:
    >    >
    >    > def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]
    >    >
    >    > It would use a connection defined (as usual) in ENV variables or database
    >    > of Airflow to authenticate with the secret service and retrieve the
    >    > secrets.
    >    >
    >    > The good thing with it is that you could have easily multiple secret
    >    > backends configured to retrieve secrets for specific "service" (so that you
    >    > could keep "generic airflow's secrets" in one backend but still have
    >    > possibility of custom operators to use other backends (with different
    >    > authentication, scopes etc.). And it is not touching any of the
    >  
    >    > "core" of
    >  
    >    > Airflow. It's just a set of hooks with corresponding connections
    > that work
    >  
    >    > the same way as accessing any other provider in Airflow. No core
    > of Airflow
    >  
    >    > will be touched with this change.
    >  
    >    >
    >  
    >    > *Pros/Cons*
    >  
    >    >
    >  
    >    > *Con:*
    >  
    >    >
    >  
    >    > I do realise it is a bit of duplication in functionality. We already
    >  
    >    > have a
    >  
    >    > way to connect to a secret backend via airflow configuration and
    > we should
    >  
    >    > likely promote it rather than introduce additional mechanism.
    >  
    >    >
    >  
    >    > *Pros:*
    >  
    >    >
    >  
    >    > * Most of all -> it adds flexibility of accessing several secret backends
    >  
    >    > for different use-cases. I looked at it so far in the way those
    > hooks are
    >  
    >    > merely another set of "provider hooks". For me this is nothing different
    >  
    >    > than "providers" for any other services we have.  fFr example "cloudant"
    >  
    >    > provider has only "CloudantHook" that other custom operators can use.
    >  
    >    > And I
    >  
    >    > well imagine this might be actually even more convenient to configure
    >  
    >    > connections in the DB and access secrets this way rather than
    > having to
    >  
    >    > configure Secret Backends in Airflow configuration.
    >  
    >    >
    >  
    >    > * The dupication there it is very, very limited (basically a method
    >  
    >    > call to
    >  
    >    > secret backend).
    >  
    >    >
    >  
    >    > * Another benefit of it is that it would allow people still stuck
    > on pre
    >  
    >    > 1.10.10 to  write custom operators that would like to use secret backends
    >  
    >    > (via backport operators). And still continue doing it in the future
    >  
    >    > (possibly migrating to 2.0/1.10.10+ in cases when there is one secret
    >  
    >    > backed only - but continue ot use connections/hooks where some specific
    >  
    >    > secrets shoudl be kept in different secret backend.
    >  
    >    >
    >  
    >    > I would like to hear your opinion on that.
    >  
    >    >
    >  
    >    > J.
    >  
    >    >
    >  
    >    > --
    >  
    >    >
    >  
    >    > Jarek Potiuk
    >  
    >    > Polidea
    > <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=r-sJroKu0X4XYmnboaHwbpbEIgk5TLwTErkDtwEgvog&e=
    > > | Principal Software Engineer
    >  
    >    >
    >  
    >    > M: +48 660 796 129 <+48660796129>
    >  
    >    > [image: Polidea]
    > <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=r-sJroKu0X4XYmnboaHwbpbEIgk5TLwTErkDtwEgvog&e= >
    >  
    >    >
    > 


Re: [PROPOSAL] Secret Backend Hooks

Posted by Ash Berlin-Taylor <as...@apache.org>.
Accessing things that aren't connections or variables is essentially
creating a third class of thing that Secrets store.

But that is a separate issue to what Jarek is proposing, which is Hooks.

For your use case a Python operator sounds like the best fit. A hook is
going to have to target the lowest common denominator, which means
vault-specific things are just a needless layer over the top.

Extending the existing Secrets Backend interface to support that is
doable, but I don't see the need for a Hook. Not everything needs to be
a hook :)
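
To make the "extend the interface" option concrete, here is a minimal
sketch. The class names are hypothetical stand-ins and not Airflow's
actual classes; only the get_secret signature is taken from the
proposal. It assumes secrets are addressed as "prefix/id".

```python
from typing import Dict, Optional


class SketchSecretsBackend:
    """Hypothetical stand-in for a secrets backend base class (not Airflow's real one)."""

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        # First class of thing: connections.
        return None

    def get_variable(self, key: str) -> Optional[str]:
        # Second class of thing: variables.
        return None

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        # Third class of thing: a generic secret (signature from the proposal).
        return None


class InMemoryBackend(SketchSecretsBackend):
    """Toy backend over a dict keyed by 'prefix/id' (an assumption of this sketch)."""

    def __init__(self, store: Dict[str, str]) -> None:
        self._store = store

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        return self._store.get(f"{path_prefix}/{secret_id}")


backend = InMemoryBackend({"looker/prod-sa": "key-material"})
print(backend.get_secret("looker", "prod-sa"))
```

A backend that does not override get_secret simply returns None, so
existing backends would keep working unchanged under this assumption.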

-ash


On May 18 2020, at 4:41 pm, Nathan Hadfield <Na...@king.com> wrote:

> Hey,
>
> My quick two cents are that it would be good to access secrets that
> are not explicitly either connections or variables.
>
> We have a need for DAGs that feature more complex interactions with
> Vault - which typically end up being custom operators - that I think
> would be helped by more generic capabilities.
>
> For example, we have an automated system that regularly rotates GCP
> service accounts across the whole company and stores them in Vault.
> We then have to ensure that our different Looker environments always
> have these SAs before the old ones expire every 48 hours.  To do this,
> we wrote a Vault Hook and a Looker Hook and then combine them in an
> operator which would read every SA from a specific Vault path and then
> update the connection inside Looker.
>
> I don’t know if this will influence your thinking in any way but just
> wanted to briefly share our experiences.  If anyone would like to
> learn more then please reach out and I’d be happy to share more.
>
> Cheers,
>
> Nathan

Re: [PROPOSAL] Secret Backend Hooks

Posted by Nathan Hadfield <Na...@king.com>.
Hey,

My quick two cents are that it would be good to access secrets that are not explicitly either connections or variables.

We have a need for DAGs that feature more complex interactions with Vault - which typically end up being custom operators - that I think would be helped by more generic capabilities.

For example, we have an automated system that regularly rotates GCP service accounts across the whole company and stores them in Vault.  We then have to ensure that our different Looker environments always have these SAs before the old ones expire every 48 hours.  To do this, we wrote a Vault Hook and a Looker Hook and then combine them in an operator which would read every SA from a specific Vault path and then update the connection inside Looker.
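
A rough sketch of that kind of operator logic, using in-memory stand-ins. FakeVault, FakeLooker and their methods are hypothetical and are not the real Vault or Looker client APIs; the path and account names are made up for illustration.

```python
from typing import Dict, List


class FakeVault:
    """Stand-in for a Vault client: maps a path to the secrets stored under it."""

    def __init__(self, data: Dict[str, Dict[str, str]]) -> None:
        self._data = data

    def list_secrets(self, path: str) -> Dict[str, str]:
        # Return a copy so callers cannot mutate the store.
        return dict(self._data.get(path, {}))


class FakeLooker:
    """Stand-in for a Looker client: records the credential set per connection."""

    def __init__(self) -> None:
        self.connections: Dict[str, str] = {}

    def update_connection(self, name: str, service_account_key: str) -> None:
        self.connections[name] = service_account_key


def sync_service_accounts(vault: FakeVault, looker: FakeLooker, path: str) -> List[str]:
    """Copy every service account under `path` into Looker; return the updated names."""
    updated = []
    for name, key in sorted(vault.list_secrets(path).items()):
        looker.update_connection(name, key)
        updated.append(name)
    return updated


vault = FakeVault({"gcp/service-accounts": {"bi-prod": "key-2", "bi-dev": "key-1"}})
looker = FakeLooker()
print(sync_service_accounts(vault, looker, "gcp/service-accounts"))
```

In a DAG, a function like sync_service_accounts would be what the custom operator (or a Python callable) runs on a schedule shorter than the 48-hour expiry.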



I don’t know if this will influence your thinking in any way but just wanted to briefly share our experiences.  If anyone would like to learn more then please reach out and I’d be happy to share more.

Cheers,

Nathan




Re: [PROPOSAL] Secret Backend Hooks

Posted by Ash Berlin-Taylor <as...@apache.org>.
> The good thing with it is that you could have easily multiple secret
> backends configured to retrieve secrets for specific "service" (so that
> you could keep "generic airflow's secrets" in one backend but still have
> possibility of custom operators to use other backends (with different
> authentication, scopes etc.).

Having the ability to configure multiple secrets backends is independent
of this feature. The original PR/AIP to add Secrets Backends decided to
leave this ability out as it was more complex to configure. We could add
that back in.
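
One possible shape for that, as a sketch only: try each configured
backend in order and return the first hit. The names below are
hypothetical and say nothing about how Airflow would actually
configure or search multiple backends.

```python
from typing import Callable, Dict, List, Optional

# A "backend" here is just a lookup function from key to secret (or None).
Lookup = Callable[[str], Optional[str]]


def make_backend(store: Dict[str, str]) -> Lookup:
    """Build a toy backend from a dict; dict.get returns None on a miss."""
    return store.get


def first_match(backends: List[Lookup], key: str) -> Optional[str]:
    """Ask each backend in order; return the first non-None value."""
    for lookup in backends:
        value = lookup(key)
        if value is not None:
            return value
    return None


general = make_backend({"airflow/db_password": "s3cret"})
restricted = make_backend({"looker/api_key": "abc123"})
print(first_match([general, restricted], "looker/api_key"))
```

The cost is in the configuration, not the lookup: each backend would
still need its own class and kwargs, which is the complexity mentioned
above.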

I still don't quite get from your example where you are proposing this
would be used? Can you give a fuller example please? Do you have a
concrete use case where you need this?

Not everything in Airflow needs to be a hook; just access the secrets
backend directly. I'm not sure what wrapping an extra layer around these
classes gives us?

Without a concrete example I can't see anything other than this adds a
lot of complexity.

-ash

