Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/10 03:20:52 UTC

[GitHub] [airflow] amithmathew opened a new issue #8803: Impersonate service accounts while running GCP Operators

amithmathew opened a new issue #8803:
URL: https://github.com/apache/airflow/issues/8803


   **Description**
   Allow running Google Cloud operators using Service Accounts, without having to provide key material while running on GCP. If the Compute instance Service Accounts on which Airflow is running have been granted "Service Account Token Creator" role on the target Service Account with which I want to run my operator, I do not need to download, or provide any key material for the impersonation to happen. This is a much more secure way to impersonate service accounts.
   
   **Use case / motivation**
   
   Allow running Google Cloud operators using Service Accounts, without having to provide key material while running on GCP. If the Compute instance Service Accounts on which Airflow is running have been granted "Service Account Token Creator" role on the target Service Account with which I want to run my operator, I do not need to download, or provide any key material for the impersonation to happen. This is a much more secure way to impersonate service accounts.
   
   https://github.com/googleapis/google-auth-library-python/blob/master/docs/user-guide.rst#impersonated-credentials
   
   **Related Issues**
   
   None
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj closed issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj closed issue #8803:
URL: https://github.com/apache/airflow/issues/8803


   





[GitHub] [airflow] ad-m commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
ad-m commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-659326019


   Dear @olchas, I will be very happy to help you implement this in an effective way. I have spent hours analyzing these kinds of mechanisms at different cloud providers. I do not have deep knowledge of Airflow, but @mik-laj is on the issue and he will surely support us with his excellent knowledge of Airflow.





[GitHub] [airflow] mik-laj edited a comment on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-660222575


   @olchas It seems to me that we should define it at the task level.  From the user's point of view, this should be as easy to use as in gcloud.
   ```bash
   gcloud \
   --account=kamil.bregula@polidea.com \
   --impersonate-service-account=test-kamil@polidea-airflow.iam.gserviceaccount.com \
   auth print-access-token
   ```
   There is only one difference. Instead of using the `--account` option, we have `gcp_conn_id`.
   
   If you want to play around with it then you can use the script below.
   ```bash
   MAIN_ACCOUNT="kamil.bregula@polidea.com"
   SECONDARY_ACCOUNT="test-kamil@polidea-airflow.iam.gserviceaccount.com"
   
   ACCESS_TOKEN="$(gcloud \
       --account=${MAIN_ACCOUNT} \
       auth print-access-token)"
   curl -q "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${ACCESS_TOKEN}"
   
   ACCESS_TOKEN="$(gcloud \
       --account=${MAIN_ACCOUNT} \
       --impersonate-service-account=${SECONDARY_ACCOUNT} \
       auth print-access-token)"
   curl -q "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${ACCESS_TOKEN}"
   ```
   
   Remember that you need to have the appropriate permissions to use this feature:
   * The main account has access to the secondary account. You can set it up in the permissions of the secondary account.
   * The main account has "roles/iam.serviceAccountTokenCreator" role.
   
   If you are messing around with tokens in gcloud then you might want to enable the options below as well, which will allow you to better understand the flow.
   ```
   gcloud config set core/log_http true
   gcloud config set core/log_http_redact_token false
   ```
   Please note that the second option is not described in the public documentation, so be careful.





[GitHub] [airflow] olchas edited a comment on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
olchas edited a comment on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-659540464


   @mik-laj sure. Could you assign me to this issue, please?
   
   @ad-m, thanks for the offer. I am uncertain about supporting impersonation via an environment variable. As far as I know, there is no mechanism to provide a value for an environment variable specifically for a single Task Instance, so with this approach all Task Instances would be impersonating the same account (for example, GOOGLE_APPLICATION_CREDENTIALS is used only if no other account details have been provided in the GCP connection).
   
   I was looking into the [GoogleBaseHook](https://github.com/apache/airflow/blob/master/airflow/providers/google/common/hooks/base_google.py#L125) implementation and I was thinking about specifying the chain of accounts to impersonate in the `extras` field of the connection used by the hook, similarly to how other fields, like `scopes` or `key_path`, are provided. This way we would avoid having to modify every hook derived from GoogleBaseHook and the operators that use them, as the information would be provided in the connection, not in the operator definition. I guess we could follow the same solution in hooks dedicated to other cloud providers, but I haven't looked at them yet.
   
   However, this would still require a separate connection for every impersonated account, even if all of them were using the same service account as a source. You would not have to generate or rotate keys for impersonated accounts, but with a lot of accounts being impersonated by the same source account, this would put the effort of keeping connections consistent on the team managing Airflow.
   
   WDYT about this approach?
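
To make the connection-level idea above concrete, here is a minimal sketch of how a hook might read the accounts to impersonate from a connection's `extras` JSON. It is an illustration under stated assumptions, not the Airflow implementation: the key name `impersonation_chain`, the function name, and the example account names are all hypothetical.

```python
import json
from typing import List


def read_impersonation_chain(extras_json: str) -> List[str]:
    """Return the accounts to impersonate from a connection's extras JSON.

    The key name "impersonation_chain" is hypothetical, chosen for this sketch.
    """
    extras = json.loads(extras_json) if extras_json else {}
    chain = extras.get("impersonation_chain", [])
    # Accept a single account given as a plain string.
    if isinstance(chain, str):
        chain = [chain]
    return list(chain)


# Two connections that would share one source account but impersonate
# different targets (account names are made up for illustration).
conn_a_extras = json.dumps(
    {"impersonation_chain": "team-a@example-project.iam.gserviceaccount.com"}
)
conn_b_extras = json.dumps(
    {"impersonation_chain": ["team-b@example-project.iam.gserviceaccount.com"]}
)
```

The two example connections also show the drawback mentioned above: every impersonated account still needs its own connection entry, even though the source account is the same.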





[GitHub] [airflow] ad-m commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
ad-m commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-651373150


   I like this idea, which I have already discussed in detail, including the API flow, with @mik-laj in a long conversation. Similar solutions are currently available from other cloud providers, among others AssumeRole at AWS and TokenExchange at HyperOne. I note that interoperability should be strongly taken into account when designing this element, because, taken to the extreme, we would lead Airflow to work effectively in GCP only.
   
   I'm thinking of a user interface that needs to be developed to integrate impersonation with Apache Airflow. On this issue, I have the following questions about GCP, which seem to me crucial for designing the appropriate change in Apache Airflow:
   
   * What projects currently support "impersonate" in the GCP ecosystem? How did they solve the user interface?
   * Is support for "impersonate" planned by an environment variable similar to GOOGLE_APPLICATION_CREDENTIALS? Amazon uses the AWS_ROLE_ARN environment variable already.





[GitHub] [airflow] potiuk commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-649715399


   Unless I am mistaken, account impersonation does not essentially solve any of the problems you mentioned (like key rotation or key management), because the main service account that you have can do anything via impersonation, and you continue to have access to this account. So my first thought is that there is no added value in using impersonation for the purpose you described.
   
   For example if someone steals the main "service account" credentials, that someone can still impersonate any of the other service accounts and do whatever those service accounts can do. You still have to manage the main service account key I believe and rotate it, and additionally you do not have separate access for each key, instead you have one "uber" service account that can impersonate any other service account and do everything. Which is not a good idea I think.
   
   But maybe I do not fully understand what exactly you want to achieve and how this all plays with the different roles you have in mind (like admin/DAG user etc.). I'd love to understand more from you and maybe see a diagram (not sure if I can ask for it) showing what the key management and service account structure would look like.





[GitHub] [airflow] mik-laj commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-679079422


   @amithmathew We are still working on the documentation, but could you please take a look and check whether the current implementation looks good to you?





[GitHub] [airflow] mik-laj edited a comment on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-660222575


   @olchas It seems to me that we should define it at the task level.  From the user's point of view, this should be as easy to use as in gcloud.
   ```bash
   gcloud \
   --account=kamil.bregula@polidea.com \
   --impersonate-service-account=test-kamil@polidea-airflow.iam.gserviceaccount.com \
   auth print-access-token
   ```
   There is only one difference. Instead of using the `--account` option, we have `gcp_conn_id`.
   
   If you want to play around with it then you can use the script below.
   ```bash
   MAIN_ACCOUNT="kamil.bregula@polidea.com"
   SECONDARY_ACCOUNT="test-kamil@polidea-airflow.iam.gserviceaccount.com"
   
   ACCESS_TOKEN="$(gcloud \
       --account=${MAIN_ACCOUNT} \
       auth print-access-token)"
   curl -q "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${ACCESS_TOKEN}"
   
   ACCESS_TOKEN="$(gcloud \
       --account=${MAIN_ACCOUNT} \
       --impersonate-service-account=${SECONDARY_ACCOUNT} \
       auth print-access-token)"
   curl -q "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${ACCESS_TOKEN}"
   ```
   
   Remember that you need to have the appropriate permissions to use this feature:
   * The main account has access to the secondary account. You can set it up in the permissions of the secondary account.
   * The main account has "roles/iam.serviceAccountTokenCreator" role.
   
   If you are using gcloud then you might want to enable the options below as well, which will allow you to better understand the flow.
   ```bash
   gcloud config set core/log_http true
   gcloud config set core/log_http_redact_token false
   ```
   Please note that the second option is not described in the public documentation, so be careful.





[GitHub] [airflow] mik-laj edited a comment on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-660222575


   @olchas It seems to me that we should define it at the task level.  From the user's point of view, this should be as easy to use as in gcloud.
   ```bash
   gcloud \
   --account=kamil.bregula@polidea.com \
   --impersonate-service-account=test-kamil@polidea-airflow.iam.gserviceaccount.com \
   auth print-access-token
   ```
   There is only one difference. Instead of using the `--account` option, we have `gcp_conn_id`.
   
   If you want to play around with it then you can use the script below.
   ```bash
   MAIN_ACCOUNT="kamil.bregula@polidea.com"
   SECONDARY_ACCOUNT="test-kamil@polidea-airflow.iam.gserviceaccount.com"
   
   ACCESS_TOKEN="$(gcloud \
       --account=${MAIN_ACCOUNT} \
       auth print-access-token)"
   curl -q "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${ACCESS_TOKEN}"
   
   ACCESS_TOKEN="$(gcloud \
       --account=${MAIN_ACCOUNT} \
       --impersonate-service-account=${SECONDARY_ACCOUNT} \
       auth print-access-token)"
   curl -q "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${ACCESS_TOKEN}"
   ```
   
   Remember that you need to have the appropriate permissions to use this feature
   * The main account has access to the secondary account. You set it up in the permissions of the secondary account.
   * The main account has "roles/iam.serviceAccountTokenCreator" role.
   
   If you are messing around with tokens in gcloud then you might want to enable the options below as well, which will allow you to better understand the flow.
   ```
   gcloud config set core/log_http true
   gcloud config set core/log_http_redact_token false
   ```
   Please note that the second option is not described in the public documentation, so be careful.





[GitHub] [airflow] potiuk commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-649739739


   Crystal-clear! Thank you. It does look reasonable. I will circle back with a few people to see what they think and get back to you!





[GitHub] [airflow] mik-laj commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-659320792


   @olchas Do you want to work on it? It looks like this is a task for you.





[GitHub] [airflow] amithmathew commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
amithmathew commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-679092155


   Will do, will take a look at it this week.
   
   > On Aug 24, 2020, at 7:49 AM, Kamil Breguła <no...@github.com> wrote:
   > 
   > 
   > @amithmathew We are still working on the documentation, but could you please have a look if the current implementation looks good for you?
   





[GitHub] [airflow] amithmathew commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
amithmathew commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-649709601


   Thanks Kamil. I don't think this statement is accurate in the context I was proposing - "The implementation of this feature in the current Airflow architecture meant that DAG or the operator could access the access key or service account file that allows you to log in to any other account. This is unacceptable."
   
   The account used by the airflow worker can impersonate another service account only if granted the appropriate permissions through IAM, so I don't think you can log into *any* account. Secondly, even when using a secrets backend, your DAGs still have permissions to access any secret in there (unless I'm missing something).
   
   [This](https://cloud.google.com/iam/docs/understanding-service-accounts#directly_impersonating_a_service_account) is the relevant documentation.
   
   Not having to deal with key material means:
    1: There are no key rotations to deal with.
    2: When Airflow operations are centralized in an organization, no coordination is required for key management and transfer during setup; everything is controlled through IAM.
    3: Controlling IAM access through Terraform becomes easier, with no key generation, transfer, or loading required.
   
   I may be missing something, of course!





[GitHub] [airflow] boring-cyborg[bot] commented on issue #8803: Impersonate service accounts while running GCP Operators

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-626266912


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] olchas commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
olchas commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-659540464


   @mik-laj sure. Could you assign me to this issue, please?
   
   @ad-m, thanks for the offer. I am uncertain about supporting impersonation via an environment variable. As far as I know, there is no mechanism to provide a value for an environment variable specifically for a single Task Instance, so with this approach all Task Instances would be impersonating the same account (for example, GOOGLE_APPLICATION_CREDENTIALS is used only if no other account details have been provided in the GCP connection).
   
   I was looking into the [GoogleBaseHook](https://github.com/apache/airflow/blob/master/airflow/providers/google/common/hooks/base_google.py#L125) implementation and I was thinking about specifying the chain of accounts to impersonate in the `extras` field of the connection used by the hook, similarly to how other fields, like `scopes` or `key_path`, are provided. This way we would avoid having to modify every hook derived from GoogleBaseHook and the operators that use them, as the information would be provided in the connection, not in the operator definition. I guess we could follow the same solution in hooks dedicated to other cloud providers, but I haven't looked at them yet.
   
   However, this would still require a separate connection for every impersonated account, even if all of them were using the same service account as a source. You would not have to generate or rotate keys for impersonated accounts, but with a lot of accounts being impersonated by the same source account, this would put the effort of keeping connections consistent on the team managing Airflow.





[GitHub] [airflow] mik-laj commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-701334321


   @olchas Is it done? Is there anything else to do that is not described in #10596?





[GitHub] [airflow] olchas commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
olchas commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-720529569


   Hi, @mik-laj, sorry for not responding. I guess one more thing not covered in https://github.com/apache/airflow/issues/10596 might be adding an example dag showing the usage of impersonation.





[GitHub] [airflow] mik-laj edited a comment on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-660222575


   @olchas It seems to me that we should define it at the task level.  From the user's point of view, this should be as easy to use as in gcloud.
   ```bash
   gcloud \
   --account=kamil.bregula@polidea.com \
   --impersonate-service-account=test-kamil@polidea-airflow.iam.gserviceaccount.com \
   auth print-access-token
   ```
   There is only one difference. Instead of using the `--account` option, we have `gcp_conn_id`.
   
   If you want to play around with it then you can use the script below.
   ```bash
   MAIN_ACCOUNT="kamil.bregula@polidea.com"
   SECONDARY_ACCOUNT="test-kamil@polidea-airflow.iam.gserviceaccount.com"
   
   ACCESS_TOKEN="$(gcloud \
       --account=${MAIN_ACCOUNT} \
       auth print-access-token)"
   curl -q "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${ACCESS_TOKEN}"
   
   ACCESS_TOKEN="$(gcloud \
       --account=${MAIN_ACCOUNT} \
       --impersonate-service-account=${SECONDARY_ACCOUNT} \
       auth print-access-token)"
   curl -q "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${ACCESS_TOKEN}"
   ```
   
   Remember that you need to have the appropriate permissions to use this feature:
   * The main account has access to the secondary account. You can set it up in the permissions of the secondary account.
   * The main account has "roles/iam.serviceAccountTokenCreator" role.
   
   If you are using gcloud then you might want to enable the options below as well, which will allow you to better understand the flow.
   ```
   gcloud config set core/log_http true
   gcloud config set core/log_http_redact_token false
   ```
   Please note that the second option is not described in the public documentation, so be careful.





[GitHub] [airflow] mik-laj commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-720551844


   @olchas I am closing this ticket. Can you create a ticket about the missing DAG example?





[GitHub] [airflow] amithmathew edited a comment on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
amithmathew edited a comment on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-649709601


   Thanks Kamil. I don't think this statement is accurate in the context I was proposing - "The implementation of this feature in the current Airflow architecture meant that DAG or the operator could access the access key or service account file that allows you to log in to any other account. This is unacceptable."
   
   The account used by the airflow worker can impersonate another service account only if granted the appropriate permissions through IAM, so I don't think you can log into *any* account. Secondly, even when using a secrets backend, your DAGs still have permissions to access any secret in there (unless I'm missing something).
   
   [This](https://cloud.google.com/iam/docs/understanding-service-accounts#directly_impersonating_a_service_account) is the relevant documentation.
   
   Not having to deal with key material means:
    1: There are no key rotations to deal with.
    2: When Airflow operations are centralized in an organization, no coordination is required for key management and transfer during setup; everything is controlled through IAM.
    3: Controlling IAM access through Terraform becomes easier, with no key generation, transfer, or loading required.
   
   I may be missing something, of course!





[GitHub] [airflow] mik-laj commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-649512860


   I talked to the Google team about this feature. We had serious security concerns. The implementation of this feature in the current Airflow architecture meant that DAG or the operator could access the access key or service account file that allows you to log in to any other account. This is unacceptable.  We must think about how to provide this feature without introducing such a security risk.  Ideally, the scheduler would not have access to any object from Connection and would only communicate using the API. However, that is unlikely to happen in the near future.
   
   Another solution is to create a separate component that will generate the access code based on the allowed list.  Such a thing can be based on Hashicorp Vault (https://www.vaultproject.io/docs/secrets/gcp#access-tokens) or other.
   
   Another solution is to create a worker for each service account and use Workload Identity to provide access to the access token.
   
   Probably, each of them would require a lot of work. 
   
   The easiest will be to generate keys and add them to secret backends.
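   
   The "separate component" option above can be illustrated with a minimal sketch of the allow-list check alone. This is not an existing Airflow or Vault API: the `ALLOW_LIST` mapping, the `may_impersonate` function, and all account names are hypothetical, and a real component would also have to authenticate the caller and mint the token.

```python
from typing import Dict, FrozenSet

# Hypothetical allow list: which caller (e.g. a DAG or team identity) may
# request tokens for which target service accounts. All names are made up.
ALLOW_LIST: Dict[str, FrozenSet[str]] = {
    "team-a-dag": frozenset({"team-a-sa@example-project.iam.gserviceaccount.com"}),
    "team-b-dag": frozenset({"team-b-sa@example-project.iam.gserviceaccount.com"}),
}


def may_impersonate(
    caller: str,
    target: str,
    allow_list: Dict[str, FrozenSet[str]] = ALLOW_LIST,
) -> bool:
    """Return True only if `caller` is explicitly allowed to act as `target`."""
    # Unknown callers get an empty set, so the default answer is "deny".
    return target in allow_list.get(caller, frozenset())
```

   With such a check in front of token generation, a DAG could only obtain tokens for the accounts listed for it, instead of for every account reachable from the main one.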





[GitHub] [airflow] olchas commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
olchas commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-660234954


   @mik-laj I agree, for users it will definitely be better to be able to specify it at task level. 
   
   I was looking at the implementation of the [google.auth.impersonated_credentials module](https://google-auth.readthedocs.io/en/latest/reference/google.auth.impersonated_credentials.html) and found that you can actually specify a chain of service accounts leading to the final one, which grants the access token used for the request. So I think the new argument for operators and hooks should accept either a string with a single service account or a list, in which case the last account is used as `target_principal` and the rest are used as `delegates`. I think one argument for operators/hooks dedicated to impersonation should suffice.
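   
   A minimal sketch of the string-or-list handling described above, assuming the argument arrives as either a single account or an ordered sequence. The function name `split_impersonation_chain` and the example account names are hypothetical; only `target_principal` and `delegates` come from the `google.auth.impersonated_credentials` module.

```python
from typing import List, Optional, Sequence, Tuple, Union


def split_impersonation_chain(
    chain: Union[str, Sequence[str], None],
) -> Tuple[Optional[str], Optional[List[str]]]:
    """Split the (hypothetical) argument into (target_principal, delegates)."""
    if chain is None:
        return None, None  # no impersonation requested
    if isinstance(chain, str):
        return chain, None  # single account: impersonate it directly
    accounts = list(chain)
    if not accounts:
        raise ValueError("impersonation chain must not be empty")
    # The last account issues the final token; earlier ones are intermediaries.
    *delegates, target_principal = accounts
    return target_principal, delegates or None


# Example with made-up account names:
target, delegates = split_impersonation_chain(
    ["first@example.iam.gserviceaccount.com", "final@example.iam.gserviceaccount.com"]
)
# `target` and `delegates` could then be passed as `target_principal` and
# `delegates` to google.auth.impersonated_credentials.Credentials.
```

   Accepting both forms in one argument keeps the common single-account case short while still allowing a delegation chain.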





[GitHub] [airflow] mik-laj commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-660222575


   @olchas It seems to me that we should define it at the task level.  From the user's point of view, this should be as easy to use as in gcloud.
   ```bash
   gcloud \
   --account=kamil.bregula@polidea.com \
   --impersonate-service-account=test-kamil@polidea-airflow.iam.gserviceaccount.com \
   auth print-access-token
   ```
   There is only one difference. Instead of using the `--account` option, we have `gcp_conn_id`.
   
   If you want to play around with it then you can use the script below.
   ```bash
   MAIN_ACCOUNT="kamil.bregula@polidea.com"
   SECONDARY_ACCOUNT="test-kamil@polidea-airflow.iam.gserviceaccount.com"
   
   ACCESS_TOKEN="$(gcloud \
       --account=${MAIN_ACCOUNT} \
       auth print-access-token)"
   curl -q "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${ACCESS_TOKEN}"
   
   ACCESS_TOKEN="$(gcloud \
       --account=${MAIN_ACCOUNT} \
       --impersonate-service-account=${SECONDARY_ACCOUNT} \
       auth print-access-token)"
   curl -q "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=${ACCESS_TOKEN}"
   ```





[GitHub] [airflow] amithmathew commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
amithmathew commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-683527681


   Took a look and did some quick and dirty tests with the `impersonation_chain` parameter. Looks good to me. 
   
   I would love to see this implemented for Dataflow as well, will follow #10596.





[GitHub] [airflow] amithmathew commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
amithmathew commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-649727163


   Thanks for the response Jarek. I should state my assumptions :) 
   
   #### Assumptions
   1. Airflow is running on GCP (so workers use the instance service accounts), with no long-lived keys generated for the main service account.
   2. Airflow is centralized and lives in its own project; DAGs can run in other projects and be owned by other teams in the organization.
   
   #### Current model using connections and a secrets backend
   1. The DAG owner/user has to generate a long-lived key.
   2. They either store it in the secrets backend themselves or transfer it to the team managing Airflow to set it up.
   3. The Airflow connection needs to be created.
   
   ##### Challenges
   1. Long-lived keys are generated.
   2. To rotate the key used by the connection, the DAG user must coordinate with the team managing Airflow (if that is how the org is set up).
   3. If using Terraform for these steps, the key transfer and connection setup add manual steps to the process.
   
   #### Proposed model using IAM permissions for impersonation
   1. The DAG owner/user determines whether to grant permissions to the Airflow service account.
   2. No long-lived keys are generated or need to be managed.
   3. Service Account user permissions (required for impersonation) can be controlled within the DAG owner's/user's project itself and do not require cross-GCP-project or cross-team coordination.
   
   
   
   #### Risks:
   ##### Assumption:
   Long-lived keys were generated for the main service account used by Airflow, and these were compromised.
   
   ##### Detail
   The risk of access to all accounts is the same in both models. An attacker can either access all connections and retrieve key information (in the current model) or impersonate all other service accounts that have granted permissions to the main account (in the proposed model).
   
   ###### Current Model (using connections)
   The resolution is harder in the current model: the compromised main account key will need to be disabled, and there is a risk that the key material for all other accounts was extracted and thus compromised as well. So the keys used in all connections will need to be disabled as well.
   
   ###### Proposed Model (using IAM and impersonation)
   In the proposed model, only the main account key will need to be disabled.
   
   
   Hope this makes my request clearer.





[GitHub] [airflow] mik-laj commented on issue #8803: Impersonate service accounts while running GCP Operators without key material (if airflow is running on GCP)

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8803:
URL: https://github.com/apache/airflow/issues/8803#issuecomment-660250450


   @olchas Sounds good to me. Can you prepare a POC with one operator and no unit tests?

