You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "sudeepgupta90 (via GitHub)" <gi...@apache.org> on 2023/02/12 08:53:12 UTC

[GitHub] [airflow] sudeepgupta90 opened a new issue, #29486: Secrets Caching for Azure KeyVault

sudeepgupta90 opened a new issue, #29486:
URL: https://github.com/apache/airflow/issues/29486

   ### Apache Airflow Provider(s)
   
   microsoft-azure
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-microsoft-azure:  5.2.0
   
   ### Apache Airflow version
   
   2.3.3
   
   ### Operating System
   
   NAME="Ubuntu" VERSION="18.04.6 LTS (Bionic Beaver)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 18.04.6 LTS" VERSION_ID="18.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=bionic UBUNTU_CODENAME=bionic
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Metadatabase: Azure postgres
   Secret Backend: Azure key Vault
   Deployment Mode: Helm Chart
   
   
   ### What happened
   
   I have dags which can contain upto 500 tasks - most of these tasks call a secret connection lookup from the Azure KeyVault
   
   During Dag parsing, and task execution I find that often there's a timeout due to AKV, and I have received a throttling error from the Key Vault. I understand I can reduce dagbag parsing timeout by increasing this value `AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT` , but tasks still timeout during execution. I also know that this problem of double parsing of dag during dagrun has been detailed and fixed [here](https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-45+Remove+double+dag+parsing+in+airflow+run)
   
   
   But what I want to explore is a simple caching mechanism at the AKV without changing/touching any of the Airflow internals
   
   ### What you think should happen instead
   
   There should be a caching mechanism while fetching secrets with a cache expiry without changing the behaviour of Airflow Internals.
   
   I have a working example which does not depend on any other external library. [We change this function signature](https://github.com/apache/airflow/blob/main/airflow/providers/microsoft/azure/secrets/key_vault.py#L178)
   
   ```
   # include the ttl_time= 1, so default behaviour is not changed at all.
   # I have set this to 600 = 10 mins for my example reference working locally
   # the ttl_time is declared as one of init parameters for this class object
   
   @lru_cache(@lru_cache(maxsize= 128)
   def _get_secret(self, path_prefix: str, secret_id: str, ttl_hash = round(time.time()/self.ttl_time)) -> str | None:
           """
           Get an Azure Key Vault secret value
   
           :param path_prefix: Prefix for the Path to get Secret
           :param secret_id: Secret Key
           """
           name = self.build_path(path_prefix, secret_id, self.sep)
           try:
               secret = self.client.get_secret(name=name)
               return secret.value
           except ResourceNotFoundError as ex:
               self.log.debug("Secret %s not found: %s", name, ex)
               return None
   ```
   
   
   
   ### How to reproduce
   
   If you have large dags which query your AKV secrets backend during dag parsing, and task execution, the KAV will start throttling you and tasks will fail
   
   ### Anything else
   
   Happens on large dags, and frequently
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] sudeepgupta90 closed issue #29486: Secrets Caching for Azure KeyVault

Posted by "sudeepgupta90 (via GitHub)" <gi...@apache.org>.
sudeepgupta90 closed issue #29486: Secrets Caching for Azure KeyVault
URL: https://github.com/apache/airflow/issues/29486


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] sudeepgupta90 commented on issue #29486: Secrets Caching for Azure KeyVault

Posted by "sudeepgupta90 (via GitHub)" <gi...@apache.org>.
sudeepgupta90 commented on issue #29486:
URL: https://github.com/apache/airflow/issues/29486#issuecomment-1427066894

   @eladkal Can you please review my PR. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] sudeepgupta90 commented on issue #29486: Secrets Caching for Azure KeyVault

Posted by "sudeepgupta90 (via GitHub)" <gi...@apache.org>.
sudeepgupta90 commented on issue #29486:
URL: https://github.com/apache/airflow/issues/29486#issuecomment-1426978534

   I am happy to submit my code as a PR if this issue is accepted


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on issue #29486: Secrets Caching for Azure KeyVault

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on issue #29486:
URL: https://github.com/apache/airflow/issues/29486#issuecomment-1427042536

   > I am happy to submit my code as a PR if this issue is accepted
   
   If you have code suggestion it's always recommend to start a PR. It's much easier to discuss when seeing the suggested changes. In Airflow we don't require submitting issues anyone is welcome to open PR directly and explain the details in the PR description


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] sudeepgupta90 commented on issue #29486: Secrets Caching for Azure KeyVault

Posted by "sudeepgupta90 (via GitHub)" <gi...@apache.org>.
sudeepgupta90 commented on issue #29486:
URL: https://github.com/apache/airflow/issues/29486#issuecomment-1427075458

   Closing this PR, as I realise that the TTL variable is not customisable in my implementation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org