Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/26 17:23:25 UTC

[GitHub] [airflow] snjypl opened a new issue, #23266: wasb hook not using AZURE_CLIENT_ID environment variable as client_id for ManagedIdentityCredential

snjypl opened a new issue, #23266:
URL: https://github.com/apache/airflow/issues/23266

   ### Apache Airflow Provider(s)
   
   microsoft-azure
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-microsoft-azure==3.8.0
   
   ### Apache Airflow version
   
   2.2.4
   
   ### Operating System
   
   Ubuntu 20.04.2 LTS
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Airflow is deployed with the official Helm chart on an AKS cluster.
   
   ### What happened
   
   I deployed Apache Airflow on an AKS cluster using the official Helm chart.
   The pod has multiple user-assigned identities attached to it.
   I set the AZURE_CLIENT_ID environment variable to the client ID of the identity that I want to use for authentication.
   
   _Airflow connection:_
   
   wasb-default = '{"login":"storageaccountname"}'
   
   **Env**
   AZURE_CLIENT_ID="user-managed-identity-client-id"
   
   _**code**_
   ```
   # Suppress azure.core logs
   import logging
   logger = logging.getLogger("azure.core")
   logger.setLevel(logging.ERROR)
   
   from airflow.providers.microsoft.azure.hooks.wasb import WasbHook
   conn_id = 'wasb-default'
   hook = WasbHook(conn_id)
   for blob_name in hook.get_blobs_list("testcontainer"):
       print(blob_name)
   ```
   **error**
   ```
   azure.core.exceptions.ClientAuthenticationError: Unexpected content type "text/plain; charset=utf-8"
   Content: failed to get service principal token, error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fstorage.azure.com
   
   ```
   
   
   **trace**
   ```
   
   [2022-04-26 16:37:23,446] {environment.py:103} WARNING - Incomplete environment configuration. These variables are set: AZURE_CLIENT_ID
   [2022-04-26 16:37:23,446] {managed_identity.py:89} INFO - ManagedIdentityCredential will use IMDS
   [2022-04-26 16:37:23,605] {chained.py:84} INFO - DefaultAzureCredential acquired a token from ManagedIdentityCredential
   
   # Note: the Azure Key Vault secrets backend (azure.secrets.key_vault.AzureKeyVaultBackend) uses DefaultAzureCredential to fetch the connection.
   
   [2022-04-26 16:37:23,687] {base.py:68} INFO - Using connection ID 'wasb-default' for task execution.
   [2022-04-26 16:37:23,687] {managed_identity.py:89} INFO - ManagedIdentityCredential will use IMDS
   [2022-04-26 16:37:23,688] {wasb.py:155} INFO - Using managed identity as credential
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_universal.py", line 561, in deserialize_from_text
       return json.loads(data_as_str)
     File "/usr/local/lib/python3.10/json/__init__.py", line 346, in loads
       return _default_decoder.decode(s)
     File "/usr/local/lib/python3.10/json/decoder.py", line 337, in decode
       obj, end = self.raw_decode(s, idx=_w(s, 0).end())
     File "/usr/local/lib/python3.10/json/decoder.py", line 355, in raw_decode
       raise JSONDecodeError("Expecting value", s, err.value) from None
   json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_internal/managed_identity_client.py", line 51, in _process_response
       content = ContentDecodePolicy.deserialize_from_text(
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_universal.py", line 563, in deserialize_from_text
       raise DecodeError(message="JSON is invalid: {}".format(err), response=response, error=err)
   azure.core.exceptions.DecodeError: JSON is invalid: Expecting value: line 1 column 1 (char 0)
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_credentials/imds.py", line 97, in _request_token
       token = self._client.request_token(*scopes, headers={"Metadata": "true"})
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_internal/managed_identity_client.py", line 126, in request_token
       token = self._process_response(response, request_time)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_internal/managed_identity_client.py", line 59, in _process_response
       six.raise_from(ClientAuthenticationError(message=message, response=response.http_response), ex)
     File "<string>", line 3, in raise_from
   azure.core.exceptions.ClientAuthenticationError: Unexpected content type "text/plain; charset=utf-8"
   Content: failed to get service principal token, error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fstorage.azure.com
   
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/tmp/test.py", line 7, in <module>
       for blob_name in hook.get_blobs_list("test_container"):
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/microsoft/azure/hooks/wasb.py", line 231, in get_blobs_list
       for blob in blobs:
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/paging.py", line 129, in __next__
       return next(self._page_iterator)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/paging.py", line 76, in __next__
       self._response = self._get_next(self.continuation_token)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_list_blobs_helper.py", line 79, in _get_next_cb
       process_storage_error(error)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/response_handlers.py", line 89, in process_storage_error
       raise storage_error
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_list_blobs_helper.py", line 72, in _get_next_cb
       return self._command(
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_generated/operations/_container_operations.py", line 1572, in list_blob_hierarchy_segment
       pipeline_response = self._client._pipeline.run(request, stream=False, **kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 211, in run
       return first_node.send(pipeline_request)  # type: ignore
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
       response = self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
       response = self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
       response = self.next.send(request)
     [Previous line repeated 2 more times]
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_redirect.py", line 158, in send
       response = self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
       response = self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/policies.py", line 515, in send
       raise err
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/policies.py", line 489, in send
       response = self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
       response = self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
       response = self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_authentication.py", line 117, in send
       self.on_request(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_authentication.py", line 94, in on_request
       self._token = self._credential.get_token(*self._scopes)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_internal/decorators.py", line 32, in wrapper
       token = fn(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_credentials/managed_identity.py", line 123, in get_token
       return self._credential.get_token(*scopes, **kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_internal/get_token_mixin.py", line 76, in get_token
       token = self._request_token(*scopes, **kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_credentials/imds.py", line 111, in _request_token
       six.raise_from(ClientAuthenticationError(message=ex.message, response=ex.response), ex)
     File "<string>", line 3, in raise_from
   azure.core.exceptions.ClientAuthenticationError: Unexpected content type "text/plain; charset=utf-8"
   Content: failed to get service principal token, error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fstorage.azure.com
   ```
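
The error body says the token request must name the identity. As a rough illustration only, the sketch below builds the IMDS token URL seen in the error, with and without a `client_id` query parameter. The endpoint, `api-version`, and resource are copied from the message above; the helper name `imds_token_url` is hypothetical, and this is not a copy of what azure-identity does internally.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint and api-version are copied from the error message above.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"
API_VERSION = "2018-02-01"


def imds_token_url(resource: str, client_id: Optional[str] = None) -> str:
    """Build an IMDS token URL, optionally pinning one user-assigned identity.

    Hypothetical helper for illustration; not part of Airflow or azure-identity.
    """
    params = {"api-version": API_VERSION, "resource": resource}
    if client_id:
        # Names the identity so the request is not ambiguous when the
        # pod or VM has several user-assigned identities.
        params["client_id"] = client_id
    return f"{IMDS_TOKEN_ENDPOINT}?{urlencode(params)}"


# Same shape as the failing request in the trace: no identity is named.
ambiguous = imds_token_url("https://storage.azure.com")
# Placeholder client ID taken from the Env section of this report.
pinned = imds_token_url(
    "https://storage.azure.com",
    client_id="user-managed-identity-client-id",
)
print(ambiguous)
print(pinned)
```

The first URL matches the endpoint in the error; the second differs only by the trailing `client_id` parameter.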
   
   
   ### What you think should happen instead
   
   The wasb hook should authenticate with the user-assigned identity specified in AZURE_CLIENT_ID and list the blobs.
   
   ### How to reproduce
   
   Run the following in an environment with multiple user-assigned identities:
   
   ```
   import logging
   logger = logging.getLogger("azure.core")
   logger.setLevel(logging.ERROR)
   from airflow.providers.microsoft.azure.hooks.wasb import WasbHook
   conn_id = 'wasb-default'
   hook = WasbHook(conn_id)
   for blob_name in hook.get_blobs_list("testcontainer"):
       print(blob_name)
   ```
   
   
   ### Anything else
   
   The issue occurs because the hook does not pass client_id to ManagedIdentityCredential in
   [azure.hooks.wasb.WasbHook](https://github.com/apache/airflow/blob/1d875a45994540adef23ad6f638d78c9945ef873/airflow/providers/microsoft/azure/hooks/wasb.py#L153-L160):
   ```
   if not credential:
       credential = ManagedIdentityCredential()
       self.log.info("Using managed identity as credential")
   return BlobServiceClient(
       account_url=f"https://{conn.login}.blob.core.windows.net/",
       credential=credential,
       **extra,
   )
   ```
   
   Solution 1:
   Instead of ManagedIdentityCredential, use [azure.identity.DefaultAzureCredential](https://github.com/Azure/azure-sdk-for-python/blob/aa35d07aebf062393f14d147da54f0342e6b94a8/sdk/identity/azure-identity/azure/identity/_credentials/default.py#L32).
   
   Solution 2:
   Pass the client ID from the environment, [as DefaultAzureCredential does](https://github.com/Azure/azure-sdk-for-python/blob/aa35d07aebf062393f14d147da54f0342e6b94a8/sdk/identity/azure-identity/azure/identity/_credentials/default.py#L104-L106):
   
   `ManagedIdentityCredential(client_id=os.environ.get("AZURE_CLIENT_ID"))`
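
A minimal sketch of solution 2, under the assumption that the fix lives in a small helper rather than inline in the hook. The names `managed_identity_kwargs` and `build_credential` are hypothetical and do not exist in the provider; the only library API used is the `client_id` keyword of `ManagedIdentityCredential`. The import is done lazily so the environment-handling part can be exercised without azure-identity installed.

```python
import os
from typing import Any, Dict, Mapping, Optional


def managed_identity_kwargs(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for ManagedIdentityCredential.

    Hypothetical helper: includes client_id only when AZURE_CLIENT_ID is set
    to a non-empty value, so the single-identity case behaves as before.
    """
    env = os.environ if environ is None else environ
    client_id = env.get("AZURE_CLIENT_ID")
    return {"client_id": client_id} if client_id else {}


def build_credential(environ: Optional[Mapping[str, str]] = None):
    """Create the credential the hook would fall back to (illustrative only)."""
    # Imported here so this module loads even without azure-identity installed.
    from azure.identity import ManagedIdentityCredential

    return ManagedIdentityCredential(**managed_identity_kwargs(environ))


# With the variable set, the chosen identity is passed through explicitly.
print(managed_identity_kwargs({"AZURE_CLIENT_ID": "user-managed-identity-client-id"}))
# Without it, no client_id is passed and the credential is built as today.
print(managed_identity_kwargs({}))
```

In the hook, the `credential = ManagedIdentityCredential()` line would then become something like `credential = build_credential()`; whether to read the environment variable or a connection extra is a design choice left to the PR.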
   
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #23266: wasb hook not using AZURE_CLIENT_ID environment variable as client_id for ManagedIdentityCredential

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #23266:
URL: https://github.com/apache/airflow/issues/23266#issuecomment-1114041270

   Feel free to fix it.




[GitHub] [airflow] boring-cyborg[bot] commented on issue #23266: wasb hook not using AZURE_CLIENT_ID environment variable as client_id for ManagedIdentityCredential

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #23266:
URL: https://github.com/apache/airflow/issues/23266#issuecomment-1110059602

   Thanks for opening your first issue here! Be sure to follow the issue template!
   




[GitHub] [airflow] potiuk closed issue #23266: wasb hook not using AZURE_CLIENT_ID environment variable as client_id for ManagedIdentityCredential

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #23266: wasb hook not using AZURE_CLIENT_ID environment variable as client_id for ManagedIdentityCredential
URL: https://github.com/apache/airflow/issues/23266

