Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/26 17:23:25 UTC
[GitHub] [airflow] snjypl opened a new issue, #23266: wasb hook not using AZURE_CLIENT_ID environment variable as client_id for ManagedIdentityCredential
snjypl opened a new issue, #23266:
URL: https://github.com/apache/airflow/issues/23266
### Apache Airflow Provider(s)
microsoft-azure
### Versions of Apache Airflow Providers
apache-airflow-providers-microsoft-azure==3.8.0
### Apache Airflow version
2.2.4
### Operating System
Ubuntu 20.04.2 LTS
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
Airflow is deployed with the official Helm chart on an AKS cluster.
### What happened
I deployed Apache Airflow with the official Helm chart on an AKS cluster.
The pod has multiple user-assigned identities assigned to it.
I set the AZURE_CLIENT_ID environment variable to the client ID of the identity that I want to use for authentication.
_Airflow connection:_
wasb_default = '{"login":"storageaccountname"}'
**Env**
AZURE_CLIENT_ID="user-managed-identity-client-id"
_**code**_
```
# suppress azure.core logs
import logging
logger = logging.getLogger("azure.core")
logger.setLevel(logging.ERROR)
from airflow.providers.microsoft.azure.hooks.wasb import WasbHook
conn_id = 'wasb-default'
hook = WasbHook(conn_id)
for blob_name in hook.get_blobs_list("testcontainer"):
    print(blob_name)
```
**error**
```
azure.core.exceptions.ClientAuthenticationError: Unexpected content type "text/plain; charset=utf-8"
Content: failed to get service principal token, error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fstorage.azure.com
```
**trace**
```
[2022-04-26 16:37:23,446] {environment.py:103} WARNING - Incomplete environment configuration. These variables are set: AZURE_CLIENT_ID
[2022-04-26 16:37:23,446] {managed_identity.py:89} INFO - ManagedIdentityCredential will use IMDS
[2022-04-26 16:37:23,605] {chained.py:84} INFO - DefaultAzureCredential acquired a token from ManagedIdentityCredential
#Note: azure key vault azure.secrets.key_vault.AzureKeyVaultBackend uses DefaultAzureCredential to get the connection
[2022-04-26 16:37:23,687] {base.py:68} INFO - Using connection ID 'wasb-default' for task execution.
[2022-04-26 16:37:23,687] {managed_identity.py:89} INFO - ManagedIdentityCredential will use IMDS
[2022-04-26 16:37:23,688] {wasb.py:155} INFO - Using managed identity as credential
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_universal.py", line 561, in deserialize_from_text
return json.loads(data_as_str)
File "/usr/local/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_internal/managed_identity_client.py", line 51, in _process_response
content = ContentDecodePolicy.deserialize_from_text(
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_universal.py", line 563, in deserialize_from_text
raise DecodeError(message="JSON is invalid: {}".format(err), response=response, error=err)
azure.core.exceptions.DecodeError: JSON is invalid: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_credentials/imds.py", line 97, in _request_token
token = self._client.request_token(*scopes, headers={"Metadata": "true"})
File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_internal/managed_identity_client.py", line 126, in request_token
token = self._process_response(response, request_time)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_internal/managed_identity_client.py", line 59, in _process_response
six.raise_from(ClientAuthenticationError(message=message, response=response.http_response), ex)
File "<string>", line 3, in raise_from
azure.core.exceptions.ClientAuthenticationError: Unexpected content type "text/plain; charset=utf-8"
Content: failed to get service principal token, error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fstorage.azure.com
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/test.py", line 7, in <module>
for blob_name in hook.get_blobs_list("test_container"):
File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/microsoft/azure/hooks/wasb.py", line 231, in get_blobs_list
for blob in blobs:
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/paging.py", line 129, in __next__
return next(self._page_iterator)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/paging.py", line 76, in __next__
self._response = self._get_next(self.continuation_token)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_list_blobs_helper.py", line 79, in _get_next_cb
process_storage_error(error)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/response_handlers.py", line 89, in process_storage_error
raise storage_error
File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_list_blobs_helper.py", line 72, in _get_next_cb
return self._command(
File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_generated/operations/_container_operations.py", line 1572, in list_blob_hierarchy_segment
pipeline_response = self._client._pipeline.run(request, stream=False, **kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 211, in run
return first_node.send(pipeline_request) # type: ignore
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
[Previous line repeated 2 more times]
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_redirect.py", line 158, in send
response = self.next.send(request)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/policies.py", line 515, in send
raise err
File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/policies.py", line 489, in send
response = self.next.send(request)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_authentication.py", line 117, in send
self.on_request(request)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_authentication.py", line 94, in on_request
self._token = self._credential.get_token(*self._scopes)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_internal/decorators.py", line 32, in wrapper
token = fn(*args, **kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_credentials/managed_identity.py", line 123, in get_token
return self._credential.get_token(*scopes, **kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_internal/get_token_mixin.py", line 76, in get_token
token = self._request_token(*scopes, **kwargs)
File "/home/airflow/.local/lib/python3.10/site-packages/azure/identity/_credentials/imds.py", line 111, in _request_token
six.raise_from(ClientAuthenticationError(message=ex.message, response=ex.response), ex)
File "<string>", line 3, in raise_from
azure.core.exceptions.ClientAuthenticationError: Unexpected content type "text/plain; charset=utf-8"
Content: failed to get service principal token, error: adal: Refresh request failed. Status Code = '400'. Response body: {"error":"invalid_request","error_description":"Multiple user assigned identities exist, please specify the clientId / resourceId of the identity in the token request"} Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fstorage.azure.com
```
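The 400 response above asks for the identity to be named in the token request. As a rough illustration (not code from the provider or from the Azure SDK), the sketch below rebuilds the IMDS token URL from the error message and adds a `client_id` query parameter, which is how the Azure Instance Metadata Service lets a caller select a user-assigned identity. The helper name `imds_token_url` is hypothetical; the endpoint, API version, and resource are copied from the error output.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint and API version as they appear in the error message above.
IMDS_TOKEN_ENDPOINT = "http://169.254.169.254/metadata/identity/oauth2/token"
IMDS_API_VERSION = "2018-02-01"


def imds_token_url(resource: str, client_id: Optional[str] = None) -> str:
    """Build an IMDS token request URL (hypothetical helper for illustration).

    When client_id is given, it is added as a query parameter so the
    metadata service knows which user-assigned identity to use.
    """
    params = {"api-version": IMDS_API_VERSION, "resource": resource}
    if client_id:
        params["client_id"] = client_id
    return f"{IMDS_TOKEN_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Request as seen in the error: no identity is specified.
    print(imds_token_url("https://storage.azure.com"))
    # Request that names the identity explicitly.
    print(imds_token_url("https://storage.azure.com", "user-managed-identity-client-id"))
```

The first URL matches the one in the error; the second differs only by the trailing `client_id` parameter, which is the piece that appears to be missing when the hook creates `ManagedIdentityCredential()` without arguments.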
### What you think should happen instead
The wasb hook should authenticate using the user-assigned identity specified in AZURE_CLIENT_ID and list the blobs.
### How to reproduce
In an environment with multiple user-assigned identities, run:
```
import logging
logger = logging.getLogger("azure.core")
logger.setLevel(logging.ERROR)
from airflow.providers.microsoft.azure.hooks.wasb import WasbHook
conn_id = 'wasb-default'
hook = WasbHook(conn_id)
for blob_name in hook.get_blobs_list("testcontainer"):
    print(blob_name)
```
### Anything else
The issue occurs because `client_id` is not passed to ManagedIdentityCredential in
[azure.hooks.wasb.WasbHook](https://github.com/apache/airflow/blob/1d875a45994540adef23ad6f638d78c9945ef873/airflow/providers/microsoft/azure/hooks/wasb.py#L153-L160):
```
if not credential:
    credential = ManagedIdentityCredential()
    self.log.info("Using managed identity as credential")
return BlobServiceClient(
    account_url=f"https://{conn.login}.blob.core.windows.net/",
    credential=credential,
    **extra,
)
```
Solution 1:
Instead of ManagedIdentityCredential, use [azure.identity.DefaultAzureCredential](https://github.com/Azure/azure-sdk-for-python/blob/aa35d07aebf062393f14d147da54f0342e6b94a8/sdk/identity/azure-identity/azure/identity/_credentials/default.py#L32).
Solution 2:
Pass the client ID from the environment, [as done in DefaultAzureCredential](https://github.com/Azure/azure-sdk-for-python/blob/aa35d07aebf062393f14d147da54f0342e6b94a8/sdk/identity/azure-identity/azure/identity/_credentials/default.py#L104-L106):
`ManagedIdentityCredential(client_id=os.environ.get("AZURE_CLIENT_ID"))`
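A minimal sketch of solution 2, assuming the hook keeps using `ManagedIdentityCredential` and only needs the client ID forwarded. The helper names `managed_identity_kwargs` and `build_managed_identity_credential` are hypothetical and not part of the provider; `client_id` is an existing keyword argument of `azure.identity.ManagedIdentityCredential`.

```python
import os
from typing import Dict, Mapping, Optional


def managed_identity_kwargs(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return keyword arguments for ManagedIdentityCredential (hypothetical helper).

    If AZURE_CLIENT_ID is set and non-empty, it is forwarded as client_id;
    otherwise an empty dict is returned so the credential keeps its default
    behaviour.
    """
    environ = os.environ if environ is None else environ
    client_id = environ.get("AZURE_CLIENT_ID")
    return {"client_id": client_id} if client_id else {}


def build_managed_identity_credential(environ: Optional[Mapping[str, str]] = None):
    """Create the credential, selecting the identity named in AZURE_CLIENT_ID."""
    # Imported lazily so the helper above can be exercised without
    # azure-identity installed.
    from azure.identity import ManagedIdentityCredential

    return ManagedIdentityCredential(**managed_identity_kwargs(environ))
```

With AZURE_CLIENT_ID unset, this falls back to `ManagedIdentityCredential()` with no arguments, so deployments with a single identity would presumably behave as they do today.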
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] potiuk commented on issue #23266: wasb hook not using AZURE_CLIENT_ID environment variable as client_id for ManagedIdentityCredential
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #23266:
URL: https://github.com/apache/airflow/issues/23266#issuecomment-1114041270
Feel free to fix it.
[GitHub] [airflow] boring-cyborg[bot] commented on issue #23266: wasb hook not using AZURE_CLIENT_ID environment variable as client_id for ManagedIdentityCredential
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #23266:
URL: https://github.com/apache/airflow/issues/23266#issuecomment-1110059602
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] potiuk closed issue #23266: wasb hook not using AZURE_CLIENT_ID environment variable as client_id for ManagedIdentityCredential
Posted by GitBox <gi...@apache.org>.
potiuk closed issue #23266: wasb hook not using AZURE_CLIENT_ID environment variable as client_id for ManagedIdentityCredential
URL: https://github.com/apache/airflow/issues/23266