Posted to commits@airflow.apache.org by "gabor-one (via GitHub)" <gi...@apache.org> on 2024/04/15 12:53:07 UTC

[I] Airflow produces an unnecessary ' ' (space) in the middle of the WASB URL when WASB connection is read from Azure Key Vault secret backed. [airflow]

gabor-one opened a new issue, #39028:
URL: https://github.com/apache/airflow/issues/39028

   ### Apache Airflow Provider(s)
   
   microsoft-azure
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-microsoft-azure==9.0.1
   
   ### Apache Airflow version
   
   2.9.0
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   - Platform: Kubernetes (AKS)
   - Executor: KubernetesExecutor
   - Using Azure Key-Vault as the secret provider via Workload Identity.
   - Using azure_remote_logging (Azure Blob Storage)
   
   ### What happened
   
   If the connection is defined in Azure Key Vault, the task pods cannot write logs to Azure Blob Storage at the end of execution. A stray ' ' (space character) appears in the storage account URL (see the last line of the log).
   
   The WASB Airflow connection is defined in Key Vault as follows: `wasb://https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net`
   
   If the connection is created via the UI and 'remote_log_conn_id' is changed to use that connection for logging, everything works fine.
   
   Logs:
   ```
   [2024-04-15, 12:03:07 UTC] {retries.py:91} DEBUG - Running Job._fetch_from_db with retries. Try 1 of 3
   [2024-04-15, 12:03:07 UTC] {retries.py:91} DEBUG - Running Job._update_heartbeat with retries. Try 1 of 3
   [2024-04-15, 12:03:07 UTC] {job.py:214} DEBUG - [heartbeat]
   [2024-04-15, 12:03:12 UTC] {retries.py:91} DEBUG - Running Job._fetch_from_db with retries. Try 1 of 3
   [2024-04-15, 12:03:12 UTC] {retries.py:91} DEBUG - Running Job._update_heartbeat with retries. Try 1 of 3
   [2024-04-15, 12:03:12 UTC] {job.py:214} DEBUG - [heartbeat]
   [2024-04-15, 12:03:13 UTC] {taskinstance.py:441} ▼ Post task execution logs
   [2024-04-15, 12:03:13 UTC] {taskinstance.py:2890} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/aiohttp/connector.py", line 1173, in _create_direct_connection
       hosts = await asyncio.shield(host_resolved)
     File "/home/airflow/.local/lib/python3.10/site-packages/aiohttp/connector.py", line 884, in _resolve_host
       addrs = await self._resolver.resolve(host, port, family=self._family)
     File "/home/airflow/.local/lib/python3.10/site-packages/aiohttp/resolver.py", line 33, in resolve
       infos = await self._loop.getaddrinfo(
     File "/usr/local/lib/python3.10/asyncio/base_events.py", line 863, in getaddrinfo
       return await self.run_in_executor(
     File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
       result = self.fn(*self.args, **self.kwargs)
     File "/usr/local/lib/python3.10/socket.py", line 955, in getaddrinfo
       for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
   socket.gaierror: [Errno -2] Name or service not known
   The above exception was the direct cause of the following exception:
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/transport/_aiohttp.py", line 294, in send
       result = await self.session.request(  # type: ignore
     File "/home/airflow/.local/lib/python3.10/site-packages/aiohttp/client.py", line 578, in _request
       conn = await self._connector.connect(
     File "/home/airflow/.local/lib/python3.10/site-packages/aiohttp/connector.py", line 544, in connect
       proto = await self._create_connection(req, traces, timeout)
     File "/home/airflow/.local/lib/python3.10/site-packages/aiohttp/connector.py", line 911, in _create_connection
       _, proto = await self._create_direct_connection(req, traces, timeout)
     File "/home/airflow/.local/lib/python3.10/site-packages/aiohttp/connector.py", line 1187, in _create_direct_connection
       raise ClientConnectorError(req.connection_key, exc) from exc
   aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host <STORAGE_ACCOUNT_NAME> .blob.core.windows.net:443 ssl:default [Name or service not known]
   The above exception was the direct cause of the following exception:
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 465, in _execute_task
       result = _execute_callable(context=context, **execute_callable_kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py", line 432, in _execute_callable
       return execute_callable(context=context, **execute_callable_kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 400, in wrapper
       return func(self, *args, **kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/decorators/base.py", line 265, in execute
       return_value = super().execute(context)
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 400, in wrapper
       return func(self, *args, **kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 235, in execute
       return_value = self.execute_callable()
     File "/home/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 252, in execute_callable
       return self.python_callable(*self.op_args, **self.op_kwargs)
     File "/opt/airflow/dags/repo/src/workflows/test.py", line 24, in test_features
       print(f"Got access to datalake. ls: {fs.ls(datalake_folder)}")
     File "/home/airflow/.local/lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
       return sync(self.loop, func, *args, **kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
       raise return_result
     File "/home/airflow/.local/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
       result[0] = await coro
     File "/home/airflow/.local/lib/python3.10/site-packages/adlfs/spec.py", line 823, in _ls
       output = await self._ls_blobs(
     File "/home/airflow/.local/lib/python3.10/site-packages/adlfs/spec.py", line 724, in _ls_blobs
       async for next_blob in blobs:
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/async_paging.py", line 142, in __anext__
       return await self.__anext__()
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/async_paging.py", line 145, in __anext__
       self._page = await self._page_iterator.__anext__()
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/async_paging.py", line 94, in __anext__
       self._response = await self._get_next(self.continuation_token)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/aio/_list_blobs_helper.py", line 83, in _get_next_cb
       return await self._command(
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py", line 77, in wrapper_use_tracer
       return await func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_generated/aio/operations/_container_operations.py", line 1886, in list_blob_hierarchy_segment
       pipeline_response: PipelineResponse = await self._client._pipeline.run(  # pylint: disable=protected-access
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py", line 221, in run
       return await first_node.send(pipeline_request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py", line 69, in send
       response = await self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py", line 69, in send
       response = await self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py", line 69, in send
       response = await self.next.send(request)
     [Previous line repeated 3 more times]
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_authentication_async.py", line 100, in send
       response = await self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py", line 69, in send
       response = await self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/policies/_redirect_async.py", line 73, in send
       response = await self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py", line 69, in send
       response = await self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/policies_async.py", line 137, in send
       raise err
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/policies_async.py", line 111, in send
       response = await self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py", line 69, in send
       response = await self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/policies_async.py", line 64, in send
       response = await self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py", line 69, in send
       response = await self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py", line 69, in send
       response = await self.next.send(request)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/_base_async.py", line 106, in send
       await self._sender.send(request.http_request, **request.context.options),
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/storage/blob/_shared/base_client_async.py", line 175, in send
       return await self._transport.send(request, **kwargs)
     File "/home/airflow/.local/lib/python3.10/site-packages/azure/core/pipeline/transport/_aiohttp.py", line 332, in send
       raise ServiceRequestError(err, error=err) from err
   azure.core.exceptions.ServiceRequestError: Cannot connect to host <STORAGE_ACCOUNT_NAME> .blob.core.windows.net:443 ssl:default [Name or service not known]
   ```
   
   WASB-DEFAULT connection defined in the Key-Vault that produces a random space in the URL:
   ```
   >airflow connections get wasb-default -o yaml
   - conn_id: wasb-default
     conn_type: wasb
     description: null
     extra_dejson: {}
     get_uri: wasb://https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net
     host: https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net
     id: null
     is_encrypted: null
     is_extra_encrypted: null
     login: null
     password: null
     port: null
     schema: ''
   ```
   
   WASB connection defined via UI that works:
   ```
   >airflow connections get abc -o yaml
   - conn_id: abc
     conn_type: wasb
     description: ''
     extra_dejson: {}
     get_uri: wasb://https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net
     host: https://<STORAGE_ACCOUNT_NAME>.blob.core.windows.net
     id: '1'
     is_encrypted: 'False'
     is_extra_encrypted: 'False'
     login: ''
     password: null
     port: null
     schema: ''
   ```
   
   ### What you think should happen instead
   
   WASB connections defined via Key Vault should not produce an extra ' ' (space) character in the URL, just as connections created via the UI don't.
   
   ### How to reproduce
   
   1. Set up Azure Kubernetes to use Workload Identity: attach a service account to the pods, federate an identity to that service account, and give the federated identity access to the Azure Storage Account.
   2. Configure Airflow to use Azure Key Vault as the secret backend.
   3. Configure Airflow to use azure_remote_logging.
   4. Create an Airflow WASB connection secret in Key Vault, using the example above.
   5. Run a DAG.
   6. The task fails because it cannot write logs to the storage container.
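
Steps 2 and 3 could be expressed as environment variables, for example in the Helm chart's `env` values. The following is a minimal sketch, not the reporter's actual configuration: the vault URL and log folder are hypothetical placeholders, and the `connections_prefix` value is an assumption you should check against your own backend settings.

```python
import json

# Hypothetical placeholder values; substitute your own vault and log location.
VAULT_URL = "https://example-vault.vault.azure.net/"
REMOTE_LOG_FOLDER = "wasb-airflow-logs"

# Keyword arguments passed to the Key Vault secrets backend.
# "connections_prefix" is assumed here; adjust it to match your secret names.
backend_kwargs = {
    "connections_prefix": "airflow-connections",
    "vault_url": VAULT_URL,
}

# Airflow reads [section] option overrides from AIRFLOW__SECTION__OPTION variables.
airflow_env = {
    "AIRFLOW__SECRETS__BACKEND": (
        "airflow.providers.microsoft.azure.secrets.key_vault.AzureKeyVaultBackend"
    ),
    "AIRFLOW__SECRETS__BACKEND_KWARGS": json.dumps(backend_kwargs),
    "AIRFLOW__LOGGING__REMOTE_LOGGING": "True",
    "AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER": REMOTE_LOG_FOLDER,
    "AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID": "wasb-default",
}

for name, value in airflow_env.items():
    print(f"{name}={value}")
```

The backend kwargs are serialized with `json.dumps` because Airflow expects `backend_kwargs` as a JSON string rather than as separate options.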
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Airflow produces an unnecessary ' ' (space) in the middle of the WASB URL when WASB connection is read from Azure Key Vault secret backed. [airflow]

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #39028:
URL: https://github.com/apache/airflow/issues/39028#issuecomment-2056784221

   Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.
   




Re: [I] Airflow produces an unnecessary ' ' (space) in the middle of the WASB URL when WASB connection is read from Azure Key Vault secret backed. [airflow]

Posted by "gabor-one (via GitHub)" <gi...@apache.org>.
gabor-one closed issue #39028: Airflow produces an unnecessary ' ' (space) in the middle of the WASB URL when WASB connection is read from Azure Key Vault secret backed.
URL: https://github.com/apache/airflow/issues/39028




Re: [I] Airflow produces an unnecessary ' ' (space) in the middle of the WASB URL when WASB connection is read from Azure Key Vault secret backed. [airflow]

Posted by "gabor-one (via GitHub)" <gi...@apache.org>.
gabor-one commented on issue #39028:
URL: https://github.com/apache/airflow/issues/39028#issuecomment-2057211976

   I made a mistake reading the log. You don't need https:// in the connection URL. Please ignore this.
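
To illustrate why the extra `https://` matters, here is a minimal sketch using only Python's standard-library URL parser. It is not Airflow's own connection parsing, which handles such values in its own way. The account name `myaccount` and the helpers `strip_redundant_scheme` and `build_wasb_uri` are hypothetical, introduced only for this example.

```python
from urllib.parse import urlsplit


def strip_redundant_scheme(host: str) -> str:
    """Drop a leading http:// or https:// and any trailing slash from a host value."""
    for prefix in ("https://", "http://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
            break
    return host.rstrip("/")


def build_wasb_uri(host: str) -> str:
    """Build a wasb:// connection URI that carries exactly one scheme."""
    return f"wasb://{strip_redundant_scheme(host)}"


# A URI with two schemes: the generic parser treats "https:" as the network location.
doubled = urlsplit("wasb://https://myaccount.blob.core.windows.net")
print(doubled.hostname)  # https
print(doubled.path)      # //myaccount.blob.core.windows.net

# The same host with the redundant scheme removed parses as expected.
clean = urlsplit(build_wasb_uri("https://myaccount.blob.core.windows.net"))
print(clean.hostname)    # myaccount.blob.core.windows.net
```

Normalizing the value before storing it in Key Vault keeps the secret in the single-scheme form that this comment describes as sufficient.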

