Posted to commits@airflow.apache.org by "jaruji (via GitHub)" <gi...@apache.org> on 2024/02/15 17:43:55 UTC
[I] Azure blob remote logging - The specified container does not exist. [airflow]
jaruji opened a new issue, #37459:
URL: https://github.com/apache/airflow/issues/37459
### Official Helm Chart version
1.12.0 (latest released)
### Apache Airflow version
2.8.1
### Kubernetes Version
1.26.6
### Helm Chart configuration
```
config:
  webserver:
    expose_config: 'True'
  logging:
    remote_logging: 'True'
    remote_base_log_folder: wasb-airflow/logs
    remote_wasb_log_container: airflow
    remote_log_conn_id: wasb_default
images:
  airflow:
    # define custom airflow image here (with PyPi packages installed)
    repository: org.azurecr.io/internal-airflow
    # CHANGE THIS when updating
    tag: "0.2.0"
executor: KubernetesExecutor
fernetKeySecretName: airflow-fernet-secret
webserverSecretKeySecretName: airflow-webserver-secret
createUserJob:
  useHelmHooks: false
  applyCustomEnv: false
migrateDatabaseJob:
  enabled: true
  useHelmHooks: false
  applyCustomEnv: false
  jobAnnotations:
    "argocd.argoproj.io/hook": Sync
useStandardNaming: true
dags:
  gitSync:
    enabled: true
    repo: git@github.com:ORG/custom-airflow.git
    branch: master
    subPath: "dags"
    sshKeySecret: airflow-ssh-secret
ingress:
  web:
    enabled: true
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt"
    # The path for the web Ingress
    path: "/"
    # The pathType for the above path (used only with Kubernetes v1.19 and above)
    pathType: "ImplementationSpecific"
    # The hostnames or hosts configuration for the web Ingress
    # Set in argoCD application yaml
    hosts: []
    # # The hostname for the web Ingress (can be templated)
    # - name: ""
    #   # configs for web Ingress TLS
    #   tls:
    #     # Enable TLS termination for the web Ingress
    #     enabled: false
    #     # the name of a pre-created Secret containing a TLS private key and certificate
    #     secretName: ""
    # The Ingress Class for the web Ingress (used only with Kubernetes v1.19 and above)
    ingressClassName: "nginx"
```
### Docker Image customizations
# Use the specified Apache Airflow image as a base
FROM apache/airflow:2.8.1
# Install dependencies required for building pymssql
USER root
RUN apt-get update && apt-get install -y \
freetds-dev \
build-essential \
&& rm -rf /var/lib/apt/lists/*
RUN apt-get update -y \
&& apt-get install -y \
libglib2.0-0 \
libnss3 \
libnspr4 \
libdbus-1-3 \
libatk1.0-0 \
libatk-bridge2.0-0 \
libcups2 \
libdrm2 \
libxkbcommon0 \
libatspi2.0-0 \
libxcomposite1 \
libxdamage1 \
libxext6 \
libxfixes3 \
libxrandr2 \
libgbm1 \
libpango-1.0-0 \
libcairo2 \
libasound2 \
&& rm -rf /var/lib/apt/lists/*
# Copy the requirements file into the container
COPY requirements.txt /
COPY .env /
# Switch back to the airflow user
USER airflow
# Install the requirements, including Apache Airflow
RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /requirements.txt
RUN pip install python-dotenv
# Install the Azure provider for Airflow, needed for remote logging to Azure Blob Storage
RUN pip install apache-airflow-providers-microsoft-azure
RUN playwright install
### What happened
When I define the connection manually in the webserver UI (a wasb connection using the Azure Blob connection string), task logs are never uploaded to remote storage: the upload always fails, saying that the specified container does not exist. The error I get:
```
[2024-02-15T17:26:13.076+0000] {wasb_task_handler.py:238} ERROR - Could not write logs to wasb-airflow/logs/dag_id=internal_dag/run_id=manual__2024-02-15T17:25:55.528727+00:00/task_id=read_product_feed/attempt=1.log
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/microsoft/azure/log/wasb_task_handler.py", line 236, in wasb_write
    self.hook.load_string(log, self.wasb_container, remote_log_location, overwrite=True)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/microsoft/azure/hooks/wasb.py", line 373, in load_string
    self.upload(
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/microsoft/azure/hooks/wasb.py", line 431, in upload
    return blob_client.upload_blob(data, blob_type, length=length, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/azure/storage/blob/_blob_client.py", line 765, in upload_blob
    return upload_block_blob(**options)
  File "/home/airflow/.local/lib/python3.8/site-packages/azure/storage/blob/_upload_helpers.py", line 195, in upload_block_blob
    process_storage_error(error)
  File "/home/airflow/.local/lib/python3.8/site-packages/azure/storage/blob/_shared/response_handlers.py", line 184, in process_storage_error
    exec("raise error from None") # pylint: disable=exec-used # nosec
  File "<string>", line 1, in <module>
  File "/home/airflow/.local/lib/python3.8/site-packages/azure/storage/blob/_upload_helpers.py", line 105, in upload_block_blob
    response = client.upload(
  File "/home/airflow/.local/lib/python3.8/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/azure/storage/blob/_generated/operations/_block_blob_operations.py", line 864, in upload
    map_error(status_code=response.status_code, response=response, error_map=error_map)
  File "/home/airflow/.local/lib/python3.8/site-packages/azure/core/exceptions.py", line 164, in map_error
    raise error
azure.core.exceptions.ResourceNotFoundError: The specified container does not exist.
RequestId:b1e6ba42-b01e-005c-1f34-60c86e000000
Time:2024-02-15T17:26:13.0720393Z
ErrorCode:ContainerNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
RequestId:b1e6ba42-b01e-005c-1f34-60c86e000000
Time:2024-02-15T17:26:13.0720393Z</Message></Error>
```
### What you think should happen instead
The logs should get uploaded to the provided location in the blob using the configured Azure Blob connection.
### How to reproduce
Deploy Airflow to a Kubernetes cluster using the official Helm chart, with the configuration above for remote logging to Azure Blob Storage and the KubernetesExecutor. I authenticate with the Azure Blob connection string.
### Anything else
This problem occurs every time the log upload process is initiated. I checked multiple times whether the `airflow` container exists in the storage account, and it does. It's also possible I'm overlooking something obvious. I was following the docs at: https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/logging/index.html
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]
Posted by "jaruji (via GitHub)" <gi...@apache.org>.
jaruji commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1947350151
Okay that fixed it. The docs for this should get updated, as the information written there regarding the azure blob remote logging setup is not up to date. It took me a fair bit of time to figure out that the issue was not on my side. Thanks for the patience and help!
Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]
Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1946735479
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]
Posted by "jaruji (via GitHub)" <gi...@apache.org>.
jaruji commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1946927299
OK, if I understood correctly, I was supposed to add an `azure_remote_logging` param to my config:
```
config:
  logging:
    remote_logging: 'True'
    remote_base_log_folder: wasb-airflow/logs
    remote_wasb_log_container: airflow
    azure_remote_logging: airflow
    remote_log_conn_id: azure_blob_storage
```
So I did this, but it still did not work. However, if I create the fallback container (`airflow-logs`) in our blob storage, the logs get stored correctly. So now the question is: how does one configure a custom container name instead of the predefined `airflow-logs` one? Maybe I missed something?
Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]
Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1947316985
It is a separate section, something like this:
```yaml
config:
  logging:
    remote_logging: 'True'
    remote_base_log_folder: wasb-airflow/logs
    remote_log_conn_id: azure_blob_storage
  azure_remote_logging:
    remote_wasb_log_container: airflow
```
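To illustrate why the nesting matters, here is a minimal Python sketch. It assumes the chart renders each top-level key under `config` as one section of `airflow.cfg`; the helper name `render_cfg` is hypothetical, and the standard-library `configparser` stands in for Airflow's own config handling.

```python
import configparser
import io

# Values copied from the YAML snippet above; each top-level key is one section.
helm_config = {
    "logging": {
        "remote_logging": "True",
        "remote_base_log_folder": "wasb-airflow/logs",
        "remote_log_conn_id": "azure_blob_storage",
    },
    "azure_remote_logging": {
        "remote_wasb_log_container": "airflow",
    },
}


def render_cfg(config: dict) -> str:
    """Render a nested dict as INI text, one [section] per top-level key.

    Hypothetical helper for illustration; not part of Airflow or the chart.
    """
    parser = configparser.ConfigParser()
    parser.read_dict(config)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


cfg_text = render_cfg(helm_config)
print(cfg_text)
```

In the rendered text the container name sits under `[azure_remote_logging]`, not under `[logging]`, which is the placement suggested in this comment.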
Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]
Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1946827663
I guess the documentation might contain wrong information. `remote_wasb_log_container` is expected to be retrieved from the `azure_remote_logging` section.
https://github.com/apache/airflow/blob/b75f9e880614fa0427e7d24a1817955f5de658b3/airflow/config_templates/airflow_local_settings.py#L248-L250
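A minimal sketch of that kind of lookup, using the standard-library `configparser` rather than Airflow's own `conf` object. `resolve_container` and the two sample configs are hypothetical, and the `airflow-logs` fallback is the container name reported elsewhere in this thread, not something verified here against the linked source.

```python
import configparser

# Fallback container name as reported in this thread (assumption, see lead-in).
FALLBACK_CONTAINER = "airflow-logs"

# Key placed under [logging], as in the original report.
MISPLACED_CFG = """
[logging]
remote_logging = True
remote_base_log_folder = wasb-airflow/logs
remote_wasb_log_container = airflow
"""

# Key placed under its own [azure_remote_logging] section.
CORRECT_CFG = """
[logging]
remote_logging = True
remote_base_log_folder = wasb-airflow/logs

[azure_remote_logging]
remote_wasb_log_container = airflow
"""


def resolve_container(cfg_text: str) -> str:
    """Look the container name up in [azure_remote_logging] only (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get(
        "azure_remote_logging",
        "remote_wasb_log_container",
        fallback=FALLBACK_CONTAINER,
    )


print(resolve_container(MISPLACED_CFG))  # airflow-logs
print(resolve_container(CORRECT_CFG))    # airflow
```

With the key under `[logging]`, the lookup never sees it and silently returns the fallback, which would match the `ContainerNotFound` error when only the `airflow` container exists.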
Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]
Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1946831240
Could you change it in your config and try again?