Posted to commits@airflow.apache.org by "jaruji (via GitHub)" <gi...@apache.org> on 2024/02/15 17:43:55 UTC

[I] Azure blob remote logging - The specified container does not exist. [airflow]

jaruji opened a new issue, #37459:
URL: https://github.com/apache/airflow/issues/37459

   ### Official Helm Chart version
   
   1.12.0 (latest released)
   
   ### Apache Airflow version
   
   2.8.1
   
   ### Kubernetes Version
   
   1.26.6
   
   ### Helm Chart configuration
   
   ```
     config:
       webserver:
         expose_config: 'True'
       logging:
         remote_logging: 'True'
         remote_base_log_folder: wasb-airflow/logs
         remote_wasb_log_container: airflow
         remote_log_conn_id: wasb_default
     images:
       airflow:
         # define custom airflow image here (with PyPi packages installed)
         repository: org.azurecr.io/internal-airflow
         # CHANGE THIS when updating
         tag: "0.2.0"
   
     executor: KubernetesExecutor
     fernetKeySecretName: airflow-fernet-secret
     webserverSecretKeySecretName: airflow-webserver-secret
     createUserJob:
       useHelmHooks: false
       applyCustomEnv: false
   
     migrateDatabaseJob:
       enabled: true  
       useHelmHooks: false
       applyCustomEnv: false
       jobAnnotations:
           "argocd.argoproj.io/hook": Sync
     useStandardNaming: true
   
     dags:
       gitSync:
         enabled: true
         repo: git@github.com:ORG/custom-airflow.git
         branch: master
         subPath: "dags"
         sshKeySecret: airflow-ssh-secret
     ingress:
       web:
         enabled: true
         annotations:
           cert-manager.io/cluster-issuer: "letsencrypt"
   
         # The path for the web Ingress
         path: "/"
   
         # The pathType for the above path (used only with Kubernetes v1.19 and above)
         pathType: "ImplementationSpecific"
   
         # The hostnames or hosts configuration for the web Ingress
         # Set in argoCD application yaml
         hosts: []
         #   # The hostname for the web Ingress (can be templated)
         # - name: ""
         #   # configs for web Ingress TLS
         #   tls:
         #     # Enable TLS termination for the web Ingress
         #     enabled: false
         #     # the name of a pre-created Secret containing a TLS private key and certificate
         #     secretName: ""
   
         # The Ingress Class for the web Ingress (used only with Kubernetes v1.19 and above)
         ingressClassName: "nginx"
   
   ```
   
   ### Docker Image customizations
   
   ```dockerfile
   # Use the specified Apache Airflow image as a base
   FROM apache/airflow:2.8.1
   
   # Install dependencies required for building pymssql
   USER root
   RUN apt-get update && apt-get install -y \
       freetds-dev \
       build-essential \
       && rm -rf /var/lib/apt/lists/*
   RUN apt-get update -y \
       && apt-get install -y \
       libglib2.0-0 \
       libnss3 \
       libnspr4 \
       libdbus-1-3 \
       libatk1.0-0 \
       libatk-bridge2.0-0 \
       libcups2 \
       libdrm2 \
       libxkbcommon0 \
       libatspi2.0-0 \
       libxcomposite1 \
       libxdamage1 \
       libxext6 \
       libxfixes3 \
       libxrandr2 \
       libgbm1 \
       libpango-1.0-0 \
       libcairo2 \
       libasound2 \
       && rm -rf /var/lib/apt/lists/*
   # Copy the requirements file into the container
   COPY requirements.txt /
   COPY .env /
   
   # Switch back to the airflow user
   USER airflow
   
   # Install the requirements, including Apache Airflow
   RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /requirements.txt
   RUN pip install python-dotenv
   # Install the Azure provider for Airflow, needed for remote logging to Azure Blob
   RUN pip install apache-airflow-providers-microsoft-azure
   RUN playwright install
   ```
   
   ### What happened
   
   When I define the connection manually in the webserver UI (a wasb connection created from the Azure Blob connection string), every DAG run fails to upload its task logs remotely, reporting that the specified container does not exist. The error I get:
   ```
   [2024-02-15T17:26:13.076+0000] {wasb_task_handler.py:238} ERROR - Could not write logs to wasb-airflow/logs/dag_id=internal_dag/run_id=manual__2024-02-15T17:25:55.528727+00:00/task_id=read_product_feed/attempt=1.log
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/microsoft/azure/log/wasb_task_handler.py", line 236, in wasb_write
       self.hook.load_string(log, self.wasb_container, remote_log_location, overwrite=True)
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/microsoft/azure/hooks/wasb.py", line 373, in load_string
       self.upload(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/microsoft/azure/hooks/wasb.py", line 431, in upload
       return blob_client.upload_blob(data, blob_type, length=length, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/azure/storage/blob/_blob_client.py", line 765, in upload_blob
       return upload_block_blob(**options)
     File "/home/airflow/.local/lib/python3.8/site-packages/azure/storage/blob/_upload_helpers.py", line 195, in upload_block_blob
       process_storage_error(error)
     File "/home/airflow/.local/lib/python3.8/site-packages/azure/storage/blob/_shared/response_handlers.py", line 184, in process_storage_error
       exec("raise error from None")   # pylint: disable=exec-used # nosec
     File "<string>", line 1, in <module>
     File "/home/airflow/.local/lib/python3.8/site-packages/azure/storage/blob/_upload_helpers.py", line 105, in upload_block_blob
       response = client.upload(
     File "/home/airflow/.local/lib/python3.8/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
       return func(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/azure/storage/blob/_generated/operations/_block_blob_operations.py", line 864, in upload
       map_error(status_code=response.status_code, response=response, error_map=error_map)
     File "/home/airflow/.local/lib/python3.8/site-packages/azure/core/exceptions.py", line 164, in map_error
       raise error
   azure.core.exceptions.ResourceNotFoundError: The specified container does not exist.
   RequestId:b1e6ba42-b01e-005c-1f34-60c86e000000
   Time:2024-02-15T17:26:13.0720393Z
   ErrorCode:ContainerNotFound
   Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>ContainerNotFound</Code><Message>The specified container does not exist.
   RequestId:b1e6ba42-b01e-005c-1f34-60c86e000000
   Time:2024-02-15T17:26:13.0720393Z</Message></Error>
   ```
   
   ### What you think should happen instead
   
   The logs should be uploaded to the specified location in Azure Blob Storage using the configured Azure Blob connection.
   
   ### How to reproduce
   
   Deploy Airflow to a Kubernetes cluster with the official Helm chart, using the KubernetesExecutor and the remote logging configuration for Azure Blob Storage shown above. I authenticate with the Azure Blob connection string.
   
   ### Anything else
   
   This happens every time a log upload is attempted. I checked several times that the `airflow` container exists in the storage account, and it does. It's also possible I'm overlooking something obvious. I was following the docs at: https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/logging/index.html
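   The container check can also be repeated from code, using the same connection string the Airflow connection uses. This is a minimal sketch, assuming the `azure-storage-blob` package (pulled in by the Azure provider) and a hypothetical `AZURE_STORAGE_CONNECTION_STRING` environment variable; the helper names `missing_containers` and `list_container_names` are also hypothetical. It checks `airflow-logs` alongside `airflow` because that is the fallback name discussed later in this thread.

   ```python
   import os
   from typing import Iterable, List


   def missing_containers(wanted: Iterable[str], available: Iterable[str]) -> List[str]:
       """Return the names in `wanted` that are not in `available`, preserving order."""
       available_set = set(available)
       return [name for name in wanted if name not in available_set]


   def list_container_names(conn_str: str) -> List[str]:
       """List every container visible to the given storage connection string."""
       # Imported lazily so the pure helper above works without the Azure SDK installed.
       from azure.storage.blob import BlobServiceClient

       service = BlobServiceClient.from_connection_string(conn_str)
       return [container.name for container in service.list_containers()]


   if __name__ == "__main__":
       # Hypothetical variable holding the same connection string as the Airflow connection.
       conn_str = os.environ.get("AZURE_STORAGE_CONNECTION_STRING")
       if conn_str is None:
           print("Set AZURE_STORAGE_CONNECTION_STRING to run the check.")
       else:
           names = list_container_names(conn_str)
           print("Visible containers:", names)
           # "airflow" is the intended container; "airflow-logs" is the fallback name.
           print("Missing:", missing_containers(["airflow", "airflow-logs"], names))
   ```

   If `airflow` is listed but `airflow-logs` is reported missing, the credentials are fine and the problem lies in which container name the log handler is actually using.
   
   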
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]

Posted by "jaruji (via GitHub)" <gi...@apache.org>.
jaruji commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1947350151

   Okay, that fixed it. The docs should be updated, as the Azure Blob remote logging setup described there is out of date. It took me a fair bit of time to figure out that the issue was not on my side. Thanks for the patience and help!


Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1946735479

   Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.
   


Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]

Posted by "jaruji (via GitHub)" <gi...@apache.org>.
jaruji commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1946927299

   Ok, if I understood correctly, I was supposed to add an `azure_remote_logging` param to my config:
   
   ```
   config:
       logging:
         remote_logging: 'True'
         remote_base_log_folder: wasb-airflow/logs
         remote_wasb_log_container: airflow
         azure_remote_logging: airflow
         remote_log_conn_id: azure_blob_storage
   ```
   So I did this, but it still did not work. However, if I create the fallback container (`airflow-logs`) in our blob storage, the logs are stored correctly. So the question now is how to configure a custom container name instead of the predefined `airflow-logs` one. Maybe I missed something?


Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1947316985

   It is a separate section, something like this:
   
   ```yaml
   config:
       logging:
         remote_logging: 'True'
         remote_base_log_folder: wasb-airflow/logs
         remote_log_conn_id: azure_blob_storage
   
       azure_remote_logging:
         remote_wasb_log_container: airflow
   ```


Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1946827663

   I guess the documentation may be wrong here. `remote_wasb_log_container` is expected to be read from the `azure_remote_logging` section:
   
   https://github.com/apache/airflow/blob/b75f9e880614fa0427e7d24a1817955f5de658b3/airflow/config_templates/airflow_local_settings.py#L248-L250
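   As I read the linked lines, the container name is looked up in the `azure_remote_logging` section with `airflow-logs` as the fallback, which matches the reporter's observation that creating an `airflow-logs` container made uploads work. The sketch below imitates that lookup with plain `configparser` (Airflow's own `conf` object also applies environment-variable overrides and built-in defaults) and evaluates the three configurations tried in this thread. The helper `wasb_container` is hypothetical.

   ```python
   import configparser

   FALLBACK_CONTAINER = "airflow-logs"  # the name uploads fell back to in this thread


   def wasb_container(cfg_text: str) -> str:
       """Approximate the container lookup from the linked settings code."""
       parser = configparser.ConfigParser()
       parser.read_string(cfg_text)
       # With a fallback, a missing section or option returns it instead of raising.
       return parser.get(
           "azure_remote_logging", "remote_wasb_log_container", fallback=FALLBACK_CONTAINER
       )


   # Original report: the key sits under [logging], so it is never consulted.
   ORIGINAL = """
   [logging]
   remote_base_log_folder = wasb-airflow/logs
   remote_wasb_log_container = airflow
   """

   # Second attempt: azure_remote_logging is just another option inside [logging].
   SECOND_ATTEMPT = """
   [logging]
   remote_wasb_log_container = airflow
   azure_remote_logging = airflow
   """

   # Working configuration: a separate [azure_remote_logging] section.
   FIXED = """
   [logging]
   remote_base_log_folder = wasb-airflow/logs

   [azure_remote_logging]
   remote_wasb_log_container = airflow
   """

   for label, text in [("original", ORIGINAL), ("second attempt", SECOND_ATTEMPT), ("fixed", FIXED)]:
       print(f"{label}: {wasb_container(text)}")
   ```

   Only the last configuration yields `airflow`; the first two both resolve to `airflow-logs`, which explains the `ContainerNotFound` error when that container did not exist.
   
   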
   


Re: [I] Azure blob remote logging - The specified container does not exist. [airflow]

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #37459:
URL: https://github.com/apache/airflow/issues/37459#issuecomment-1946831240

   Could you change it in your config and try again?

