Posted to commits@airflow.apache.org by "taharah (via GitHub)" <gi...@apache.org> on 2023/10/05 18:39:32 UTC

[I] Airflow task log handling broken for non-Kubernetes executors [airflow]

taharah opened a new issue, #34783:
URL: https://github.com/apache/airflow/issues/34783

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   After upgrading Airflow from 2.5.3 to 2.6.3, none of our task logs for the ECS operator were being written to the configured file and S3 remote write handlers.
   
   ### What you think should happen instead
   
   Airflow task logs should be written to the configured handlers.
   
   ### How to reproduce
   
   Configure a remote write handler for a non-Kubernetes-based executor; a sketch of the kind of configuration meant here is shown below.
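
   For reference, a minimal sketch of such a setup, assuming S3 remote logging with the CeleryExecutor (the bucket name and connection ID are placeholders; these options normally live in `airflow.cfg` or the container environment rather than in Python):

   ```python
   # Placeholder values; substitute whatever bucket/connection the deployment uses.
   import os

   os.environ["AIRFLOW__CORE__EXECUTOR"] = "CeleryExecutor"
   os.environ["AIRFLOW__LOGGING__REMOTE_LOGGING"] = "True"
   os.environ["AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER"] = "s3://my-airflow-logs/tasks"
   os.environ["AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID"] = "aws_default"
   ```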
   
   ### Operating System
   
   Amazon Linux 2
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon = "7.4.1"
   apache-airflow-providers-celery = "3.3.4"
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   Airflow is deployed to Amazon ECS.
   
   ### Anything else
   
   The issue appears to be a result of the change from #28440, specifically the removal of the handlers from the source logger (https://github.com/apache/airflow/pull/28440/files#diff-ad618185a072910e49c11770954af009d1cc070b120a4fde5f2fc095a588271bR704), which differs from the previous behavior.
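
   As a stdlib-only illustration of the distinction (this is not Airflow's actual code), compare a handler attached directly to the source logger with one that is only reachable through propagation to the root logger:

   ```python
   import logging

   root = logging.getLogger()
   root.setLevel(logging.INFO)
   task_logger = logging.getLogger("airflow.task")

   handler = logging.StreamHandler()  # stand-in for a file/S3 task handler

   # Previous behavior: the handler hangs off the source logger itself.
   task_logger.addHandler(handler)
   task_logger.info("delivered by the source logger's own handler")

   # Behavior after the change: the handler lives on the root logger, so records
   # are delivered only if propagation up to the root logger is intact.
   task_logger.removeHandler(handler)
   root.addHandler(handler)
   task_logger.info("delivered via propagation to the root logger")
   ```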
   
   I am happy to submit a change for this; however, I want to get some direction on what others feel is the best path forward, e.g., making this behavior configurable or reverting to the original behavior.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Airflow task log handling broken for non-Kubernetes executors [airflow]

Posted by "eladkal (via GitHub)" <gi...@apache.org>.
eladkal commented on issue #34783:
URL: https://github.com/apache/airflow/issues/34783#issuecomment-1803147884

   > I'll work on putting together PRs to make the necessary changes.
   
   Hi @taharah, did you get a chance to work on it?


Re: [I] Airflow task log handling broken for non-Kubernetes executors [airflow]

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #34783:
URL: https://github.com/apache/airflow/issues/34783#issuecomment-1749722882

   Could you provide a bit more detail, e.g., an example of your DAG? I've been unable to reproduce this on the current main branch:
   
   ```console
   [2023-10-05, 22:05:02 UTC] {ecs.py:531} INFO - Running ECS Task - Task definition: sample-task:1 - on cluster test-cluster
   [2023-10-05, 22:05:02 UTC] {ecs.py:534} INFO - EcsOperator overrides: {'containerOverrides': [{'name': 'busybox', 'command': ['/bin/sh', '-c', 'echo regular; sleep 6; echo finish']}]}
   [2023-10-05, 22:05:02 UTC] {base.py:73} INFO - Using connection ID 'aws_default' for task execution.
   [2023-10-05, 22:05:02 UTC] {credentials.py:1123} INFO - Found credentials in environment variables.
   [2023-10-05, 22:05:04 UTC] {ecs.py:647} INFO - ECS task ID is: aef0db066c2f48faad6303efc27180f0
   [2023-10-05, 22:05:04 UTC] {ecs.py:573} INFO - Starting ECS Task Log Fetcher
   [2023-10-05, 22:05:34 UTC] {base.py:73} INFO - Using connection ID 'aws_default' for task execution.
   [2023-10-05, 22:05:34 UTC] {credentials.py:1123} INFO - Found credentials in environment variables.
   [2023-10-05, 22:05:35 UTC] {task_log_fetcher.py:63} INFO - [2023-10-05 22:05:17,228] regular
   [2023-10-05, 22:05:35 UTC] {task_log_fetcher.py:63} INFO - [2023-10-05 22:05:23,229] finish
   ```


Re: [I] Airflow task log handling broken for non-Kubernetes executors [airflow]

Posted by "boring-cyborg[bot] (via GitHub)" <gi...@apache.org>.
boring-cyborg[bot] commented on issue #34783:
URL: https://github.com/apache/airflow/issues/34783#issuecomment-1749449121

   Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.
   


Re: [I] Airflow task log handling broken for non-Kubernetes executors [airflow]

Posted by "taharah (via GitHub)" <gi...@apache.org>.
taharah commented on issue #34783:
URL: https://github.com/apache/airflow/issues/34783#issuecomment-1750713541

   @Taragolis thank you for looking into this so quickly! Shortly after opening this issue, I was able to identify the actual root cause of why our logs stopped being written to S3. We had added a logger for `airflow` with `propagate` set to `False` so that those logs would be written only to a file and not the console. The config was:
   
   ```python
   LOGGING_CONFIG["loggers"]["airflow"] = {
       "handlers": ["file"],
       "level": LOG_LEVEL,
       "propagate": False,
   }
   ```
   
   After removing the aforementioned logger, our task logs began to show up as expected. However, the reason we hit this issue was still the changes made in #28440: that PR introduced a hard requirement that `airflow.task` log records propagate to the root logger in order for the handlers associated with `airflow.task` to be invoked.
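
   A stdlib-only illustration of why that requirement bit us (again, not Airflow's code; the file handler simply stands in for our `file` handler above):

   ```python
   import logging

   root = logging.getLogger()
   root.setLevel(logging.INFO)
   root.addHandler(logging.StreamHandler())  # stand-in for the task/remote handlers

   airflow_logger = logging.getLogger("airflow")
   airflow_logger.addHandler(logging.FileHandler("airflow.log"))
   airflow_logger.propagate = False  # mirrors the config above

   # The record is emitted by the "airflow" logger's file handler, but it stops
   # there and never reaches the handlers attached to the root logger.
   logging.getLogger("airflow.task").info("does not reach the root logger's handlers")
   ```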
   
   Thus, a couple of things still need to be done to fully resolve this issue:
   
   1. Update the task logging documentation to note that `airflow.task` logs must be able to propagate to the root logger.
   2. While investigating this issue, I noticed a small bug in the new implementation that causes behavior differing from the original: the root logger's configuration, i.e., its handlers and log level, is not reverted when the context manager exits (see the sketch below).
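
   For item 2, a minimal sketch of the save-and-restore behavior I have in mind (the name and signature are illustrative, not the actual Airflow helper):

   ```python
   import logging
   from contextlib import contextmanager

   @contextmanager
   def redirect_root_logging(handlers, level=logging.INFO):
       """Temporarily swap the root logger's handlers and level, reverting on exit."""
       root = logging.getLogger()
       saved_handlers, saved_level = root.handlers[:], root.level
       root.handlers = list(handlers)
       root.setLevel(level)
       try:
           yield
       finally:
           # Revert so code running after the task sees the original configuration.
           root.handlers = saved_handlers
           root.setLevel(saved_level)
   ```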
   
   I'll work on putting together PRs to make the necessary changes.

