Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/13 12:21:57 UTC

[GitHub] [airflow] aioannoa opened a new issue #17605: Airflow stdout not working/console problem

aioannoa opened a new issue #17605:
URL: https://github.com/apache/airflow/issues/17605


   
   **Apache Airflow version**:
   Airflow 2.0.0
   
   **Apache Airflow Provider versions**:
   apache-airflow-providers-ftp==1.0.0
   apache-airflow-providers-http==1.0.0
   apache-airflow-providers-imap==1.0.0
   apache-airflow-providers-postgres==1.0.1
   apache-airflow-providers-sqlite==1.0.0
   
   **Kubernetes version (if you are using kubernetes)**:
   Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.0", GitCommit:"e8462b5b5dc2584fdcd18e6bcfe9f1e4d970a529", GitTreeState:"clean", BuildDate:"2019-06-19T16:40:16Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
   Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.12", GitCommit:"e2a822d9f3c2fdb5c9bfbe64313cf9f657f0a725", GitTreeState:"clean", BuildDate:"2020-05-06T05:09:48Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
   
   **Environment**:
   - **Cloud provider or hardware configuration**: 
   AWS Cloud - EC2 instance
   - **OS** (e.g. from /etc/os-release):
   NAME="Ubuntu"
   VERSION="16.04.3 LTS (Xenial Xerus)"
   ID=ubuntu
   ID_LIKE=debian
   PRETTY_NAME="Ubuntu 16.04.3 LTS"
   VERSION_ID="16.04"
   HOME_URL="http://www.ubuntu.com/"
   SUPPORT_URL="http://help.ubuntu.com/"
   BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
   VERSION_CODENAME=xenial
   UBUNTU_CODENAME=xenial
   - **Kernel** (e.g. `uname -a`):
   Linux ip-172-25-1-109 4.4.0-1128-aws #142-Ubuntu SMP Fri Apr 16 12:42:33 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
   - **Install tools**:
   - **Others**:
   
   
   **What happened**:
   I have been trying to get Airflow logs to be printed to stdout by:
   1. creating a new Python script named log_config.py (shown below) under the config directory, as instructed here: https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/logging-tasks.html
   
    from copy import deepcopy
    import sys

    from airflow.config_templates.airflow_local_settings import DEFAULT_LOGGING_CONFIG

    # Start from Airflow's default logging config and add a handler that
    # streams task logs to stdout instead of per-task files.
    LOGGING_CONFIG = deepcopy(DEFAULT_LOGGING_CONFIG)
    LOGGING_CONFIG["handlers"]["stdouttask"] = {
        "class": "logging.StreamHandler",
        "formatter": "airflow",
        "stream": sys.stdout,
    }
    # Route task-level logging through the new handler only.
    LOGGING_CONFIG["loggers"]["airflow.task"]["handlers"] = ["stdouttask"]
   
   2. setting `logging_config_class = log_config.LOGGING_CONFIG` and `task_log_reader = stdouttask` in the airflow.cfg file.
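   For context, the linked guide has the custom config live in a `config` package that Airflow itself puts on `PYTHONPATH` at startup. A minimal sketch of that layout (assuming the default `AIRFLOW_HOME`; directory and file names are from the linked guide):

   ```shell
   # Layout assumed by `logging_config_class = log_config.LOGGING_CONFIG`:
   # Airflow appends $AIRFLOW_HOME/config to PYTHONPATH on startup, so the
   # module only needs to be importable as a package from that directory.
   AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"
   mkdir -p "$AIRFLOW_HOME/config"
   touch "$AIRFLOW_HOME/config/__init__.py"
   # place the log_config.py shown above at:
   #   $AIRFLOW_HOME/config/log_config.py
   ```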
   
   After checking, the logs are still written to files, so this has not worked for me. Searching online, I noticed that people suggest using the `console` handler instead. However, when using it the pod becomes unresponsive, and this affects other pods as well as the system: ssh stops working for some time (I guess while the DAG runs), and other pods become unresponsive or even go down, e.g. the cluster's database. I have read that there may be memory leak issues with this. Has anyone been able to verify whether this is the case, and under which circumstances it causes a problem? I have not been able to find a clear answer, either online or in the Airflow documentation.
   
   **What you expected to happen**:
   I was expecting all Airflow logs to be printed to stdout alone.
   
   **What do you think went wrong?**
   For the 1st part, no idea. 
   For the 2nd part, memory leak or cpu exhaustion.
   
   **How to reproduce it**:
   For the 1st part, try the above as is.
   For the 2nd part, try:
   LOGGING_CONFIG["loggers"]["airflow.task"]["handlers"] = ["console", "stdouttask"]
   
   
   **Anything else we need to know**:
   How often does this problem occur? Once? Every time etc?
   Every time.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on issue #17605: Airflow stdout not working/console problem

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on issue #17605:
URL: https://github.com/apache/airflow/issues/17605#issuecomment-900977527


   You still have not written how your deployment is running. From what you describe, it seems quite likely that somewhere in your deployment you run airflow with the `--stderr`, `--stdout`, and `--log-file` options of the scheduler, which redirect everything to the files you mentioned (.err, .out, .log): https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html#scheduler This has nothing to do with Airflow itself, but everything to do with how your deployment starts it.




[GitHub] [airflow] aioannoa edited a comment on issue #17605: Airflow stdout not working/console problem

Posted by GitBox <gi...@apache.org>.
aioannoa edited a comment on issue #17605:
URL: https://github.com/apache/airflow/issues/17605#issuecomment-901034398


   Thanks @potiuk for the feedback. I got it. I will drop the -D option, since stdout is what I need most. I will find another way to enforce the behaviour I wanted from -D. Once again, thanks for your time.




[GitHub] [airflow] potiuk commented on issue #17605: Airflow stdout not working/console problem

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17605:
URL: https://github.com/apache/airflow/issues/17605#issuecomment-898536080


   You have not written how you deployed your Airflow.
   
   From the looks of it - the fact that the logs were still printed to Airflow's log files - I am almost 100% sure your changes had a typo or were not applied (for example, you did not restart the pods after changing the configuration).
   
   The easiest way to check is to dump information about your loggers while you are printing logs. When you get the logger, you should be able to see what your logging configuration looks like in your task (you could print it to the logs, since they are stored as files). This is explained, for example, here:
   https://stackoverflow.com/questions/31599940/how-to-print-current-logging-configuration-used-by-the-python-logging-module
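   Following that suggestion, a small sketch a task could call to print the handlers actually attached along the logger chain (the helper name is hypothetical, not an Airflow API):

   ```python
   import logging

   def dump_logger_chain(name="airflow.task"):
       """Print the handlers attached to a logger and each of its ancestors.

       Called from inside a task, the output lands in the task log and shows
       whether the custom handler from LOGGING_CONFIG was actually installed.
       """
       logger = logging.getLogger(name)
       while logger is not None:
           handlers = [type(h).__name__ for h in logger.handlers]
           print(f"{logger.name}: handlers={handlers}, propagate={logger.propagate}")
           logger = logger.parent  # walk up to the root logger
   ```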
   
   I am almost sure that when you try this, you will see that your custom configuration has not actually been applied, and you will need to investigate why.
   
   Note that ALL your pods and workers (whatever deployment you have) must have exactly the same configuration. Airflow is a distributed system, so if you change the configuration you have to make sure it is available on all your machines and workers (for example, by sharing a common image with the configuration, or by applying the Helm chart).
   
   My very wild guess is that you applied the configuration change only on one machine (scheduler for example) but the same configuration has not been updated on the workers or pods that you use to run tasks.
   
   I am closing this as invalid until you do some more investigation in this direction and double-check that your configuration is properly applied.






[GitHub] [airflow] aioannoa edited a comment on issue #17605: Airflow stdout not working/console problem

Posted by GitBox <gi...@apache.org>.
aioannoa edited a comment on issue #17605:
URL: https://github.com/apache/airflow/issues/17605#issuecomment-901002420


   Apologies for that. You are right. So, how can I get this info from my setup? I use Airflow as part of a Kubernetes cluster. In its entry_point.sh script I use the following:
   
   # initialize the metadata database
   airflow db init
   
   # start the scheduler, daemonized, in the background
   airflow scheduler -D &
   
   That is after I have set up Airflow's database connection, which I don't think is relevant. Would the output of the [core] section of Airflow's .cfg help?
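   For reference, a deployment that wants `kubectl logs` to show scheduler output usually keeps the scheduler in the foreground instead. A sketch of such an entrypoint (illustrative, assuming the same `airflow` CLI; not the poster's actual script):

   ```shell
   #!/bin/sh
   # Hypothetical container entrypoint: run the scheduler in the
   # FOREGROUND so its stdout/stderr reach the container runtime and
   # `kubectl logs -f pod_name` shows them.
   set -e

   airflow db init

   # exec replaces the shell with the scheduler as PID 1; no -D (which
   # detaches stdio) and no trailing &, which would let the shell exit.
   exec airflow scheduler
   ```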




[GitHub] [airflow] potiuk closed issue #17605: Airflow stdout not working/console problem

Posted by GitBox <gi...@apache.org>.
potiuk closed issue #17605:
URL: https://github.com/apache/airflow/issues/17605


   




[GitHub] [airflow] potiuk commented on issue #17605: Airflow stdout not working/console problem

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17605:
URL: https://github.com/apache/airflow/issues/17605#issuecomment-901008515


   OK. So yeah. It is a problem with your deployment (as suspected from the beginning) - or rather with not realizing the consequences of certain flags.
   
   You run airflow with the daemonize flag (-D). By definition, daemonize means (via https://linux.die.net/man/1/daemonize):
   
   > daemonize runs a command as a Unix daemon. As defined in W. Richard Stevens' 1990 book, Unix Network Programming (Addison-Wesley, 1990), a daemon is 'a process that executes 'in the background' (i.e., without an associated terminal or login shell) either waiting for some event to occur, or waiting to perform some specified task on a periodic basis.' Upon startup, a typical daemon program will:
   > * Close all open file descriptors (especially standard input, standard output and standard error)
   > * Change its working directory to the root filesystem, to ensure that it doesn't tie up another filesystem and prevent it from being unmounted
   > * Reset its umask value
   > * Run in the background (i.e., fork)
   > * Disassociate from its process group (usually a shell), to insulate itself from signals (such as HUP) sent to the process group
   > * Ignore all terminal I/O signals
   > * Disassociate from the control terminal (and take steps not to reacquire one)
   > *  Handle any SIGCLD signals
   
   So you have no stdout/stderr you can use when you run with -D. Moreover, when you run airflow with -D you do not need to run it in the background (`&` at the end), as airflow will move itself to the background as part of daemonization.
   
   You either daemonize the process, or you read its stdout/stderr. You simply cannot do both at the same time.
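   The first bullet above (closing the standard streams) is exactly why the output disappears. A minimal Python sketch of that redirection step (illustrative of what a daemon does, not Airflow's actual implementation):

   ```python
   import os
   import sys

   def detach_stdio(log_path):
       """Redirect the standard streams the way a daemonizing process does.

       A full daemonization also double-forks, calls os.setsid(), chdirs to
       '/' and resets the umask; this sketch shows only why the parent's
       terminal stops seeing any output.
       """
       devnull = os.open(os.devnull, os.O_RDONLY)
       log_fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
       sys.stdout.flush()
       sys.stderr.flush()
       os.dup2(devnull, 0)  # stdin now reads EOF from /dev/null
       os.dup2(log_fd, 1)   # stdout goes to the log file, not the terminal
       os.dup2(log_fd, 2)   # stderr likewise
       os.close(devnull)
       os.close(log_fd)
   ```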





[GitHub] [airflow] aioannoa commented on issue #17605: Airflow stdout not working/console problem

Posted by GitBox <gi...@apache.org>.
aioannoa commented on issue #17605:
URL: https://github.com/apache/airflow/issues/17605#issuecomment-900882673


   Hi there, 
   
   I have rechecked the code and honestly cannot see any typos. I have also tried this config a couple of times, so I doubt that is the issue. Re the pod: I always build a new Docker image, which I then load into my cluster, so the new pod runs the new image version. I have also checked Airflow's configuration, which is displayed as follows:
   [logging]
   base_log_folder = /autoupgr_orch/airflow/logs
   remote_logging = False
   remote_log_conn_id = 
   google_key_path = 
   remote_base_log_folder = 
   encrypt_s3_logs = False
   logging_level = INFO
   fab_logging_level = WARN
   **logging_config_class = log_config.LOGGING_CONFIG**
   colored_console_log = True
   colored_log_format = [%(blue)s%(asctime)s%(reset)s] {%(blue)s%(filename)s:%(reset)s%(lineno)d} %(log_color)s%(levelname)s%(reset)s - %(log_color)s%(message)s%(reset)s
   colored_formatter_class = airflow.utils.log.colored_log.CustomTTYColoredFormatter
   log_format = [%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s
   simple_log_format = %(asctime)s %(levelname)s - %(message)s
   task_log_prefix_template = 
   log_filename_template = {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
   log_processor_filename_template = {{ filename }}.log
   dag_processor_manager_log_location = /autoupgr_orch/airflow/logs/dag_processor_manager/dag_processor_manager.log
   **task_log_reader = stdouttask**
   extra_loggers = 
   
   Shouldn't this be enough to verify that my configuration has been applied? I can also see some related logs when using "kubectl logs -f pod_name". 
   At this stage, perhaps I should clarify what I mean by saying I can still see the logs in the file. I do not have any logs under airflow/logs related to the DAG's name, yet I can see all the logs from the Airflow scheduler and the individual tasks in a file named airflow-scheduler.out. I have tried to find information on this file and could not find anything. I also have a .err and a .log file. My intention is to have all the output of the .out file go to stdout, so that I can access it using "kubectl logs -f pod_name". Am I still missing something?
   
   Thanks.




[GitHub] [airflow] boring-cyborg[bot] commented on issue #17605: Airflow stdout not working/console problem

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #17605:
URL: https://github.com/apache/airflow/issues/17605#issuecomment-898421321


   Thanks for opening your first issue here! Be sure to follow the issue template!
   



