You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "David Tan (Jira)" <ji...@apache.org> on 2019/12/26 07:26:00 UTC

[jira] [Created] (AIRFLOW-6354) Intermittent warnings: "Processor for some/dag.py exited with return code -9"

David Tan created AIRFLOW-6354:
----------------------------------

             Summary: Intermittent warnings: "Processor for some/dag.py exited with return code -9"
                 Key: AIRFLOW-6354
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6354
             Project: Apache Airflow
          Issue Type: Bug
          Components: DAG
    Affects Versions: 1.10.5
            Reporter: David Tan


Hi there,

Thank you for your work on Airflow. We've observed some intermittent errors with {{DagFileProcessorManager}} and we would like to request your assistance in understanding the error.

*Context*

We've provisioned an instance on Airflow on a Kubernetes cluster using the bitnami airflow helm chart. We get the following warning every few hours (and sometimes every few minutes - we didn't find any pattern to this).
{code:java}
# most times, we see these healthy info logs
[2019-12-20 11:49:27,956] {dag_processing.py:892} INFO - Searching for files in /opt/bitnami/airflow/dags 
[2019-12-20 11:49:28,895] {dag_processing.py:895} INFO - There are 2 files in /opt/bitnami/airflow/dags 
[2019-12-20 11:54:29,749] {dag_processing.py:892} INFO - Searching for files in /opt/bitnami/airflow/dags 
[2019-12-20 11:54:31,798] {dag_processing.py:895} INFO - There are 2 files in /opt/bitnami/airflow/dags 

...

# but at random intervals, we see this error logs saying that a processor for a particular dag has timed out (after 30 seconds), and processor exits with an error code -9.
[2019-12-20 11:56:04,814] {dag_processing.py:1229} INFO - Processor for /opt/bitnami/airflow/dags/multi_producer_dags.py with PID 19537 started at 2019-12-20T11:55:34.728282+00:00 has timed out, killing it. 
[2019-12-20 11:56:04,822] {dag_processing.py:1143} WARNING - Processor for /opt/bitnami/airflow/dags/multi_producer_dags.py exited with return code -9.
{code}

We've traced through the code path in airflow/utils/dag_processing.py which issues both the healthy log events and unhealthy log events:

 

 
{code:java}
# healthy info logs
DagFileProcessorManager#start() -> self._refresh_dag_dir() -> log.info("Searching for files...") and log.info("There are N files in ...")

# timeout error logs
DagFileProcessorManager#start() -> self.heartbeat() -> self.collect_results() -> _kill_timed_out_processors() -> log.error("Processor for [/some/dag.py] with [pid] started at [time] has timed out, killing it") and processor.kill()
{code}
 


Basically, when the processor manager loop runs every 5 minutes, 
{{_kill_timed_out_processors() }}is called without any problems. But sometimes processor durations takes longer than 30 seconds. *Could you please help us understand if this intermittent timeouts in processor is something that is to be expected? If it is, we will ignore this warning log in our alerts.*
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)