Posted to dev@airflow.apache.org by הילה ויזן <hi...@gmail.com> on 2016/08/28 12:25:32 UTC

airflow is stuck

Hi,
We use Airflow 1.7.1.3 with Celery (Postgres as its backend DB).
After a few hours, we noticed that no tasks were being executed.
Some tasks had failed before this happened.

From the worker log:

[2016-08-28 11:31:33,300] {__init__.py:36} INFO - Using executor CeleryExecutor
Logging into: /var/log/airflow//daily_agg/activities_per_day_in_week_task/2016-08-19T02:00:00
[2016-08-28 11:31:34,663] {__init__.py:36} INFO - Using executor CeleryExecutor
Traceback (most recent call last):
  File "/usr/bin/airflow", line 15, in <module>
    args.func(args)
  File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 237, in run
    pool=args.pool,
  File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in wrapper
    result = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1245, in run
    result = task_copy.execute(context=context)
  File "/usr/lib/python2.7/site-packages/airflow/operators/bash_operator.py", line 83, in execute
    raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed


When we listed the airflow processes, we saw several postgres backend
processes in the 'idle' state (one of them 'idle in transaction'):

[root@hadoop01 ~]# ps -ef | grep airflow
root     12884  3772 14 08:11 pts/7    00:30:40 /usr/bin/python /usr/bin/airflow scheduler
root     12918 12901  0 08:11 pts/8    00:00:04 /usr/bin/python /usr/bin/airflow serve_logs
root     13577 13253  0 08:11 pts/10   00:00:05 gunicorn: master [airflow-webserver]
root     13943 13577  0 08:11 pts/10   00:00:04 gunicorn: worker [airflow-webserver]
root     13945 13577  0 08:11 pts/10   00:00:07 gunicorn: worker [airflow-webserver]
root     13950 13577  0 08:11 pts/10   00:00:06 gunicorn: worker [airflow-webserver]
root     13954 13577  0 08:11 pts/10   00:00:05 gunicorn: worker [airflow-webserver]
postgres 22139  1847  0 11:39 ?        00:00:00 postgres: airflow airflow_db ::1(52324) idle
postgres 22312  1847  0 11:39 ?        00:00:00 postgres: airflow airflow_db ::1(52365) idle
postgres 24844  1847  0 11:47 ?        00:00:00 postgres: airflow airflow_db ::1(53002) idle in transaction
root     24849 12222  0 11:47 pts/9    00:00:00 grep --color=auto airflow
postgres 55388  1847  0 11:10 ?        00:00:00 postgres: airflow airflow_db ::1(47759) idle
postgres 61412  1847  0 11:13 ?        00:00:00 postgres: airflow airflow_db ::1(48266) idle
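
The difference between 'idle' and 'idle in transaction' in a listing like this can matter: an idle backend is only waiting for the next command, while one that is idle in transaction has an open transaction and may still hold locks. Whether that is what stalls the tasks here is not established in this thread. As a minimal sketch, assuming each process occupies a single line of `ps -ef` output, the following Python separates the two states. The names `BACKEND_RE`, `backend_states` and `SAMPLE` are illustrative only and are not part of Airflow or PostgreSQL.

```python
import re
from collections import Counter

# Matches the process title of a PostgreSQL client backend as it appears in ps,
# e.g. "postgres: airflow airflow_db ::1(53002) idle in transaction".
# The field layout (user, database, client, state) is assumed from the listing.
BACKEND_RE = re.compile(
    r"postgres: (?P<user>\S+) (?P<db>\S+) (?P<client>\S+) (?P<state>.+)$"
)


def backend_states(ps_lines):
    """Return a list of (pid, state) for each postgres client backend found."""
    found = []
    for line in ps_lines:
        match = BACKEND_RE.search(line)
        if match is None:
            continue  # not a postgres client backend (scheduler, gunicorn, ...)
        pid = int(line.split()[1])  # second column of `ps -ef` is the PID
        found.append((pid, match.group("state").strip()))
    return found


# Three lines taken from the listing, one process per line.
SAMPLE = [
    "postgres 22139  1847  0 11:39 ?        00:00:00 postgres: airflow airflow_db ::1(52324) idle",
    "postgres 24844  1847  0 11:47 ?        00:00:00 postgres: airflow airflow_db ::1(53002) idle in transaction",
    "root     12884  3772 14 08:11 pts/7    00:30:40 /usr/bin/python /usr/bin/airflow scheduler",
]

states = backend_states(SAMPLE)
counts = Counter(state for _, state in states)
open_transactions = [pid for pid, state in states if state == "idle in transaction"]

print(counts)
print(open_transactions)
```

A backend that stays in 'idle in transaction' across repeated listings is the one worth looking at more closely; a single snapshot like the one here cannot tell a short-lived transaction from a stuck one.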


Any ideas?

Thanks,
Hila

Re: airflow is stuck

Posted by הילה ויזן <hi...@gmail.com>.
The scheduler log (partial) is attached.
I can't find the broker logs. I looked under /var/log/rabbitmq; is there
another place they might be?

After killing all the airflow processes and restarting them, tasks are running
again.

Andrew is right that the scheduler sometimes dies, but that is not the case in
my scenario.

On Sun, Aug 28, 2016 at 3:27 PM, Bolke de Bruin <bd...@gmail.com> wrote:

> Please also provide logging from the scheduler and broker.
>
> Bolke

Re: airflow is stuck

Posted by Bolke de Bruin <bd...@gmail.com>.
Please also provide logging from the scheduler and broker.

Bolke


Re: airflow is stuck

Posted by Andrew Phillips <an...@apache.org>.
> After a few hours, we noticed that no task is executed.

We've run into a similar situation, which may or may not be related. In 
our case, the scheduler seems to die; there are suddenly no more active 
scheduler threads.

Restarting the scheduler resolves the issue, although we sometimes need 
to do that a few times to clear what seems to be a "backlog" of tasks.

The scheduler logs should indicate whether this may also be what's 
happening in your case.
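
One way to check the logs for this is to look at how long ago the last timestamped line was written. The sketch below is only an illustration: it assumes the scheduler log uses the same `[YYYY-MM-DD HH:MM:SS,mmm]` prefix as the worker log lines quoted in this thread, and the 10-minute `max_silence` threshold and the function names are hypothetical choices, not Airflow settings.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix of the log lines quoted in this thread, e.g.
# "[2016-08-28 11:31:34,663] {__init__.py:36} INFO - Using executor ..."
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]")


def last_log_time(lines):
    """Return the timestamp of the last line carrying a log prefix, or None."""
    latest = None
    for line in lines:
        match = TS_RE.match(line)
        if match:
            latest = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return latest


def looks_stalled(lines, now, max_silence=timedelta(minutes=10)):
    """True if no timestamped line was written within max_silence of now."""
    latest = last_log_time(lines)
    return latest is None or now - latest > max_silence


SAMPLE_LOG = [
    "[2016-08-28 11:31:33,300] {__init__.py:36} INFO - Using executor CeleryExecutor",
    "Traceback (most recent call last):",
    "[2016-08-28 11:31:34,663] {__init__.py:36} INFO - Using executor CeleryExecutor",
]

print(last_log_time(SAMPLE_LOG))
print(looks_stalled(SAMPLE_LOG, now=datetime(2016, 8, 28, 11, 47, 0)))
```

A silent log only suggests that the scheduler has stopped doing work; the process listing is still needed to tell a dead scheduler from one that is alive but blocked.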

Regards

ap