Posted to dev@airflow.apache.org by הילה ויזן <hi...@gmail.com> on 2016/08/28 12:25:32 UTC
airflow is stuck
Hi,
We use Airflow 1.7.1.3 with Celery (PostgreSQL as its backend DB).
After a few hours, we noticed that no tasks were being executed.
Some tasks had failed before this happened.
From the worker log:
[2016-08-28 11:31:33,300] {__init__.py:36} INFO - Using executor CeleryExecutor
Logging into: /var/log/airflow//daily_agg/activities_per_day_in_week_task/2016-08-19T02:00:00
[2016-08-28 11:31:34,663] {__init__.py:36} INFO - Using executor CeleryExecutor
Traceback (most recent call last):
  File "/usr/bin/airflow", line 15, in <module>
    args.func(args)
  File "/usr/lib/python2.7/site-packages/airflow/bin/cli.py", line 237, in run
    pool=args.pool,
  File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 53, in wrapper
    result = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1245, in run
    result = task_copy.execute(context=context)
  File "/usr/lib/python2.7/site-packages/airflow/operators/bash_operator.py", line 83, in execute
    raise AirflowException("Bash command failed")
airflow.exceptions.AirflowException: Bash command failed
When we listed the Airflow processes, we saw that the related Postgres
processes were in the 'idle' state:
[root@hadoop01 ~]# ps -ef | grep airflow
root 12884 3772 14 08:11 pts/7 00:30:40 /usr/bin/python /usr/bin/airflow scheduler
root 12918 12901 0 08:11 pts/8 00:00:04 /usr/bin/python /usr/bin/airflow serve_logs
root 13577 13253 0 08:11 pts/10 00:00:05 gunicorn: master [airflow-webserver]
root 13943 13577 0 08:11 pts/10 00:00:04 gunicorn: worker [airflow-webserver]
root 13945 13577 0 08:11 pts/10 00:00:07 gunicorn: worker [airflow-webserver]
root 13950 13577 0 08:11 pts/10 00:00:06 gunicorn: worker [airflow-webserver]
root 13954 13577 0 08:11 pts/10 00:00:05 gunicorn: worker [airflow-webserver]
postgres 22139 1847 0 11:39 ? 00:00:00 postgres: airflow airflow_db ::1(52324) idle
postgres 22312 1847 0 11:39 ? 00:00:00 postgres: airflow airflow_db ::1(52365) idle
postgres 24844 1847 0 11:47 ? 00:00:00 postgres: airflow airflow_db ::1(53002) idle in transaction
root 24849 12222 0 11:47 pts/9 00:00:00 grep --color=auto airflow
postgres 55388 1847 0 11:10 ? 00:00:00 postgres: airflow airflow_db ::1(47759) idle
postgres 61412 1847 0 11:13 ? 00:00:00 postgres: airflow airflow_db ::1(48266) idle
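
One backend in the listing above is in the 'idle in transaction' state rather than plain 'idle'. Nothing in this thread establishes that it is the cause of the stall, but it may be worth singling out. The following is only a rough sketch, assuming each process occupies a single line of `ps -ef` output; the function name `backends_in_state` and the `SAMPLE` text are made up for illustration and are not part of Airflow or PostgreSQL.

```python
import re

# Matches a PostgreSQL backend line as printed by `ps -ef`, for example:
# postgres 24844 1847 0 11:47 ? 00:00:00 postgres: airflow airflow_db ::1(53002) idle in transaction
BACKEND = re.compile(
    r"^\S+\s+(?P<pid>\d+)\s.*?"
    r"postgres: (?P<user>\S+) (?P<db>\S+) (?P<client>\S+) (?P<state>.+)$"
)


def backends_in_state(ps_output, state="idle in transaction"):
    """Return (pid, client) pairs for backends whose state equals `state`."""
    found = []
    for line in ps_output.splitlines():
        match = BACKEND.match(line.strip())
        if match and match.group("state").strip() == state:
            found.append((int(match.group("pid")), match.group("client")))
    return found


# Hypothetical input: three lines taken from the listing above.
SAMPLE = """\
postgres 22139 1847 0 11:39 ? 00:00:00 postgres: airflow airflow_db ::1(52324) idle
postgres 24844 1847 0 11:47 ? 00:00:00 postgres: airflow airflow_db ::1(53002) idle in transaction
root 24849 12222 0 11:47 pts/9 00:00:00 grep --color=auto airflow
"""

if __name__ == "__main__":
    print(backends_in_state(SAMPLE))
```

Run against the three sample lines, this reports only PID 24844; the plain 'idle' backend and the grep process are skipped.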
Any ideas?
Thanks,
Hila
Re: airflow is stuck
Posted by הילה ויזן <hi...@gmail.com>.
A partial scheduler log is attached.
I can't find the broker logs. I searched under /var/log/rabbitmq; is there
another place to look?
After killing all Airflow processes and restarting them, tasks are running
again.
Andrew is right that the scheduler sometimes dies, but that is not what
happened in my case.
On Sun, Aug 28, 2016 at 3:27 PM, Bolke de Bruin <bd...@gmail.com> wrote:
> Please also provide logging from the scheduler and broker.
>
> Bolke
Re: airflow is stuck
Posted by Bolke de Bruin <bd...@gmail.com>.
Please also provide logging from the scheduler and broker.
Bolke
Re: airflow is stuck
Posted by Andrew Phillips <an...@apache.org>.
> After a few hours, we noticed that no task is executed.
We've run into a similar situation, which may or may not be related. In
our case, the scheduler seems to die; there are suddenly no more active
scheduler threads.
Restarting the scheduler resolves the issue, although we sometimes need
to do that a few times to clear what seems to be a "backlog" of tasks.
The scheduler logs should indicate whether this may also be what's
happening in your case.
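
One rough way to notice that the scheduler has gone quiet, whether it died or is merely stuck, is to check how recently its log file was written. This is only a sketch: the function name `log_is_stale`, the log path, and the 10-minute threshold are hypothetical and would need adjusting to your setup, and a stale log is a hint to look closer, not proof of a dead scheduler.

```python
import os
import time


def log_is_stale(path, max_age_seconds, now=None):
    """Return True if `path` is missing or was last modified
    more than `max_age_seconds` before `now`."""
    now = time.time() if now is None else now
    try:
        modified = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable log is treated as stale.
        return True
    return (now - modified) > max_age_seconds


if __name__ == "__main__":
    # Hypothetical path and threshold; adjust to where your scheduler logs.
    scheduler_log = "/var/log/airflow/scheduler.log"
    if log_is_stale(scheduler_log, max_age_seconds=600):
        print("scheduler log has not been written recently; check the scheduler")
```

A check like this could run from cron and alert someone, leaving the decision to restart the scheduler to a person.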
Regards
ap