Posted to dev@airflow.apache.org by Josef Samanek <jo...@gmail.com> on 2017/06/01 21:18:05 UTC

Tasks Queued but never run

Hi!

We have a problem with our Airflow setup. Sometimes several tasks get queued but never run, and they remain in the Queued state forever. Other tasks from the same schedule interval run, and the next schedule interval runs normally too, but these few tasks stay queued.

We are using Airflow 1.8.1, currently with CeleryExecutor and Redis, but we had the same problem with LocalExecutor as well (switching to Celery actually helped quite a bit: the problem now happens far less often, but it still happens). We have 18 DAGs in total, 13 of them active. Some have just 1-2 tasks, but some are more complex, with 8 tasks or so and upstream dependencies. Some ExternalTaskSensor tasks are used as well.

I tried playing around with DAG configurations (limiting concurrency, max_active_runs, ...) and tried switching off some DAGs completely (not all, but most), but so far nothing has helped. Right now I am not really sure what else to try to identify and solve the issue.

I am getting a bit desperate, so I would really appreciate any help with this. Thank you all in advance!

Joe

Re: Tasks Queued but never run

Posted by Bolke de Bruin <bd...@gmail.com>.
I have made PR https://github.com/apache/incubator-airflow/pull/2356 for this. The issue went a little bit deeper than I expected.

In backfills we can lose tasks that should be executed when a task
sets its own state to NONE because concurrency limits are reached.
This makes them fall outside of the scope the backfill is
managing, hence they will not be executed.

Several bugs are the cause of this. Firstly, the state
reported by the executor was always reported as success, i.e.
the return code of the task instance was not propagated.
Next to that, if the executor already has a task instance
in its queue, it will silently ignore the task instance
being added. The backfills did not guard against this, thus
tasks could get lost here as well.

This patch introduces CONCURRENCY_REACHED as an executor
state, which will be set if the task exits with EBUSY (16).
This allows the backfill to properly handle these tasks
and reschedule them. Please note that the CeleryExecutor
does not report back on executor states.
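
A rough illustration of the two fixes described above, as a sketch only. This is not the code from the PR: `State`, `state_from_return_code`, `SketchExecutor` and `tasks_to_reschedule` are hypothetical names, and the only details taken from the description are the EBUSY exit code and the idea of a CONCURRENCY_REACHED state.

```python
import errno


class State:
    # Hypothetical state names, for illustration only.
    SUCCESS = "success"
    FAILED = "failed"
    CONCURRENCY_REACHED = "concurrency_reached"


def state_from_return_code(return_code):
    """Propagate the task's exit code instead of always reporting success."""
    if return_code == 0:
        return State.SUCCESS
    if return_code == errno.EBUSY:  # 16 on Linux
        return State.CONCURRENCY_REACHED
    return State.FAILED


class SketchExecutor:
    """Toy executor that tells the caller when a task is already queued."""

    def __init__(self):
        self.queued = set()

    def queue_task(self, key):
        # Return False rather than silently ignoring a duplicate,
        # so the caller (e.g. a backfill) can react to it.
        if key in self.queued:
            return False
        self.queued.add(key)
        return True


def tasks_to_reschedule(results):
    """Given {task_key: return_code}, list the tasks a backfill should retry."""
    return sorted(
        key for key, code in results.items()
        if state_from_return_code(code) == State.CONCURRENCY_REACHED
    )
```

The point of the sketch is that a task which backs off because of concurrency limits is reported as such and put back on the list, instead of being counted as done.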


Please test the patch and report back whether it does or does not solve the issue.

Bolke.

> On 8 Jun 2017, at 04:23, Russell Pierce <ru...@gmail.com> wrote:
> 
> I hadn't thought of it that way. Given that SubDAGs are scheduled as
> backfills, then they'd inherit the same problem. So, the issue I had is
> version specific. Thanks for pointing that out Bolke. Do you know the
> relevant JIRA Issue off hand?


Re: Tasks Queued but never run

Posted by Russell Pierce <ru...@gmail.com>.
I hadn't thought of it that way. Given that SubDAGs are scheduled as
backfills, they'd inherit the same problem. So, the issue I had is
version specific. Thanks for pointing that out, Bolke. Do you know the
relevant JIRA issue offhand?

On Wed, Jun 7, 2017, 4:28 PM Bolke de Bruin <bd...@gmail.com> wrote:

> It is 1.8.x specific in this case (for backfills).
>
> Sent from my iPhone

Re: Tasks Queued but never run

Posted by Bolke de Bruin <bd...@gmail.com>.
It is 1.8.x specific in this case (for backfills). 

Sent from my iPhone

> On 7 Jun 2017, at 21:35, Russell Pierce <ru...@gmail.com> wrote:
> 
> Probably more of a configuration constellation issue than version specific
> or even an 'issue' per se. As noted, on restart the scheduler reschedules
> everything. I had a heavy SubDAG that when rescheduled could produce many
> extra tasks and a small fixed number of Celery workers. So, the scheduled
> tasks wouldn't be done by the time of the scheduler restart and then the
> scheduler would reschedule the SubDAG... debugging hilarity followed from
> there.

Re: Tasks Queued but never run

Posted by Russell Pierce <ru...@gmail.com>.
Probably more of a configuration constellation issue than version specific
or even an 'issue' per se. As noted, on restart the scheduler reschedules
everything. I had a heavy SubDAG that when rescheduled could produce many
extra tasks and a small fixed number of Celery workers. So, the scheduled
tasks wouldn't be done by the time of the scheduler restart and then the
scheduler would reschedule the SubDAG... debugging hilarity followed from
there.

On Wed, Jun 7, 2017, 10:57 AM Jason Chen <ch...@gmail.com> wrote:

> I am using Airflow 1.7.1.3 with CeleryExecutor, but not run into this
> issue.
> I am wondering if this issue is only for 1.8.x ?

Re: Tasks Queued but never run

Posted by Jason Chen <ch...@gmail.com>.
I am using Airflow 1.7.1.3 with CeleryExecutor, but have not run into this issue.
I am wondering if this issue is specific to 1.8.x?

On Wed, Jun 7, 2017 at 8:34 AM, Russell Pierce <ru...@gmail.com>
wrote:

> Depending on how fast you can clear down your queue, -n can be harmful and
> really stack up your Celery queue. Keep an eye on your queue depth if you
> see a ton of messages about the task already having been run.

Re: Tasks Queued but never run

Posted by Russell Pierce <ru...@gmail.com>.
Depending on how fast you can clear down your queue, -n can be harmful and
really stack up your Celery queue. Keep an eye on your queue depth if you
see a ton of messages about the task already having been run.
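
A minimal sketch of checking the queue depth, assuming the Redis broker without priority queues, where Celery keeps the pending messages of a queue in a Redis list named after that queue. The function names, the queue name "default" and the threshold are assumptions for illustration; `FakeRedis` is only a stand-in so the sketch runs without a server.

```python
def celery_queue_depth(client, queue_name):
    """Return the number of messages waiting in a Redis-backed Celery queue.

    `client` is anything with an llen(name) method, e.g. a redis-py client.
    """
    return client.llen(queue_name)


def is_backing_up(depth, threshold):
    """True when the backlog exceeds a threshold you consider healthy."""
    return depth > threshold


class FakeRedis:
    """Stand-in for a real Redis client, used only for this demonstration."""

    def __init__(self, lists):
        self.lists = lists

    def llen(self, name):
        return len(self.lists.get(name, []))


fake = FakeRedis({"default": ["msg1", "msg2", "msg3"]})
depth = celery_queue_depth(fake, "default")
print(depth, is_backing_up(depth, threshold=2))
```

With a real broker you would pass a redis-py client (for example `redis.Redis()`) and the queue name from your own Airflow configuration instead of the fake.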

On Mon, Jun 5, 2017, 9:18 AM Josef Samanek <jo...@kiwi.com> wrote:

> Hey. Thanks for the answer. I previously also tried to run scheduler -n
> 10, but it was back when I was still using LocalExecutor. And it did not
> help. I have not yet tried to do it with CeleryExecutor, so I might.
>
> Still, I would prefer to find an actual solution for the underlying
> problem, not just a workaround (even though a working workaround is also
> appreciated).
>
> Best regards,
> Joe

Additional info

Posted by Josef Samanek <jo...@kiwi.com>.
Some additional info:

One of the DAG definitions:

from datetime import datetime, timedelta

from airflow import DAG

dag = DAG(
    dag_id='sync_payments',
    default_args={
        'owner': 'joe',
        'email': [...],
        'email_on_failure': True,
        'email_on_retry': False,
        'depends_on_past': False,
        'start_date': datetime(2017, 4, 25, 0, 0, 0),
        'sla': timedelta(minutes=20),
        'retries': 10,
        'retry_delay': timedelta(minutes=1),
    },
    schedule_interval=timedelta(minutes=10),
)
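
The concurrency limits mentioned in the first message are DAG-level arguments. A minimal sketch of how they could be added to a definition like the one above, assuming the `concurrency` and `max_active_runs` arguments of `DAG`; the values 4 and 1 are made up for illustration and are not the ones actually tried. The arguments are built as a plain dict so the snippet runs without Airflow installed.

```python
from datetime import timedelta

# Hypothetical values; tune them for your own workload.
dag_limits = {
    "concurrency": 4,      # max task instances of this DAG running at once
    "max_active_runs": 1,  # max DAG runs of this DAG active at once
}

dag_kwargs = dict(
    dag_id="sync_payments",
    schedule_interval=timedelta(minutes=10),
    **dag_limits
)

# With Airflow installed this would be passed on as: dag = DAG(**dag_kwargs)
print(dag_kwargs)
```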

A log screenshot example. There was supposed to be a whole schedule interval run at 19:10, but it just didn't happen. This was back when I was still using LocalExecutor, but it's pretty much the same with Celery now, just less often and rarely/never the whole interval, just a few tasks.
https://pasteboard.co/eUexoUcew.png


And here is an illustration of how it can look in the Airflow web UI:
https://pasteboard.co/eTMtJKr0w.png

Re: Tasks Queued but never run

Posted by Josef Samanek <jo...@kiwi.com>.
Hey. Thanks for the answer. I previously also tried running the scheduler with -n 10, but that was back when I was still using LocalExecutor, and it did not help. I have not yet tried it with CeleryExecutor, so I might.

Still, I would prefer to find an actual solution for the underlying problem, not just a workaround (even though a working workaround is also appreciated).

Best regards,
Joe

On 2017-06-02 00:10 (+0200), Alex Guziel <al...@airbnb.com.INVALID> wrote: 
> We've noticed this with celery, relating to this
> https://github.com/celery/celery/issues/3765
> 
> We also use `-n 5` option on the scheduler so it restarts every 5 runs,
> which will reset all queued tasks.
> 
> Best,
> Alex

Re: Tasks Queued but never run

Posted by Alex Guziel <al...@airbnb.com.INVALID>.
We've noticed this with Celery, relating to this issue:
https://github.com/celery/celery/issues/3765

We also use the `-n 5` option on the scheduler so it restarts every 5 runs,
which will reset all queued tasks.
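
A minimal sketch of that workaround, assuming the scheduler exits on its own after the given number of runs and something else has to start it again. The function names and the `max_restarts` parameter are hypothetical and only there to keep the loop testable; in practice a process manager would more likely do the restarting.

```python
import subprocess


def scheduler_command(num_runs):
    """Build the CLI call; -n makes the scheduler exit after num_runs runs."""
    return ["airflow", "scheduler", "-n", str(num_runs)]


def run_scheduler(num_runs=5, max_restarts=None, run=subprocess.call):
    """Start the scheduler again each time it exits.

    max_restarts=None loops forever; `run` can be swapped out for testing.
    """
    restarts = 0
    while max_restarts is None or restarts < max_restarts:
        run(scheduler_command(num_runs))
        restarts += 1
    return restarts
```

Calling `run_scheduler()` with the defaults would keep relaunching `airflow scheduler -n 5` until the process is stopped.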

Best,
Alex

On Thu, Jun 1, 2017 at 2:18 PM, Josef Samanek <jo...@gmail.com>
wrote:

> Hi!
>
> We have a problem with our airflow. Sometimes, several tasks get queued
> but they never get run and remain in Queued state forever. Other tasks from
> the same schedule interval run. And next schedule interval runs normally
> too. But these several tasks remain queued.
>
> We are using Airflow 1.8.1. Currently with CeleryExecutor and redis, but
> we had the same problem with LocalExecutor as well (actually switching to
> Celery helped quite a bit, the problem now happens way less often, but
> still it happens). We have 18 DAGs total, 13 active. Some have just 1-2
> tasks, but some are more complex, like 8 tasks or so and with upstreams.
> There are also ExternalTaskSensor tasks used.
>
> I tried playing around with DAG configurations (limiting concurrency,
> max_active_runs, ...), tried switching off some DAGs completely (not all
> but most) etc., so far nothing helped. Right now, I am not really sure,
> what else to try to identify and solve the issue.
>
> I am getting a bit desperate, so I would really appreciate any help with
> this. Thank you all in advance!
>
> Joe
>