Posted to dev@airflow.apache.org by Hila Vizen <hi...@gmail.com> on 2016/08/11 13:46:25 UTC

Run TaskInstances sequentially

Hi,
I searched the documentation for a way to limit a specific task's
concurrency to 1, but didn't find one.
I thought that 'depends_on_past' would achieve this goal, but I want the
task to run even if the previous task instance failed - just to be sure
they don't run in parallel.

The task doesn't have a downstream task, so I can't use
'wait_for_downstream'.

Am I missing something?

Thanks,
Hila

Re: Run TaskInstances sequentially

Posted by Jeremiah Lowin <jl...@apache.org>.
Hello Hila,

A pool size of 1 is the simplest way to do this -- I recommend making a
pool for just this specific task.

depends_on_past is an even stricter condition: it forces your tasks
to run in chronological order. However, as you point out, it also requires
the previous task instance to succeed.

Jeremiah
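
For reference, wiring a task to a single-slot pool looks roughly like the
sketch below. This is only an illustration: the pool ('single_slot_pool')
must first be created with 1 slot in the UI under Admin -> Pools or via the
`airflow pool` CLI, and the DAG/task names are made up.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('my_dag', start_date=datetime(2016, 8, 1), schedule_interval='@daily')

# 'single_slot_pool' is a hypothetical pool created beforehand with 1 slot.
# Only one task instance assigned to it can run at a time, regardless of
# whether earlier runs succeeded or failed.
exclusive_task = BashOperator(
    task_id='exclusive_task',
    bash_command='echo "running alone"',
    pool='single_slot_pool',
    dag=dag,
)
```

Unlike depends_on_past, the pool imposes no ordering and no success
requirement - it only caps how many instances run at once.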

On Thu, Aug 11, 2016 at 5:29 PM Lance Norskog <la...@gmail.com>
wrote:

> In Airflow 1.6.2, all of the concurrency controls are sometimes ignored and
> many tasks are scheduled simultaneously. I don't know if this has been
> completely fixed.
> You can rely on them to separate your task runs *most* of the time, but not
> *all* of the time- so don't write code that depends on exclusive operation.
>
> Lance
>
> On Thu, Aug 11, 2016 at 1:15 PM, Kurt Muehlner <km...@connexity.com>
> wrote:
>
> > I’m not aware of a concurrency limit at task granularity, however, one
> > available option is the ‘max_active_runs’ parameter in the DAG class.
> >
> >   max_active_runs (int) – maximum number of active DAG runs, beyond this
> > number of DAG runs in a running state, the scheduler won’t create new
> > active DAG runs
> >
> > I’ve used the ‘pool size of 1’ option you mention as a very simple way to
> > ensure two DAGs run in serial.
> >
> > Kurt
> >
> > On 8/11/16, 6:57 AM, "הילה ויזן" <hi...@gmail.com> wrote:
> >
> >     should I use pool of size 1?
> >
> >     On Thu, Aug 11, 2016 at 4:46 PM, הילה ויזן <hi...@gmail.com>
> wrote:
> >
> >     > Hi,
> >     > I searched in the documentation for a way to limit a specific task
> >     > concurrency to 1,
> >     > but didn't find a way.
> >     > I thought that 'depends_on_past' should achieve this goal, but I
> > want the
> >     > task to run even if the previous task failed - just to be sure the
> > they
> >     > don't run in parallel.
> >     >
> >     > The task doesn't have a downstream task, so I can't use
> >     > 'wait_for_downstream'.
> >     >
> >     > Am I Missing something?
> >     >
> >     > Thanks,
> >     > Hila
> >     >
> >     >
> >
> >
> >
>
>
> --
> Lance Norskog
> lance.norskog@gmail.com
> Redwood City, CA
>

Re: Run TaskInstances sequentially

Posted by Lance Norskog <la...@gmail.com>.
In Airflow 1.6.2, all of the concurrency controls are sometimes ignored and
many tasks are scheduled simultaneously. I don't know if this has been
completely fixed.
You can rely on them to separate your task runs *most* of the time, but not
*all* of the time - so don't write code that depends on exclusive operation.

Lance
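
Given that caveat, if a task truly must never run twice at once, one
belt-and-braces option is to take an OS-level lock inside the task itself,
so overlap is refused even when the scheduler misbehaves. A minimal sketch
(the lock-file path and function name are made up), assuming a single
worker host and POSIX `flock`:

```python
import fcntl
import os

LOCK_PATH = "/tmp/exclusive_task.lock"  # hypothetical lock-file path

def run_exclusively(task_fn):
    """Run task_fn only if no other instance holds the lock.

    Returns True if task_fn ran, False if another instance held the lock.
    """
    fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR)
    try:
        try:
            # Non-blocking exclusive lock: fails immediately if held elsewhere.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # another instance is running; skip this run
        task_fn()
        return True
    finally:
        os.close(fd)  # closing the descriptor releases the flock
```

The task body would call run_exclusively() around its real work; a skipped
run returns False instead of raising, so a failed predecessor never blocks
later runs.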



-- 
Lance Norskog
lance.norskog@gmail.com
Redwood City, CA

Re: Run TaskInstances sequentially

Posted by Kurt Muehlner <km...@connexity.com>.
I’m not aware of a concurrency limit at task granularity; however, one available option is the ‘max_active_runs’ parameter in the DAG class.

  max_active_runs (int) – maximum number of active DAG runs, beyond this number of DAG runs in a running state, the scheduler won’t create new active DAG runs

I’ve used the ‘pool size of 1’ option you mention as a very simple way to ensure two DAGs run serially.

Kurt
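
For anyone searching the archives later, a DAG limited this way looks
roughly like the sketch below (DAG name and schedule are illustrative):

```python
from datetime import datetime

from airflow import DAG

# With max_active_runs=1 the scheduler will not start a new run of this DAG
# while a previous run is still in the 'running' state, which serializes
# whole DAG runs - but not individual tasks within a single run.
dag = DAG(
    'serialized_dag',
    start_date=datetime(2016, 8, 1),
    schedule_interval='@hourly',
    max_active_runs=1,
)
```

Note this serializes at DAG-run granularity; to serialize just one task,
the single-slot pool is the finer-grained tool.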



Re: Run TaskInstances sequentially

Posted by Hila Vizen <hi...@gmail.com>.
Should I use a pool of size 1?
