Posted to dev@airflow.apache.org by הילה ויזן <hi...@gmail.com> on 2016/08/11 13:46:25 UTC
Run TaskInstances sequentially
Hi,
I searched in the documentation for a way to limit a specific task
concurrency to 1,
but didn't find a way.
I thought that 'depends_on_past' should achieve this goal, but I want the
task to run even if the previous task failed - just to be sure that they
don't run in parallel.
The task doesn't have a downstream task, so I can't use
'wait_for_downstream'.
Am I missing something?
Thanks,
Hila
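For context, 'depends_on_past' is a task-level flag. A minimal sketch of the setup described above, assuming the 1.x-era API (the DAG id, task id, and dates are placeholders; not tested here):

```python
# depends_on_past=True makes each task instance wait for the previous
# scheduled instance of the *same* task -- but, as noted above, it also
# requires that previous instance to have succeeded.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG("example_dag", start_date=datetime(2016, 8, 1),
          schedule_interval="@hourly")

task = BashOperator(
    task_id="my_task",
    bash_command="echo run",
    depends_on_past=True,  # blocks until the prior run of this task finishes
    dag=dag,
)
```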
Re: Run TaskInstances sequentially
Posted by Jeremiah Lowin <jl...@apache.org>.
Hello Hila,
A pool size of 1 is the simplest way to do this -- I recommend making a
pool for just this specific task.
depends_on_past is an even stricter condition: it forces your tasks
to run in chronological order. However, as you point out, it also requires
the previous task instance to succeed.
Jeremiah
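To illustrate the pool-of-1 suggestion: a sketch assuming the 1.x-era API (the pool must already exist with a single slot, e.g. created in the UI under Admin -> Pools; the pool name and task id here are placeholders, and this is untested):

```python
# Assigning the task to a dedicated pool with 1 slot means the scheduler
# will never run two instances of it at once, regardless of whether the
# previous instance succeeded or failed.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG("serial_task_example", start_date=datetime(2016, 8, 1),
          schedule_interval="@hourly")

serial_task = BashOperator(
    task_id="serial_task",
    bash_command="echo 'only one instance runs at a time'",
    pool="my_serial_pool",  # a pool created with exactly 1 slot
    dag=dag,
)
```

Unlike depends_on_past, the pool only serializes execution; it does not enforce chronological order or require the previous run to succeed.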
On Thu, Aug 11, 2016 at 5:29 PM Lance Norskog <la...@gmail.com>
wrote:
> In Airflow 1.6.2, all of the concurrency controls are sometimes ignored and
> many tasks are scheduled simultaneously. I don't know if this has been
> completely fixed.
> You can rely on them to separate your task runs *most* of the time, but not
> *all* of the time- so don't write code that depends on exclusive operation.
>
> Lance
>
> On Thu, Aug 11, 2016 at 1:15 PM, Kurt Muehlner <km...@connexity.com>
> wrote:
>
> > I’m not aware of a concurrency limit at task granularity, however, one
> > available option is the ‘max_active_runs’ parameter in the DAG class.
> >
> > max_active_runs (int) – maximum number of active DAG runs, beyond this
> > number of DAG runs in a running state, the scheduler won’t create new
> > active DAG runs
> >
> > I’ve used the ‘pool size of 1’ option you mention as a very simple way to
> > ensure two DAGs run in serial.
> >
> > Kurt
> >
> > On 8/11/16, 6:57 AM, "הילה ויזן" <hi...@gmail.com> wrote:
> >
> > should I use pool of size 1?
> >
> > On Thu, Aug 11, 2016 at 4:46 PM, הילה ויזן <hi...@gmail.com>
> wrote:
> >
> > > Hi,
> > > I searched in the documentation for a way to limit a specific task
> > > concurrency to 1,
> > > but didn't find a way.
> > > I thought that 'depends_on_past' should achieve this goal, but I
> > want the
> > > task to run even if the previous task failed - just to be sure the
> > they
> > > don't run in parallel.
> > >
> > > The task doesn't have a downstream task, so I can't use
> > > 'wait_for_downstream'.
> > >
> > > Am I Missing something?
> > >
> > > Thanks,
> > > Hila
> > >
> > >
> >
> >
> >
>
>
> --
> Lance Norskog
> lance.norskog@gmail.com
> Redwood City, CA
>
Re: Run TaskInstances sequentially
Posted by Lance Norskog <la...@gmail.com>.
In Airflow 1.6.2, all of the concurrency controls are sometimes ignored and
many tasks are scheduled simultaneously. I don't know whether this has been
completely fixed.
You can rely on them to separate your task runs *most* of the time, but not
*all* of the time, so don't write code that depends on exclusive operation.
Lance
--
Lance Norskog
lance.norskog@gmail.com
Redwood City, CA
Re: Run TaskInstances sequentially
Posted by Kurt Muehlner <km...@connexity.com>.
I’m not aware of a concurrency limit at task granularity; however, one available option is the ‘max_active_runs’ parameter of the DAG class.
max_active_runs (int) – maximum number of active DAG runs; beyond this number of DAG runs in a running state, the scheduler won’t create new active DAG runs.
I’ve used the ‘pool size of 1’ option you mention as a very simple way to ensure two DAGs run serially.
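A sketch of the max_active_runs option, assuming the 1.x-era API (DAG id and dates are placeholders; untested):

```python
# max_active_runs=1 caps the whole DAG at one active run: with a schedule,
# later runs queue up behind the active one instead of overlapping. Note
# this serializes the entire DAG run, not an individual task.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    "one_run_at_a_time",
    start_date=datetime(2016, 8, 1),
    schedule_interval="@hourly",
    max_active_runs=1,  # scheduler won't create a second running DAG run
)

task = BashOperator(task_id="do_work", bash_command="echo work", dag=dag)
```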
Kurt
Re: Run TaskInstances sequentially
Posted by הילה ויזן <hi...@gmail.com>.
should I use pool of size 1?