Posted to users@airflow.apache.org by Reed Villanueva <rv...@ucera.org> on 2020/01/29 23:54:57 UTC

Way to designate certain set of airflow tasks to run before others (order invariant)?

I have an Airflow (v1.10.5) DAG that looks like this:

[image: enter image description here] <https://i.stack.imgur.com/DfzdN.png>

Is there a way to specify that all of the blue tasks should complete before
the scheduler moves on to any downstream tasks? Currently the scheduler
sometimes goes down an entire branch of tasks before running the next blue
task.

I want to avoid simply putting them in sequence (with the trigger rule
TriggerRule.ALL_DONE), because they have no logical order in which they
need to run, other than that they must all finish before any downstream
task in any branch starts.

Anyone know of any way to do this (like some kind of "priority" pool for
tasks)? Other workaround suggestions?

-- 
This electronic message is intended only for the named 
recipient, and may 
contain information that is confidential or 
privileged. If you are not the 
intended recipient, you are 
hereby notified that any disclosure, copying, 
distribution or 
use of the contents of this message is strictly 
prohibited. If 
you have received this message in error or are not the 
named
recipient, please notify us immediately by contacting the 
sender at 
the electronic mail address noted above, and delete 
and destroy all copies 
of this message. Thank you.

Re: Way to designate certain set of airflow tasks to run before others (order invariant)?

Posted by Reed Villanueva <rv...@ucera.org>.
Interesting, thanks.

On Wed, Jan 29, 2020 at 11:52 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> Rather than a BashOperator I'd suggest using DummyOperator.
>
> The other thing you might want to look at is the priority_weight attribute
> of a task. By default the priority of a task includes the weight of all its
> downstream tasks, but you can configure this with the priority_weight and
> weight_rule arguments to the operator constructor - see
> http://airflow.apache.org/docs/stable/_api/airflow/operators/index.html?highlight=priority#airflow.operators.BaseOperator
> for the possible values and meanings.
>
> So priority_weight=20, weight_rule='absolute' for your blue tasks should
> make the scheduler run those before the other tasks in your DAG.
>
> -Ash


Re: Way to designate certain set of airflow tasks to run before others (order invariant)?

Posted by Ash Berlin-Taylor <as...@apache.org>.
Rather than a BashOperator I'd suggest using DummyOperator.

The other thing you might want to look at is the priority_weight attribute of a task. By default the priority of a task includes the weight of all its downstream tasks, but you can configure this with the priority_weight and weight_rule arguments to the operator constructor - see http://airflow.apache.org/docs/stable/_api/airflow/operators/index.html?highlight=priority#airflow.operators.BaseOperator for the possible values and meanings.

So priority_weight=20, weight_rule='absolute' for your blue tasks should make the scheduler run those before the other tasks in your DAG.

-Ash
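
To illustrate the weighting described above, here is a minimal pure-Python model (not Airflow code) of how a task's effective priority could be computed under the default 'downstream' rule versus 'absolute'. The DAG shape, the task names and both helper functions are hypothetical, made up for illustration; only priority_weight, weight_rule and the value 20 come from the message.

```python
# Pure-Python model of priority weighting; this is NOT Airflow code.
# Task names, the edge map and both helpers are hypothetical.

def downstream_ids(task, edges):
    """Return every task reachable from `task` by following downstream edges."""
    seen, stack = set(), [task]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def effective_weight(task, edges, weights, rules):
    """'downstream' adds the weights of all downstream tasks; 'absolute' does not."""
    own = weights.get(task, 1)  # assume a default priority_weight of 1
    if rules.get(task, "downstream") == "absolute":
        return own
    return own + sum(weights.get(t, 1) for t in downstream_ids(task, edges))


# One long branch and two short ones (an assumed shape, not the poster's DAG).
edges = {
    "blue_a": ["green_a"],
    "green_a": ["yellow_a"],
    "blue_b": ["green_b"],
    "blue_c": ["green_c"],
}
blues = ["blue_a", "blue_b", "blue_c"]
tasks = blues + ["green_a", "green_b", "green_c", "yellow_a"]

# Defaults: green_a scores the same as blue_b, so nothing favours the blues.
default = {t: effective_weight(t, edges, {}, {}) for t in tasks}

# The suggestion above: priority_weight=20 with the 'absolute' rule on the blues only.
weights = {b: 20 for b in blues}
rules = {b: "absolute" for b in blues}
tuned = {t: effective_weight(t, edges, weights, rules) for t in tasks}
```

In this model every blue task scores 20 once tuned, while the highest non-blue score is 2. Bear in mind that a priority weight presumably only orders tasks that are already eligible to run and competing for slots; it does not add a dependency, so it may not amount to a hard guarantee.
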
On Jan 30 2020, at 1:46 am, Reed Villanueva <rv...@ucera.org> wrote:
> After some more thought, here is my workaround...
>
>
> Anyone see any holes with this?


Re: Way to designate certain set of airflow tasks to run before others (order invariant)?

Posted by Michael Lutz <mi...@gmail.com>.
That was my first reaction too. Build a new task as a convergence point.

On Wed, Jan 29, 2020, 8:49 PM Reed Villanueva <rv...@ucera.org> wrote:

> After some more thought, here is my workaround...
>
> [image: image.png]
> Anyone see any holes with this?

Re: Way to designate certain set of airflow tasks to run before others (order invariant)?

Posted by Reed Villanueva <rv...@ucera.org>.
After some more thought, here is my workaround...

[image: image.png]
Anyone see any holes with this?
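
Since the image is not preserved here, the following is only a guess at the shape of such a workaround: a no-op "barrier" task that waits for every blue task with an ALL_DONE-style rule, with each green task depending on both its own blue task and the barrier. The pure-Python model below (not Airflow code; the state strings and function names are hypothetical, and the trigger-rule semantics are simplified) checks when a green task would be allowed to run.

```python
# Pure-Python model of a barrier task between blues and greens; NOT Airflow code.
# State strings and function names are hypothetical simplifications.

FINISHED = {"success", "failed"}


def all_done(states):
    """Simplified ALL_DONE: every upstream task has finished, in any state."""
    return all(s in FINISHED for s in states)


def all_success(states):
    """Simplified ALL_SUCCESS: every upstream task succeeded."""
    return all(s == "success" for s in states)


def green_can_run(branch, blue_states):
    """True once the branch's green task may start, given each blue's state."""
    # The barrier is assumed to be a no-op that succeeds as soon as it runs.
    barrier = "success" if all_done(blue_states.values()) else "waiting"
    # The green task needs its own blue AND the barrier to have succeeded.
    return all_success([blue_states[branch], barrier])


finished = {"a": "success", "b": "failed", "c": "success"}
in_progress = {"a": "success", "b": "running", "c": "success"}
```

With `finished`, the greens in branches a and c may run even though blue b failed, while branch b's green may not. With `in_progress`, no green may start because blue b has not finished yet.
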


On Wed, Jan 29, 2020 at 3:26 PM Reed Villanueva <rv...@ucera.org>
wrote:

> Kamil,
>
> One thing about my case is that the next tasks (greens) in each branch
> should only run if the blue task *in that same branch* completes
> successfully (should *not care about the success/failure status of the
> other blue tasks*, only that they have been run). Thus I don't think the
> ALL_DONE trigger rule will help the greens and ALL_SUCCESS would be too
> strict.
>
> Any ideas for such a thing?


Re: Way to designate certain set of airflow tasks to run before others (order invariant)?

Posted by Reed Villanueva <rv...@ucera.org>.
Kamil,

One thing about my case is that the next tasks (greens) in each branch
should only run if the blue task *in that same branch* completes
successfully (should *not care about the success/failure status of the
other blue tasks*, only that they have been run). Thus I don't think the
ALL_DONE trigger rule will help the greens and ALL_SUCCESS would be too
strict.

Any ideas for such a thing?


On Wed, Jan 29, 2020 at 2:14 PM Kamil Breguła <ka...@polidea.com>
wrote:

> # Airflow 1.10.x location of the helper
> from airflow.utils.helpers import cross_downstream
>
> # white, blue_*, green_* and yellow_* are tasks already defined in the DAG
> blue = [blue_a, blue_b, blue_c]
> green = [green_a, green_b, green_c]
> yellow = [yellow_a, yellow_b]
>
> cross_downstream(from_tasks=[white], to_tasks=blue)
> cross_downstream(from_tasks=blue, to_tasks=green)
> cross_downstream(from_tasks=green, to_tasks=yellow)
>
> This should create the required network of dependencies between tasks.
>
> A visualization is available here:
> https://imgur.com/a/2jqyqQO
>
> This is the easiest solution and, in my opinion, the correct one.
> However, if you don't want dependencies, you can create a new
> scheduling rule by editing the BaseOperator.deps property.
>


Re: Way to designate certain set of airflow tasks to run before others (order invariant)?

Posted by Reed Villanueva <rv...@ucera.org>.
Interesting, thanks

On Wed, Jan 29, 2020 at 2:14 PM Kamil Breguła <ka...@polidea.com>
wrote:

> # Airflow 1.10.x location of the helper
> from airflow.utils.helpers import cross_downstream
>
> # white, blue_*, green_* and yellow_* are tasks already defined in the DAG
> blue = [blue_a, blue_b, blue_c]
> green = [green_a, green_b, green_c]
> yellow = [yellow_a, yellow_b]
>
> cross_downstream(from_tasks=[white], to_tasks=blue)
> cross_downstream(from_tasks=blue, to_tasks=green)
> cross_downstream(from_tasks=green, to_tasks=yellow)
>
> This should create the required network of dependencies between tasks.
>
> A visualization is available here:
> https://imgur.com/a/2jqyqQO
>
> This is the easiest solution and, in my opinion, the correct one.
> However, if you don't want dependencies, you can create a new
> scheduling rule by editing the BaseOperator.deps property.
>


Re: Way to designate certain set of airflow tasks to run before others (order invariant)?

Posted by Kamil Breguła <ka...@polidea.com>.
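# Airflow 1.10.x location of the helper
from airflow.utils.helpers import cross_downstream

# white, blue_*, green_* and yellow_* are tasks already defined in the DAG
blue = [blue_a, blue_b, blue_c]
green = [green_a, green_b, green_c]
yellow = [yellow_a, yellow_b]

cross_downstream(from_tasks=[white], to_tasks=blue)
cross_downstream(from_tasks=blue, to_tasks=green)
cross_downstream(from_tasks=green, to_tasks=yellow)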

This should create the required network of dependencies between tasks.

A visualization is available here:
https://imgur.com/a/2jqyqQO

This is the easiest solution and, in my opinion, the correct one.
However, if you don't want dependencies, you can create a new
scheduling rule by editing the BaseOperator.deps property.
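
To make the effect of those three calls concrete, here is a small pure-Python model (not Airflow code) of the edges they would create, on the understanding that cross_downstream makes every task in from_tasks an upstream of every task in to_tasks. The helper cross_edges and the string task names are hypothetical stand-ins for the real operators.

```python
from itertools import product


def cross_edges(from_tasks, to_tasks):
    """Model of cross_downstream: one (upstream, downstream) edge per pair."""
    return list(product(from_tasks, to_tasks))


# Strings stand in for the operators in the snippet above.
white = "white"
blue = ["blue_a", "blue_b", "blue_c"]
green = ["green_a", "green_b", "green_c"]
yellow = ["yellow_a", "yellow_b"]

edges = (
    cross_edges([white], blue)
    + cross_edges(blue, green)
    + cross_edges(green, yellow)
)

# Upstream tasks of green_a under this wiring.
green_a_upstream = sorted(u for u, d in edges if d == "green_a")
```

Every green task ends up downstream of all three blue tasks, so with the default trigger rule a green task would wait for all of the blues to succeed, not just the one in its own branch.
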