You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Cyril Scetbon <cy...@free.fr> on 2016/07/06 21:50:15 UTC

Handling running tasks

Hi,

I have dags with tasks that use same configuration and same schedule time frequencies.

When I have x tasks in a dag they run in //. My dag is scheduled every 15 minutes, but sometimes I have at least a task that runs for more than 15 minutes and 2 identical tasks should not run at the same time (in my case) but it's what happens.

So when the dag is kicked off (every 15 min) I need Airflow to run tasks only for those that are not running, which means :

- if at time t1 tasks tk1 is running and tk2, ..., tkn are not running, I need Airflow to run only tk2,.., tkn but not tk1 cause it's already running
- if at time t2 tasks tk1, ..., tkn are not running, I need Airflow to run tk1,.., tkn

I already tried using depends_on_past=True, however I need failed tasks to be kicked off, cause I can get some temporary issues.

I also use an upstream task connected to all tasks in my dag to be able to run manually all tasks if I need to. (I don't use a frequency of 15 minutes in all dags)

Any idea ? 

Re: Handling running tasks

Posted by siddharth anand <sa...@apache.org>.
+1 on the uniqueness of the solution. Wondering how it worked!
-s


On Sun, Jul 10, 2016 at 9:02 AM, Cyril Scetbon <cy...@free.fr>
wrote:

> Interesting. Thanks for this solution Lance, gonna try it
> > On Jul 6, 2016, at 19:11, Lance Norskog <la...@gmail.com> wrote:
> >
> > You could use the XCOM feature to post a semaphore at the start of the
> task
> > and then remove it at the end.
> > Another task would see the semaphore and immediately quit.
> > If you get a race condition, the third 15-minute task will take care of
> the
> > problem, 15 minutes late.
> >
> > Lance
> >
> > On Wed, Jul 6, 2016 at 2:50 PM, Cyril Scetbon <cy...@free.fr>
> wrote:
> >
> >> Hi,
> >>
> >> I have dags with tasks that use same configuration and same schedule
> time
> >> frequencies.
> >>
> >> When I have x tasks in a dag they run in //. My dag is scheduled every
> 15
> >> minutes, but sometimes I have at least a task that runs for more than 15
> >> minutes and 2 identical tasks should not run at the same time (in my
> case)
> >> but it's what happens.
> >>
> >> So when the dag is kicked off (every 15 min) I need Airflow to run tasks
> >> only for those that are not running, which means :
> >>
> >> - if at time t1 tasks tk1 is running and tk2, ..., tkn are not running,
> I
> >> need Airflow to run only tk2,.., tkn but not tk1 cause it's already
> running
> >> - if at time t2 tasks tk1, ..., tkn are not running, I need Airflow to
> run
> >> tk1,.., tkn
> >>
> >> I already tried using depends_on_past=True, however I need failed tasks
> to
> >> be kicked off, cause I can get some temporary issues.
> >>
> >> I also use an upstream task connected to all tasks in my dag to be able
> to
> >> run manually all tasks if I need to. (I don't use a frequency of 15
> minutes
> >> in all dags)
> >>
> >> Any idea ?
> >
> >
> >
> >
> > --
> > Lance Norskog
> > lance.norskog@gmail.com
> > Redwood City, CA
>
>

Re: Handling running tasks

Posted by Cyril Scetbon <cy...@free.fr>.
Interesting. Thanks for this solution Lance, gonna try it
> On Jul 6, 2016, at 19:11, Lance Norskog <la...@gmail.com> wrote:
> 
> You could use the XCOM feature to post a semaphore at the start of the task
> and then remove it at the end.
> Another task would see the semaphore and immediately quit.
> If you get a race condition, the third 15-minute task will take care of the
> problem, 15 minutes late.
> 
> Lance
> 
> On Wed, Jul 6, 2016 at 2:50 PM, Cyril Scetbon <cy...@free.fr> wrote:
> 
>> Hi,
>> 
>> I have dags with tasks that use same configuration and same schedule time
>> frequencies.
>> 
>> When I have x tasks in a dag they run in //. My dag is scheduled every 15
>> minutes, but sometimes I have at least a task that runs for more than 15
>> minutes and 2 identical tasks should not run at the same time (in my case)
>> but it's what happens.
>> 
>> So when the dag is kicked off (every 15 min) I need Airflow to run tasks
>> only for those that are not running, which means :
>> 
>> - if at time t1 tasks tk1 is running and tk2, ..., tkn are not running, I
>> need Airflow to run only tk2,.., tkn but not tk1 cause it's already running
>> - if at time t2 tasks tk1, ..., tkn are not running, I need Airflow to run
>> tk1,.., tkn
>> 
>> I already tried using depends_on_past=True, however I need failed tasks to
>> be kicked off, cause I can get some temporary issues.
>> 
>> I also use an upstream task connected to all tasks in my dag to be able to
>> run manually all tasks if I need to. (I don't use a frequency of 15 minutes
>> in all dags)
>> 
>> Any idea ?
> 
> 
> 
> 
> -- 
> Lance Norskog
> lance.norskog@gmail.com
> Redwood City, CA


Re: Handling running tasks

Posted by Lance Norskog <la...@gmail.com>.
You could use the XCOM feature to post a semaphore at the start of the task
and then remove it at the end.
Another task would see the semaphore and immediately quit.
If you get a race condition, the third 15-minute task will take care of the
problem, 15 minutes late.

Lance

On Wed, Jul 6, 2016 at 2:50 PM, Cyril Scetbon <cy...@free.fr> wrote:

> Hi,
>
> I have dags with tasks that use same configuration and same schedule time
> frequencies.
>
> When I have x tasks in a dag they run in //. My dag is scheduled every 15
> minutes, but sometimes I have at least a task that runs for more than 15
> minutes and 2 identical tasks should not run at the same time (in my case)
> but it's what happens.
>
> So when the dag is kicked off (every 15 min) I need Airflow to run tasks
> only for those that are not running, which means :
>
> - if at time t1 tasks tk1 is running and tk2, ..., tkn are not running, I
> need Airflow to run only tk2,.., tkn but not tk1 cause it's already running
> - if at time t2 tasks tk1, ..., tkn are not running, I need Airflow to run
> tk1,.., tkn
>
> I already tried using depends_on_past=True, however I need failed tasks to
> be kicked off, cause I can get some temporary issues.
>
> I also use an upstream task connected to all tasks in my dag to be able to
> run manually all tasks if I need to. (I don't use a frequency of 15 minutes
> in all dags)
>
> Any idea ?




-- 
Lance Norskog
lance.norskog@gmail.com
Redwood City, CA