Posted to dev@airflow.apache.org by Jason Chen <ch...@gmail.com> on 2016/05/16 20:09:15 UTC

Time zone used in "Tree view" and task order

I have two questions

(1) In the Airflow UI "Tree view", the tasks are listed along with times
highlighted at the top (say, 08:30, 09:00, etc.). What do these times mean?
They do not appear to be the UTC times at which the tasks were running. I
know that, overall, Airflow uses UTC.
(2) I have a DAG with two tasks: task1 --> task2
Task1 runs hourly and can sometimes take longer than one hour to complete.
In such a setup, task1 will be triggered hourly; what happens if the
previous task1 is still running? Will the "new" task1 be queued?

Thanks.
Jason

Re: Time zone used in "Tree view" and task order

Posted by Maxime Beauchemin <ma...@gmail.com>.
About time zones, it'd be nice to add an entry to the FAQ in the docs with
recommendations. We do UTC all around here, which makes it easy.

Max
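
A quick illustration of the UTC convention (a minimal sketch; it assumes the
times shown in the Tree view are UTC schedule times, and the timestamp string
below is just an example):

from datetime import datetime, timezone

# Interpret a schedule label such as "2016-05-31 08:30" as UTC and convert
# it to the machine's local time zone for display.
def utc_to_local(label, fmt="%Y-%m-%d %H:%M"):
    utc_dt = datetime.strptime(label, fmt).replace(tzinfo=timezone.utc)
    return utc_dt.astimezone()  # no argument: convert to the local time zone

print(utc_to_local("2016-05-31 08:30"))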

Re: Time zone used in "Tree view" and task order

Posted by Jason Chen <ch...@gmail.com>.
Hi Chris,

I see.
I switched to LocalExecutor and the scheduler is working as expected.
Thanks a lot for your help!

Jason

Re: Time zone used in "Tree view" and task order

Posted by Chris Riccomini <cr...@apache.org>.
Hey Jason,

The SequentialExecutor only ever runs one task at a time. It's meant for
debugging purposes. Try switching to the LocalExecutor.

Cheers,
Chris
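
For reference, a minimal sketch of that switch (assuming a stock airflow.cfg):
change the executor in the [core] section and restart the scheduler and
webserver.

[core]
executor = LocalExecutor

Note that the LocalExecutor generally needs a metadata database other than
SQLite (for example MySQL or Postgres) configured via sql_alchemy_conn, since
SQLite only supports the SequentialExecutor.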

Re: Time zone used in "Tree view" and task order

Posted by Jason Chen <ch...@gmail.com>.
Chris,
 I am running SequentialExecutor.

Thanks.
Jason

Re: Time zone used in "Tree view" and task order

Posted by Chris Riccomini <cr...@apache.org>.
Hey Jason,

Are you running the SequentialExecutor? This is the default out-of-the-box
executor.

Cheers,
Chris

Re: Time zone used in "Tree view" and task order

Posted by Jason Chen <ch...@gmail.com>.
Hi Chris,

I made the changes and tried them out.
It does not seem to be working as expected.
When a DAG is running (a particular task inside that DAG is taking time),
a task from another DAG seems "blocked".

My setting:
(1) airflow.cfg
  max_active_runs_per_dag = 16
  parallelism = 32
  dag_concurrency = 16

(2) Part of the dag1 Python file is shown below. Please note that inside
this DAG, the first task (task1) is a long-running task.

dag1 = DAG('dag1', schedule_interval=timedelta(minutes=15),
max_active_runs=1, default_args=args)

Then the tasks run in this order:
task1 (long running) --> task2 --> task3
...
(3) Part of another DAG's (dag2) Python file is shown below.
dag2 = DAG('dag2', schedule_interval=timedelta(minutes=3),
max_active_runs=1, default_args=args)
...
Then the tasks run in this order:
taskA (short-running task) --> taskB

(4) Inside the upstart script file, this is the main part of how I start the
airflow scheduler:

env SCHEDULER_RUNS=0
export SCHEDULER_RUNS

script
    exec >> ${AIRFLOW_HOME}/scheduler-log/airflow-scheduler.log 2>&1
    exec /usr/local/bin/airflow scheduler -n ${SCHEDULER_RUNS}
end script

=========================

What I observed is that:
(a) task1 (of dag1) runs for about 20 minutes, and during its run time no
other dag1 run is triggered. This is as expected.

(b) taskA (of dag2) should be triggered to run every 3 minutes. However, it
is NOT triggered while task1 of dag1 is running.
taskA seems to be queued/blocked and does not run; it is executed only after
task1 (of dag1) is done. So it looks like it is dispatched into a "gap"
between task1 and task2 (of dag1). This does not look normal, since taskA
(of dag2) should be expected to run no matter what happens in another DAG
(dag1).


Any suggestions?
Thanks.
Jason


Re: Time zone used in "Tree view" and task order

Posted by Jason Chen <ja...@surfline.com>.
Chris,

Got it.
Thanks for your information.
I will make the change and try it out!

-Jason



Re: Time zone used in "Tree view" and task order

Posted by Chris Riccomini <cr...@apache.org>.
Hey Jason,

The problem is max_active_runs_per_dag=1. Set it back to 16. You just need
max_active_runs=1 for the individual DAGs. This will allow multiple
(different) DAGs to run in parallel, but only one run of each DAG at the
same time.

Cheers,
Chris
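
A minimal sketch of that combination (DAG names and schedules taken from the
thread; the operators and args below are placeholders): leave
max_active_runs_per_dag at its default of 16 in airflow.cfg and cap each DAG
individually.

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Placeholder default_args; the thread's own args dict would be used instead.
args = {"owner": "airflow", "start_date": datetime(2016, 5, 1)}

# airflow.cfg keeps max_active_runs_per_dag = 16 (the default); each DAG
# limits itself to a single active run via max_active_runs=1.
dag1 = DAG("dag1", schedule_interval=timedelta(minutes=15),
           max_active_runs=1, default_args=args)
dag2 = DAG("dag2", schedule_interval=timedelta(minutes=3),
           max_active_runs=1, default_args=args)

# Placeholder tasks so the sketch is self-contained.
task1 = BashOperator(task_id="task1", bash_command="sleep 1200", dag=dag1)
taskA = BashOperator(task_id="taskA", bash_command="sleep 10", dag=dag2)

With this layout, dag1 and dag2 can run concurrently while each of them has
at most one run in flight.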

Re: Time zone used in "Tree view" and task order

Posted by Jason Chen <ch...@gmail.com>.
Hi Chris,
 Thanks for your reply. After setting it up, I observed how it works for a
couple of days.

 I tried to set max_active_runs=1 in the DAG,
dag = DAG(...max_active_runs=1...), and it worked fine to avoid two runs
at the same time.
However, I noticed that other DAGs (not the DAG that is running) are also
"paused".
My understanding is that "max_active_runs" is basically the per-DAG
equivalent of "max_active_runs_per_dag".
So why can another DAG (with a different DAG name) not run at the same time
as the first DAG?
I want the two DAGs to be able to run at the same time, with only one
active run inside each DAG.
Thanks.

Jason

My other settings in airflow.cfg

max_active_runs_per_dag=1
parallelism = 32
dag_concurrency = 16



Re: Time zone used in "Tree view" and task order

Posted by Chris Riccomini <cr...@apache.org>.
Hey Jason,

For (2), by default, task1 will start running again. You'll have two runs
going at the same time. If you want to prevent this, you can set
max_active_runs to 1 in your DAG.

Cheers,
Chris
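
A minimal sketch of that suggestion applied to the DAG from question (2) (the
dag id and bash commands are placeholders):

from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {"owner": "airflow", "start_date": datetime(2016, 5, 1)}

# max_active_runs=1 keeps a new hourly run from starting while the previous
# run (e.g. a task1 that overruns its hour) is still active.
dag = DAG("hourly_dag", schedule_interval=timedelta(hours=1),
          max_active_runs=1, default_args=default_args)

task1 = BashOperator(task_id="task1", bash_command="sleep 4000", dag=dag)
task2 = BashOperator(task_id="task2", bash_command="echo done", dag=dag)

task1.set_downstream(task2)  # task2 runs only after task1 finishes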
