Posted to dev@airflow.apache.org by Lance Norskog <la...@gmail.com> on 2016/05/18 01:06:54 UTC

Scheduler problems in 1.7?

Has the "long-running scheduler hang" problem been solved yet?
We just upgraded from 1.6.2 to 1.7.0 and we think it just happened, but
don't know.
Should we use chronic scheduler restarts?

Thanks,

-- 
Lance Norskog
lance.norskog@gmail.com
Redwood City, CA
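
One way to read "chronic scheduler restarts" is a small supervisor that stops and relaunches the scheduler on a fixed interval. The sketch below is an assumption about what that could look like, not a recommendation made in this thread. The function name, the interval, and the restart count are hypothetical; `airflow scheduler` appears only in a comment as the command one would presumably pass in.

```python
import subprocess
import sys
import time


def run_with_periodic_restart(cmd, max_runtime_s, restarts, pause_s=0.0):
    """Run cmd repeatedly, stopping it after max_runtime_s if it is still alive.

    Returns the list of exit codes, one per run.
    """
    exit_codes = []
    for attempt in range(restarts):
        proc = subprocess.Popen(cmd)
        try:
            # Let the process run for at most max_runtime_s seconds.
            proc.wait(timeout=max_runtime_s)
        except subprocess.TimeoutExpired:
            # Still running (or hung): stop it so the next loop starts fresh.
            print("run %d timed out, restarting" % attempt, file=sys.stderr)
            proc.terminate()
            proc.wait()
        exit_codes.append(proc.returncode)
        time.sleep(pause_s)
    return exit_codes


# Hypothetical usage: relaunch the scheduler every hour for a day.
# run_with_periodic_restart(["airflow", "scheduler"], 3600, 24, pause_s=5)
```

A process supervisor or a cron job would do the same thing; the point is only that the restart is unconditional rather than triggered by detecting a hang.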

Re: Scheduler problems in 1.7?

Posted by Dan Davydov <da...@airbnb.com.INVALID>.
We have two staging clusters at the moment:
1. Cluster with:

   - Canary DAG (sanity check that tasks can run end-to-end)
   - Synthetic DAGs (test a couple of operators)

2. Cluster with:

   - Our production DAGs
   - Has webserver/scheduler but no workers (so nothing is actually run).
   At some point soon we will add what Max suggested (replacing real tasks
   with dummy tasks) and then add workers to this cluster as well


On Thu, May 19, 2016 at 10:17 AM, Maxime Beauchemin <
maximebeauchemin@gmail.com> wrote:

> @dan can provide more details, but I think our staging runs off our
> production DAGS_FOLDER, with all tasks swapped for dummy tasks
> (BashOperator with a dummy command) in our policy function.
>
> http://pythonhosted.org/airflow/concepts.html?highlight=policy#cluster-policy

Re: Scheduler problems in 1.7?

Posted by Maxime Beauchemin <ma...@gmail.com>.
@dan can provide more details, but I think our staging runs off our
production DAGS_FOLDER, with all tasks swapped for dummy tasks
(BashOperator with a dummy command) in our policy function.
http://pythonhosted.org/airflow/concepts.html?highlight=policy#cluster-policy
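
For readers unfamiliar with the cluster policy hook linked above: Airflow looks for a function named `policy` in an `airflow_local_settings` module and calls it on each task, so a staging cluster can neutralize tasks there. The sketch below is a guess at the idea, not Airbnb's actual code. `StandInTask`, `DUMMY_COMMAND`, and the choice to overwrite only `bash_command` are hypothetical, and the stand-in class exists only so the snippet runs without Airflow installed.

```python
# Hypothetical no-op command used in place of the real work on staging.
DUMMY_COMMAND = "echo 'staging: real command skipped'"


class StandInTask:
    """Placeholder for an Airflow task object (assumption, not the real class)."""

    def __init__(self, task_id, bash_command=None):
        self.task_id = task_id
        self.bash_command = bash_command


def policy(task):
    """Mutate the task in place so that it does nothing harmful on staging."""
    # Only tasks that carry a shell command are handled in this sketch;
    # other operator types would need their own neutralizing logic.
    if getattr(task, "bash_command", None) is not None:
        task.bash_command = DUMMY_COMMAND


# Usage: the real command is replaced before the task would run.
task = StandInTask("load_table", bash_command="hive -f load.hql")
policy(task)
print(task.task_id, "->", task.bash_command)
```

Because the DAG structure and task IDs are untouched, the scheduler still exercises the production dependency graph while the tasks themselves stay harmless.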

On Thu, May 19, 2016 at 9:34 AM, Chris Riccomini <cr...@apache.org>
wrote:

> Hey Max,
>
> I think we would like to set up some DAG testing as well. Are you guys
> using synthetic DAGs, or running your real DAGs on a separate cluster?
>
> Cheers,
> Chris

Re: Scheduler problems in 1.7?

Posted by Chris Riccomini <cr...@apache.org>.
Hey Max,

I think we would like to set up some DAG testing as well. Are you guys
using synthetic DAGs, or running your real DAGs on a separate cluster?

Cheers,
Chris

On Wed, May 18, 2016 at 1:32 PM, Lance Norskog <la...@gmail.com>
wrote:

> Ok, we'll update to 1.7.1 when y'all think it's fine.
>
> Thanks,
>
> Lance Norskog

Re: Scheduler problems in 1.7?

Posted by Lance Norskog <la...@gmail.com>.
Ok, we'll update to 1.7.1 when y'all think it's fine.

Thanks,

Lance Norskog

On Wed, May 18, 2016 at 12:01 PM, Bolke de Bruin <bd...@gmail.com> wrote:

> Hey Max,
>
> Fair point. I’ll make sure we get on board a bit earlier for the next
> release. We do run integration tests continuously, but only from next
> month will we reach a level of complexity where we really need to start
> pre-testing releases.
>
> - Bolke


-- 
Lance Norskog
lance.norskog@gmail.com
Redwood City, CA

Re: Scheduler problems in 1.7?

Posted by Bolke de Bruin <bd...@gmail.com>.
Hey Max,

Fair point. I’ll make sure we get on board a bit earlier for the next release. We do run integration tests continuously,
but only from next month will we reach a level of complexity where we really need to start pre-testing releases.

- Bolke

> On 18 May 2016, at 17:49, Maxime Beauchemin <ma...@gmail.com> wrote:
> 
> There's an RC out that is currently in production at Airbnb (as of Monday)
> if you want to help us make sure the next version is fully baked for
> release. For now Airbnb is carrying most of the risk around deploying new
> code in production first. Knowing that we don't use all features and
> therefore wouldn't catch all possible regressions, it would be nice to have
> more companies pushing RCs in production along with us.
> 
> Here's the git tag for the RC:
> https://github.com/apache/incubator-airflow/releases/tag/airbnb_1.7.1rc6
> 
> Max


Re: Scheduler problems in 1.7?

Posted by Maxime Beauchemin <ma...@gmail.com>.
There's an RC out that is currently in production at Airbnb (as of Monday)
if you want to help us make sure the next version is fully baked for
release. For now Airbnb is carrying most of the risk around deploying new
code in production first. Knowing that we don't use all features and
therefore wouldn't catch all possible regressions, it would be nice to have
more companies pushing RCs in production along with us.

Here's the git tag for the RC:
https://github.com/apache/incubator-airflow/releases/tag/airbnb_1.7.1rc6

Max

On Tue, May 17, 2016 at 11:01 PM, Bolke de Bruin <bd...@gmail.com> wrote:

> 1.7.1, which will most likely be out at the end of the week, hopefully
> fixes this. Don't stay on 1.7.0 for too long; 1.7.1 contains many
> stability fixes.

Re: Scheduler problems in 1.7?

Posted by Bolke de Bruin <bd...@gmail.com>.
1.7.1, which will most likely be out at the end of the week, hopefully fixes this. Don't stay on 1.7.0 for too long; 1.7.1 contains many stability fixes.

Sent from my iPad

> On 18 May 2016, at 03:06, Lance Norskog <la...@gmail.com> wrote:
> 
> Has the "long-running scheduler hang" problem been solved yet?
> We just upgraded from 1.6.2 to 1.7.0 and we think it just happened, but
> don't know.
> Should we use chronic scheduler restarts?
> 
> Thanks,
> 
> -- 
> Lance Norskog
> lance.norskog@gmail.com
> Redwood City, CA