Posted to dev@airflow.apache.org by Kaxil Naik <ka...@gmail.com> on 2020/03/03 11:49:52 UTC

Re: [PROPOSAL][AIP-15 Support Multiple-Schedulers for HA & Better Scheduling Performance]

Good work on the Proposal Ash & Vikram.



On Fri, Feb 28, 2020 at 10:39 PM Vikram Koka <vi...@astronomer.io.invalid>
wrote:

> Team,
>
>
>
> We just updated 'AIP-15 Support Multiple-Schedulers for HA & Better
> Scheduling Performance' on Confluence and would very much appreciate
> feedback and suggestions from the community.
>
>
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651
>
>
>
> The original AIP was filed by Xiaodong Deng on March 2nd, 2019 and has
> stalled after a while, so with his blessing, we are taking the baton on
> this AIP. We at Astronomer have heard several enterprises ask for both High
> Availability as well as greater scalability, specifically around starting
> hundreds and thousands of tasks in a very short time window.
>
>
>
> We would like to attempt this based on our experience running Airflow as a
> Service and deploying Airflow at enterprises around the globe. We believe
> that this will benefit Airflow and fuel greater adoption of Airflow for
> production pipelines within enterprises.
>
>
>
> Building on the original AIP, we have proposed an active/active model,
> where we can scale schedulers, but are staying away from the quorum
> approach. Xiaodong Deng had put in some really good thinking about the
> problem including approaches towards reducing contention between multiple
> schedulers and we have included some of those concepts here. Additional
> commenters had discussed the possibilities of leader selection and those
> challenges, and we have incorporated their thinking as well.
>
>
>
>  Any feedback, suggestions, and comments would be greatly appreciated.
>
>
>
> Best Regards,
>
>
> Ash Berlin-Taylor and Vikram Koka
>

Re: [PROPOSAL][AIP-15 Support Multiple-Schedulers for HA & Better Scheduling Performance]

Posted by Dan Davydov <dd...@twitter.com.INVALID>.
Haven't checked the math in the AIP but I believe with the given formula,
with 5 schedulers and 100 DAGs there is already a 9% chance of conflict and
the larger users of Airflow have many more DAGs than that.
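
For reference, one model that lands close to the 9% figure is the
birthday-problem estimate: assume each scheduler independently picks one DAG
uniformly at random in a round, and ask how likely it is that at least two
schedulers pick the same DAG. Whether this matches the formula in the AIP is
an assumption, since the AIP's formula is not reproduced in this thread; the
function name below is made up for illustration.

```python
from math import prod


def conflict_probability(num_schedulers: int, num_dags: int) -> float:
    """Chance that at least two schedulers pick the same DAG in one round.

    Assumption: every scheduler independently picks one DAG uniformly at
    random. This is a birthday-problem model, not necessarily the AIP's.
    """
    if num_schedulers > num_dags:
        return 1.0  # pigeonhole: a collision is certain
    # Probability that all schedulers pick distinct DAGs.
    no_conflict = prod((num_dags - i) / num_dags for i in range(num_schedulers))
    return 1.0 - no_conflict


print(f"{conflict_probability(5, 100):.3f}")
```

Under these assumptions, 5 schedulers and 100 DAGs come out at a little under
10%, which is in the same range as the figure quoted above.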

I'm a bit concerned about putting more load on the DB, which is
already a scalability bottleneck. I agree with the sentiment in the AIP
about using a more long-term solution like leader election (or consistent
hashing with hash(dag_id) -> scheduler instance, etc), and the even more
radical change would be pushing the scheduling logic to the workers
themselves so scheduling becomes push-based instead of pull-based. The
proposed change is probably better than doing nothing though in the short
term, and I think one that shouldn't be too hard to reverse/change if done
properly so I'm neutral overall.
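
The hash(dag_id) -> scheduler instance idea could look roughly like the
following hash ring. This is only a sketch: the class name, the scheduler ids,
and the number of virtual nodes are all hypothetical, and nothing here is
taken from the AIP or from Airflow's code.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class SchedulerRing:
    """Consistent-hash ring mapping dag_id -> scheduler id (illustrative only)."""

    def __init__(self, scheduler_ids, vnodes=64):
        if not scheduler_ids:
            raise ValueError("at least one scheduler is required")
        # Each scheduler gets several points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{sid}#{v}"), sid) for sid in scheduler_ids for v in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, dag_id: str) -> str:
        # The first ring point clockwise from the DAG's hash owns the DAG.
        idx = bisect.bisect_right(self._points, _hash(dag_id)) % len(self._ring)
        return self._ring[idx][1]


ring = SchedulerRing(["scheduler-a", "scheduler-b", "scheduler-c"])
print(ring.owner("example_dag"))
```

The point of a ring over a plain hash(dag_id) % N is that when a scheduler
joins or leaves, only the DAGs that mapped to that scheduler move; the rest
keep their owner.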

On Mon, Mar 16, 2020 at 6:12 PM Deng Xiaodong <xd...@gmail.com> wrote:

> Would be happy to give +1 for this AIP later!
>
>
> XD
>
> On Mon, Mar 16, 2020 at 11:08 PM Ash Berlin-Taylor <as...@apache.org> wrote:
>
> > Does anyone have any other opinions about this? If not I'd like to call a
> > vote (and start working on the code!)
> >
> > -ash

Re: [PROPOSAL][AIP-15 Support Multiple-Schedulers for HA & Better Scheduling Performance]

Posted by Deng Xiaodong <xd...@gmail.com>.
Would be happy to give +1 for this AIP later!


XD

On Mon, Mar 16, 2020 at 11:08 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> Does anyone have any other opinions about this? If not I'd like to call a
> vote (and start working on the code!)
>
> -ash

Re: [PROPOSAL][AIP-15 Support Multiple-Schedulers for HA & Better Scheduling Performance]

Posted by Ash Berlin-Taylor <as...@apache.org>.
Does anyone have any other opinions about this? If not I'd like to call a vote (and start working on the code!)

-ash


Re: [PROPOSAL][AIP-15 Support Multiple-Schedulers for HA & Better Scheduling Performance]

Posted by Kaxil Naik <ka...@gmail.com>.
The goal would be to support both MySQL and PostgreSQL for production, as we
know many Airflow users use MySQL as their metadata DB.

On Tue, Mar 3, 2020 at 12:25 PM Ash Berlin-Taylor <as...@apache.org> wrote:

> It _shouldn't_, and we will test extensively with MySQL.
>
> Worst case is we'll have to fall back to managing the lock ourselves with
> a column rather than relying on db/row level locks. This might be a case
> where we have different/specialised behaviour for different dbs, or even db
> versions, if say MySQL 8 behaves okay but 5.7/5.6 doesn't.
>
> Ash
>
> On 3 March 2020 07:01:15 GMT-05:00, "Kamil Breguła" <kamil.bregula@polidea.com> wrote:
> >Hello,
> >
> >Will reliance on the database cause problems with MySQL? A lot of my
> >users use this database. I am afraid that the lock mechanism in MySQL
> >is much less stable and predictable than PostgreSQL's and this can
> >cause various stability problems. I know that Astronomer uses
> >PostgreSQL, but Airflow supports both RDBMSs in production environments,
> >and both must work properly under this AIP.
> >
> >Best regards,
> >Kamil
> >
> >On Tue, Mar 3, 2020 at 12:50 PM Kaxil Naik <ka...@gmail.com> wrote:
> >>
> >> Good work on the Proposal Ash & Vikram.
> >>
> >>
> >>
> >> On Fri, Feb 28, 2020 at 10:39 PM Vikram Koka <vi...@astronomer.io.invalid>
> >> wrote:
> >>
> >> > Team,
> >> >
> >> > We just updated 'AIP-15 Support Multiple-Schedulers for HA & Better
> >> > Scheduling Performance' on Confluence and would very much appreciate
> >> > feedback and suggestions from the community.
> >> >
> >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651
> >> >
> >> > The original AIP was filed by Xiaodong Deng on March 2nd, 2019 and has
> >> > stalled after a while, so with his blessing, we are taking the baton on
> >> > this AIP. We at Astronomer have heard several enterprises ask for both High
> >> > Availability as well as greater scalability, specifically around starting
> >> > hundreds and thousands of tasks in a very short time window.
> >> >
> >> > We would like to attempt this based on our experience running Airflow as a
> >> > Service and deploying Airflow at enterprises around the globe. We believe
> >> > that this will benefit Airflow and fuel greater adoption of Airflow for
> >> > production pipelines within enterprises.
> >> >
> >> > Building on the original AIP, we have proposed an active/active model,
> >> > where we can scale schedulers, but are staying away from the quorum
> >> > approach. Xiaodong Deng had put in some really good thinking about the
> >> > problem including approaches towards reducing contention between multiple
> >> > schedulers and we have included some of those concepts here. Additional
> >> > commenters had discussed the possibilities of leader selection and those
> >> > challenges, and we have incorporated their thinking as well.
> >> >
> >> > Any feedback, suggestions, and comments would be greatly appreciated.
> >> >
> >> > Best Regards,
> >> >
> >> > Ash Berlin-Taylor and Vikram Koka
> >> >
>

Re: [PROPOSAL][AIP-15 Support Multiple-Schedulers for HA & Better Scheduling Performance]

Posted by Ash Berlin-Taylor <as...@apache.org>.
It _shouldn't_, and we will test extensively with MySQL.

Worst case is we'll have to fall back to managing the lock ourselves with a column rather than relying on db/row level locks. This might be a case where we have different/specialised behaviour for different dbs, or even db versions, if say MySQL 8 behaves okay but 5.7/5.6 doesn't.
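
To make the "managing the lock ourselves with a column" fallback concrete,
here is a minimal sketch using an atomic compare-and-set UPDATE. The
`dag_lock` table, the `locked_by` column, and both helper functions are
hypothetical names chosen for illustration, and SQLite stands in for
MySQL/PostgreSQL only so the example runs on its own. It does not handle
stale locks left behind by a crashed scheduler; that would likely need
something like a timestamp or heartbeat column.

```python
import sqlite3


def try_claim(conn, dag_id, scheduler_id):
    """Try to claim dag_id; return True if this scheduler now holds the lock."""
    # The WHERE clause makes the claim a compare-and-set: the row is only
    # updated if nobody currently holds it.
    cur = conn.execute(
        "UPDATE dag_lock SET locked_by = ? WHERE dag_id = ? AND locked_by IS NULL",
        (scheduler_id, dag_id),
    )
    conn.commit()
    return cur.rowcount == 1


def release(conn, dag_id, scheduler_id):
    """Release the lock, but only if this scheduler is the current holder."""
    conn.execute(
        "UPDATE dag_lock SET locked_by = NULL WHERE dag_id = ? AND locked_by = ?",
        (dag_id, scheduler_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dag_lock (dag_id TEXT PRIMARY KEY, locked_by TEXT)")
conn.execute("INSERT INTO dag_lock (dag_id) VALUES ('example_dag')")
conn.commit()

print(try_claim(conn, "example_dag", "scheduler-a"))  # True: lock was free
print(try_claim(conn, "example_dag", "scheduler-b"))  # False: already held
release(conn, "example_dag", "scheduler-a")
print(try_claim(conn, "example_dag", "scheduler-b"))  # True: free again
```

Because the claim is a single UPDATE statement, it relies only on the
database applying that statement atomically rather than on explicit
row-level lock clauses, which is what would make it a candidate fallback
across database versions.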

Ash

On 3 March 2020 07:01:15 GMT-05:00, "Kamil Breguła" <ka...@polidea.com> wrote:
>Hello,
>
>Will reliance on the database cause problems with MySQL? A lot of my
>users use this database. I am afraid that the lock mechanism in MySQL
>is much less stable and predictable than PostgreSQL's and this can
>cause various stability problems. I know that Astronomer uses
>PostgreSQL, but Airflow supports both RDBMSs in production environments,
>and both must work properly under this AIP.
>
>Best regards,
>Kamil
>
>On Tue, Mar 3, 2020 at 12:50 PM Kaxil Naik <ka...@gmail.com> wrote:
>>
>> Good work on the Proposal Ash & Vikram.
>>
>>
>>
>> On Fri, Feb 28, 2020 at 10:39 PM Vikram Koka <vi...@astronomer.io.invalid>
>> wrote:
>>
>> > Team,
>> >
>> > We just updated 'AIP-15 Support Multiple-Schedulers for HA & Better
>> > Scheduling Performance' on Confluence and would very much appreciate
>> > feedback and suggestions from the community.
>> >
>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651
>> >
>> > The original AIP was filed by Xiaodong Deng on March 2nd, 2019 and has
>> > stalled after a while, so with his blessing, we are taking the baton on
>> > this AIP. We at Astronomer have heard several enterprises ask for both High
>> > Availability as well as greater scalability, specifically around starting
>> > hundreds and thousands of tasks in a very short time window.
>> >
>> > We would like to attempt this based on our experience running Airflow as a
>> > Service and deploying Airflow at enterprises around the globe. We believe
>> > that this will benefit Airflow and fuel greater adoption of Airflow for
>> > production pipelines within enterprises.
>> >
>> > Building on the original AIP, we have proposed an active/active model,
>> > where we can scale schedulers, but are staying away from the quorum
>> > approach. Xiaodong Deng had put in some really good thinking about the
>> > problem including approaches towards reducing contention between multiple
>> > schedulers and we have included some of those concepts here. Additional
>> > commenters had discussed the possibilities of leader selection and those
>> > challenges, and we have incorporated their thinking as well.
>> >
>> > Any feedback, suggestions, and comments would be greatly appreciated.
>> >
>> > Best Regards,
>> >
>> > Ash Berlin-Taylor and Vikram Koka
>> >

Re: [PROPOSAL][AIP-15 Support Multiple-Schedulers for HA & Better Scheduling Performance]

Posted by Kamil Breguła <ka...@polidea.com>.
Hello,

Will reliance on the database cause problems with MySQL? A lot of my
users use this database. I am afraid that the lock mechanism in MySQL
is much less stable and predictable than PostgreSQL's and this can
cause various stability problems. I know that Astronomer uses
PostgreSQL, but Airflow supports both RDBMSs in production environments,
and both must work properly under this AIP.

Best regards,
Kamil

On Tue, Mar 3, 2020 at 12:50 PM Kaxil Naik <ka...@gmail.com> wrote:
>
> Good work on the Proposal Ash & Vikram.
>
>
>
> On Fri, Feb 28, 2020 at 10:39 PM Vikram Koka <vi...@astronomer.io.invalid>
> wrote:
>
> > Team,
> >
> >
> >
> > We just updated 'AIP-15 Support Multiple-Schedulers for HA & Better
> > Scheduling Performance' on Confluence and would very much appreciate
> > feedback and suggestions from the community.
> >
> >
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651
> >
> >
> >
> > The original AIP was filed by Xiaodong Deng on March 2nd, 2019 and has
> > stalled after a while, so with his blessing, we are taking the baton on
> > this AIP. We at Astronomer have heard several enterprises ask for both High
> > Availability as well as greater scalability, specifically around starting
> > hundreds and thousands of tasks in a very short time window.
> >
> >
> >
> > We would like to attempt this based on our experience running Airflow as a
> > Service and deploying Airflow at enterprises around the globe. We believe
> > that this will benefit Airflow and fuel greater adoption of Airflow for
> > production pipelines within enterprises.
> >
> >
> >
> > Building on the original AIP, we have proposed an active/active model,
> > where we can scale schedulers, but are staying away from the quorum
> > approach. Xiaodong Deng had put in some really good thinking about the
> > problem including approaches towards reducing contention between multiple
> > schedulers and we have included some of those concepts here. Additional
> > commenters had discussed the possibilities of leader selection and those
> > challenges, and we have incorporated their thinking as well.
> >
> >
> >
> >  Any feedback, suggestions, and comments would be greatly appreciated.
> >
> >
> >
> > Best Regards,
> >
> >
> > Ash Berlin-Taylor and Vikram Koka
> >