Posted to dev@airflow.apache.org by Julian De Ruiter <ju...@godatadriven.com> on 2019/03/27 19:03:58 UTC

[AIP-19] Making the webserver stateless

Dear all,

Last week we added AIP-19 (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP+19+-+Making+the+webserver+stateless), which aims to address various stability issues in the Airflow webserver stemming from differences in DagBag state between the different processes of the webserver. These stability issues are illustrated in this video: https://youtu.be/sNrBruPS3r4.

Our AIP aims to solve these issues by moving DAG-related information into the database, rather than querying DAG metadata from the DagBag instance of the given webserver process. By obtaining this information from the database, we ensure that there is a single source of truth for DAG-related metadata, thus avoiding differences in state between webserver processes. To keep this AIP tractable, we propose to leverage the existing ORM models for storing DAG metadata in the database and querying it from there.
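
To make the idea concrete, here is a minimal sketch, not the AIP's actual design. It uses Python's built-in sqlite3 module in place of Airflow's ORM models, and the table names (dag, dag_edge) and helpers (store_dag, load_edges, demo) are hypothetical names chosen only for illustration. Two separate connections stand in for two webserver processes; both read the DAG structure from the shared database instead of from their own in-memory DagBag, so their views cannot drift apart.

```python
import os
import sqlite3
import tempfile

# Hypothetical, simplified schema for illustration; not Airflow's real tables.
SCHEMA = """
CREATE TABLE IF NOT EXISTS dag (
    dag_id TEXT PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS dag_edge (
    dag_id TEXT NOT NULL REFERENCES dag (dag_id),
    upstream_task_id TEXT NOT NULL,
    downstream_task_id TEXT NOT NULL,
    PRIMARY KEY (dag_id, upstream_task_id, downstream_task_id)
);
"""


def store_dag(conn, dag_id, edges):
    """Persist the most recently parsed edges of a DAG, replacing older ones."""
    with conn:  # one transaction: readers see either the old or the new edges
        conn.execute("INSERT OR IGNORE INTO dag (dag_id) VALUES (?)", (dag_id,))
        conn.execute("DELETE FROM dag_edge WHERE dag_id = ?", (dag_id,))
        conn.executemany(
            "INSERT INTO dag_edge VALUES (?, ?, ?)",
            [(dag_id, up, down) for up, down in edges],
        )


def load_edges(conn, dag_id):
    """Read a DAG's edges from the database, as a webserver process would."""
    rows = conn.execute(
        "SELECT upstream_task_id, downstream_task_id FROM dag_edge "
        "WHERE dag_id = ? ORDER BY upstream_task_id, downstream_task_id",
        (dag_id,),
    )
    return rows.fetchall()


def demo():
    """Store one DAG, then read it back through two independent connections."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "metadata.db")
        parser = sqlite3.connect(path)  # stands in for whatever parses DAG files
        parser.executescript(SCHEMA)
        store_dag(
            parser,
            "example_dag",
            [("extract", "transform"), ("transform", "load")],
        )
        process_a = sqlite3.connect(path)  # stands in for webserver process A
        process_b = sqlite3.connect(path)  # stands in for webserver process B
        try:
            return (
                load_edges(process_a, "example_dag"),
                load_edges(process_b, "example_dag"),
            )
        finally:
            for conn in (parser, process_a, process_b):
                conn.close()


if __name__ == "__main__":
    view_a, view_b = demo()
    print("views agree:", view_a == view_b)
```

In this sketch the edges are replaced inside a single transaction, so a reader never sees a half-updated DAG; whether the real implementation should behave that way is up for discussion.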

More information on this AIP is available on the cwiki. Feedback on the AIP is more than welcome! However, to keep the discussion centralized, I propose that we discuss the AIP in the comment section of the cwiki page.

Best regards / met vriendelijke groet,

Julian de Ruiter
Machine learning engineer

GoDataDriven
Proudly part of the Xebia group

M: +31 6 30 61 26 24
W: http://www.godatadriven.com

Re: [AIP-19] Making the webserver stateless

Posted by Kevin Yang <yr...@gmail.com>.
Thank you Julian, nice work. Good idea trying to put context info into
TaskInstance. Overall I strongly prefer option 2a, since it avoids
upsetting owners of big DAGs and is more tractable. We can keep in mind
that we may do 2b later when implementing 2a.

Cheers,
Kevin Y

On Sat, Apr 13, 2019 at 8:59 AM Andrew Stahlman <as...@lyft.com.invalid>
wrote:

> Hi Julian,
>
> Thanks for adding that exhaustive list of changes that are needed for each
> view. Assuming we went with option 2b for obtaining information about the
> edges:
>
> > Adding the current state of the DAG in the database, so that edges
> > reflect the most recent version of DAG as it was parsed.
>
> Do you have a proposal for how the database schema will change? i.e., what
> tables are being added, their schema, any changes to columns in existing
> tables, etc. This has come up before in AIP-12
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB>
> and its accompanying PR
> <https://github.com/apache/airflow/pull/4396/files#diff-ad4989e508949997ebe0f59574dc287f>
> - will we use the same schema as was proposed there?
>
> Thanks,
> Andrew Stahlman

Re: [AIP-19] Making the webserver stateless

Posted by Andrew Stahlman <as...@lyft.com.INVALID>.
Hi Julian,

Thanks for adding that exhaustive list of changes that are needed for each
view. Assuming we went with option 2b for obtaining information about the
edges:

> Adding the current state of the DAG in the database, so that edges
> reflect the most recent version of DAG as it was parsed.

Do you have a proposal for how the database schema will change? i.e., what
tables are being added, their schema, any changes to columns in existing
tables, etc. This has come up before in AIP-12
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-12+Persist+DAG+into+DB>
and its accompanying PR
<https://github.com/apache/airflow/pull/4396/files#diff-ad4989e508949997ebe0f59574dc287f>
- will we use the same schema as was proposed there?

Thanks,
Andrew Stahlman

On Fri, Apr 12, 2019 at 7:43 AM Julian De Ruiter <
julianderuiter@godatadriven.com> wrote:
>
> Dear all,
>
> As an update on AIP-19, I have added the list of expected changes to the
> discussion of the AIP:
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-19+Making+the+webserver+stateless
>
> Does anybody have more feedback on the proposal?
>
> Best,
> Julian de Ruiter

Re: [AIP-19] Making the webserver stateless

Posted by Julian De Ruiter <ju...@godatadriven.com>.
Dear all,

As an update on AIP-19, I have added the list of expected changes to the discussion of the AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-19+Making+the+webserver+stateless.

Does anybody have more feedback on the proposal?

Best,
Julian de Ruiter

