Posted to dev@airflow.apache.org by Kaxil Naik <ka...@gmail.com> on 2020/03/14 01:23:50 UTC

Stateless Webserver with DAG Serialization

Hi all,

Happy to tell you all that we have completed the first phase of DAG
Serialization, i.e. the Webserver is now stateless and can run without
access to DAG files.

The 2 limitations we had in 1.10.7-1.10.9 (
https://airflow.apache.org/docs/1.10.7/dag-serialization.html#limitations)
have been resolved.

Special thanks to @ash for his continuous guidance and contributions.

Also a special mention to Anita Fronczak and Zhou Fang for their
contributions along the way.

The next step is to remove the SimpleDag representation in the Scheduler and
replace it with the Serialized DAG (WIP PR:
https://github.com/apache/airflow/pull/7694).

*Advantages*:

   - *Reduction in Webserver startup time* for a large number of DAGs.
   Without DAG Serialization, all the DAGs are loaded into the DagBag during
   Webserver startup. With DAG Serialization, an empty DagBag is created and
   DAGs are loaded from the DB only when needed (i.e. when a particular DAG is
   clicked on in the home page).
   - *No DAG Parsing / Consistency*: The Webserver loads DAGs from the DB and
   doesn't even need the DAG files when DAG Serialization is turned on. DAGs
   are parsed, serialized and stored in the DB by the Scheduler.
   - Rendered Templates for TaskInstances that have already run will now
   correctly display the values that were in effect at the time of the run,
   instead of the current values.
   - Paves the way for *DAG Versioning* (more details on it when I create a
   separate AIP / update an existing AIP for it) and *Scheduler HA* (AIP-15
   <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103092651>
   ).
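
To make the first two points concrete, here is a minimal sketch of the
idea. It is not Airflow's actual implementation: the in-memory SQLite
database, the toy serialized_dag schema, and the names scheduler_store and
LazyDagBag are all assumptions made up for illustration. The "Scheduler"
writes a JSON form of each DAG to the DB, and the "Webserver" starts with an
empty bag and reads a DAG from the DB only when it is first requested.

```python
import json
import sqlite3


def make_db():
    """Create an in-memory stand-in for the metadata DB (toy schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY, data TEXT)")
    return conn


def scheduler_store(conn, dag_id, task_ids):
    """Scheduler side (hypothetical): serialize a DAG to JSON and store it."""
    payload = json.dumps({"dag_id": dag_id, "tasks": sorted(task_ids)})
    conn.execute(
        "INSERT OR REPLACE INTO serialized_dag (dag_id, data) VALUES (?, ?)",
        (dag_id, payload),
    )
    conn.commit()


class LazyDagBag:
    """Webserver side (hypothetical): starts empty, loads DAGs on first use."""

    def __init__(self, conn):
        self.conn = conn
        self.dags = {}  # nothing is loaded at startup

    def get_dag(self, dag_id):
        if dag_id not in self.dags:
            row = self.conn.execute(
                "SELECT data FROM serialized_dag WHERE dag_id = ?", (dag_id,)
            ).fetchone()
            if row is None:
                return None  # unknown DAG: nothing is cached
            self.dags[dag_id] = json.loads(row[0])
        return self.dags[dag_id]


conn = make_db()
scheduler_store(conn, "example_etl", ["extract", "load", "transform"])
scheduler_store(conn, "example_report", ["build", "send"])

bag = LazyDagBag(conn)
print(len(bag.dags))               # the bag is empty at startup
print(bag.get_dag("example_etl"))  # read from the DB on demand, no DAG file needed
print(len(bag.dags))               # only the requested DAG has been cached
```

Note that the Webserver side never touches a DAG file here; it only reads
what the Scheduler side stored, which is the property that makes it
stateless.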


I will create new JIRA issues for the further steps with DAG Serialization
and DAG Versioning, and will discuss them in our next sig-dag-serialization
call (later this month).

Regards,
Kaxil

Re: Stateless Webserver with DAG Serialization

Posted by Kaxil Naik <ka...@gmail.com>.
Thank you all :)



On Sun, Mar 15, 2020 at 8:35 AM Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> Thanks for the update Kaxil, this is awesome! 💯
>
> Cheers, Fokko

Re: Stateless Webserver with DAG Serialization

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Thanks for the update Kaxil, this is awesome! 💯

Cheers, Fokko

On Sat, Mar 14, 2020 at 13:40, Tomasz Urbaszek <tu...@apache.org> wrote:

> Thanks Kaxil, Ash, Anita and Zhou! This is a big step forward :)
>
> T.

Re: Stateless Webserver with DAG Serialization

Posted by Tomasz Urbaszek <tu...@apache.org>.
Thanks Kaxil, Ash, Anita and Zhou! This is a big step forward :)

T.

On Sat, Mar 14, 2020 at 12:03 PM Sumit Maheshwari <su...@gmail.com>
wrote:

> Great stuff!! Thanks to everyone who contributed 👏

Re: Stateless Webserver with DAG Serialization

Posted by Sumit Maheshwari <su...@gmail.com>.
Great stuff!! Thanks to everyone who contributed 👏

On Sat, Mar 14, 2020 at 2:26 PM Deng Xiaodong <xd...@gmail.com> wrote:

> It is really great news and would be a big plus.
>
> Thanks Kaxil, Ash, Anita, and Zhou Fang!
>
>
> XD

Re: Stateless Webserver with DAG Serialization

Posted by Deng Xiaodong <xd...@gmail.com>.
It is really great news and would be a big plus.

Thanks Kaxil, Ash, Anita, and Zhou Fang!


XD

> On 14 Mar 2020, at 2:31 AM, Kevin Yang <yr...@gmail.com> wrote:
> 
> This is thrilling news! Completely upleveling the experience using and
> maintaining the webserver. Thank you so much everyone who contributed to
> this initiative!
> 
> 
> Cheers,
> Kevin Y


Re: Stateless Webserver with DAG Serialization

Posted by Jarek Potiuk <Ja...@polidea.com>.
This is great! I presume we will be cherry-picking it to 1.10.10 as well?
Do you need help with that? Testing/cherry-picking itself? Just let me know!

J,


On Sat, Mar 14, 2020 at 2:31 AM Kevin Yang <yr...@gmail.com> wrote:

> This is thrilling news! Completely upleveling the experience using and
> maintaining the webserver. Thank you so much everyone who contributed to
> this initiative!
>
>
> Cheers,
> Kevin Y


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>

Re: Stateless Webserver with DAG Serialization

Posted by Kevin Yang <yr...@gmail.com>.
This is thrilling news! Completely upleveling the experience using and
maintaining the webserver. Thank you so much everyone who contributed to
this initiative!


Cheers,
Kevin Y
