Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/16 23:39:34 UTC

[GitHub] [airflow] kaxil opened a new pull request #13722: Update DAG Serialization docs

kaxil opened a new pull request #13722:
URL: https://github.com/apache/airflow/pull/13722


   - Updated the figure to show that the Scheduler uses Serialized DAGs (the figure was also added to https://cwiki.apache.org/confluence/display/AIRFLOW/Drawio+Diagrams)
   - Updated the description to match
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ryanahamilton commented on a change in pull request #13722: Update DAG Serialization docs

Posted by GitBox <gi...@apache.org>.
ryanahamilton commented on a change in pull request #13722:
URL: https://github.com/apache/airflow/pull/13722#discussion_r559662014



##########
File path: docs/apache-airflow/dag-serialization.rst
##########
@@ -33,25 +34,28 @@ With **DAG Serialization** we aim to decouple the webserver from DAG parsing
 which would make the Webserver very light-weight.
 
 As shown in the image above, when using the this feature,
-the Scheduler parses the DAG files, serializes them in JSON format and saves them in the Metadata DB
-as :class:`airflow.models.serialized_dag.SerializedDagModel` model.
+the :class:`~airflow.jobs.scheduler_job.DagFileProcessorProcess` in the Scheduler
+parses the DAG files, serializes them in JSON format and saves them in the Metadata DB
+as :class:`~airflow.models.serialized_dag.SerializedDagModel` model.
 
 The Webserver now instead of having to parse the DAG file again, reads the
 serialized DAGs in JSON, de-serializes them and create the DagBag and uses it
-to show in the UI.
+to show in the UI. And  the Scheduler does not need the actual DAG for making Scheduling decisions,

Review comment:
       ```suggestion
   to show in the UI. And the Scheduler does not need the actual DAG for making Scheduling decisions,
   ```

##########
File path: docs/apache-airflow/dag-serialization.rst
##########
@@ -33,25 +34,28 @@ With **DAG Serialization** we aim to decouple the webserver from DAG parsing
 which would make the Webserver very light-weight.
 
 As shown in the image above, when using the this feature,
-the Scheduler parses the DAG files, serializes them in JSON format and saves them in the Metadata DB
-as :class:`airflow.models.serialized_dag.SerializedDagModel` model.
+the :class:`~airflow.jobs.scheduler_job.DagFileProcessorProcess` in the Scheduler
+parses the DAG files, serializes them in JSON format and saves them in the Metadata DB
+as :class:`~airflow.models.serialized_dag.SerializedDagModel` model.
 
 The Webserver now instead of having to parse the DAG file again, reads the
 serialized DAGs in JSON, de-serializes them and create the DagBag and uses it
-to show in the UI.
+to show in the UI. And  the Scheduler does not need the actual DAG for making Scheduling decisions,
+instead of using the DAG files, we use Serialized DAGs that contain all the information needing to
+schedule the DAGs from Airflow 2.0.0 (this was done as part of :ref:`Scheduler HA <scheduler:ha>`).
 
 One of the key features that is implemented as the part of DAG Serialization is that
 instead of loading an entire DagBag when the WebServer starts we only load each DAG on demand from the
 Serialized Dag table. This helps reduce Webserver startup time and memory. The reduction is notable
 when you have large number of DAGs.
 
-You can enable the source code to be stored in the database to make it completely independent from DAG files.
+You can enable the source code to be stored in the database to make Webserver completely independent from DAG files.

Review comment:
       ```suggestion
   You can enable the source code to be stored in the database to make the Webserver completely independent of the DAG files.
   ```
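
For readers following the thread, the flow described in the hunk above (a parser serializes DAGs to JSON and stores them in a table, and a reader loads each DAG on demand rather than building a full DagBag at startup) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not Airflow's implementation: the `serialized_dag` table layout, `write_dag`, and `LazyDagBag` are hypothetical names, and a DAG is stood in for by a plain dict.

```python
import json
import sqlite3


def make_db():
    """Create an in-memory table standing in for the serialized DAG table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY, data TEXT NOT NULL)"
    )
    return conn


def write_dag(conn, dag):
    """Parser side: serialize a parsed DAG (here a plain dict) to JSON and upsert it."""
    conn.execute(
        "INSERT OR REPLACE INTO serialized_dag (dag_id, data) VALUES (?, ?)",
        (dag["dag_id"], json.dumps(dag, sort_keys=True)),
    )
    conn.commit()


class LazyDagBag:
    """Reader side: fetch and deserialize a DAG only when it is first requested."""

    def __init__(self, conn):
        self.conn = conn
        self.dags = {}  # cache of DAGs that have already been deserialized

    def get_dag(self, dag_id):
        if dag_id not in self.dags:
            row = self.conn.execute(
                "SELECT data FROM serialized_dag WHERE dag_id = ?", (dag_id,)
            ).fetchone()
            if row is None:
                return None  # unknown DAG: nothing is cached
            self.dags[dag_id] = json.loads(row[0])
        return self.dags[dag_id]


conn = make_db()
write_dag(conn, {"dag_id": "example", "schedule_interval": "@daily", "tasks": ["extract", "load"]})
write_dag(conn, {"dag_id": "other", "schedule_interval": None, "tasks": []})

bag = LazyDagBag(conn)
loaded = bag.get_dag("example")  # only this DAG is read and deserialized
```

In this sketch the reader never touches a DAG file, and only the requested DAG ends up in memory, which is the startup-time and memory effect the documentation text attributes to on-demand loading.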







[GitHub] [airflow] kaxil merged pull request #13722: Update DAG Serialization docs

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #13722:
URL: https://github.com/apache/airflow/pull/13722


   

