You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "bramvandewalle (via GitHub)" <gi...@apache.org> on 2023/03/10 10:33:10 UTC

[GitHub] [airflow] bramvandewalle opened a new pull request, #30018: Improve the docs for pausing, deactivating and deleting DAGs

bramvandewalle opened a new pull request, #30018:
URL: https://github.com/apache/airflow/pull/30018

   As the title indicates.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #30018: Improve the docs for pausing, deactivating and deleting DAGs

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #30018:
URL: https://github.com/apache/airflow/pull/30018#issuecomment-1463661491

   Not a native speaker so better to add @BasPH who is often looking at those - mostly grammar - corrections.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] closed pull request #30018: Improve the docs for pausing, deactivating and deleting DAGs

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #30018: Improve the docs for pausing, deactivating and deleting DAGs
URL: https://github.com/apache/airflow/pull/30018


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] BasPH commented on a diff in pull request #30018: Improve the docs for pausing, deactivating and deleting DAGs

Posted by "BasPH (via GitHub)" <gi...@apache.org>.
BasPH commented on code in PR #30018:
URL: https://github.com/apache/airflow/pull/30018#discussion_r1143215188


##########
docs/apache-airflow/core-concepts/dags.rst:
##########
@@ -842,39 +842,39 @@ the dependency graph.
 The dependency detector is configurable, so you can implement your own logic different than the defaults in
 :class:`~airflow.serialization.serialized_objects.DependencyDetector`
 
-DAG pausing, deactivation and deletion
---------------------------------------
+Pause, deactivate and delete DAGs
+---------------------------------
 
 The DAGs have several states when it comes to being "not running". DAGs can be paused, deactivated
-and finally all metadata for the DAG can be deleted.
+and finally all metadata of the DAGs can be deleted.
 
-Dag can be paused via UI when it is present in the ``DAGS_FOLDER``, and scheduler stored it in
-the database, but the user chose to disable it via the UI. The "pause" and "unpause" actions are available
-via UI and API. Paused DAG is not scheduled by the Scheduler, but you can trigger them via UI for
-manual runs. In the UI, you can see Paused DAGs (in ``Paused`` tab). The DAGs that are un-paused
+A DAG can be paused via the UI when it is present in the ``DAGS_FOLDER``, and when the scheduler is stored in
+the database. However, the user can choose to disable this functionality via the UI. The "pause" and "unpause" actions are available
+via the UI and the API. Paused DAGs will not be scheduled by the Scheduler, but you can trigger them via the UI for
+manual runs. In the UI, you can see the Paused DAGs in the ``Paused`` tab. The DAGs that are not in the "paused" state
 can be found in the ``Active`` tab.
 
-Dag can be deactivated (do not confuse it with ``Active`` tag in the UI) by removing them from the
-``DAGS_FOLDER``. When scheduler parses the ``DAGS_FOLDER`` and misses the DAG that it had seen
-before and stored in the database it will set is as deactivated. The metadata and history of the
-DAG` is kept for deactivated DAGs and when the DAG is re-added to the ``DAGS_FOLDER`` it will be again
-activated and history will be visible. You cannot activate/deactivate DAG via UI or API, this
-can only be done by removing files from the ``DAGS_FOLDER``. Once again - no data for historical runs of the
-DAG are lost when it is deactivated by the scheduler. Note that the ``Active`` tab in Airflow UI
-refers to DAGs that are not both ``Activated`` and ``Not paused`` so this might initially be a
+DAGs can be deactivated (do not confuse it with the ``Active`` tag in the UI) by removing them from the
+``DAGS_FOLDER``. When the scheduler parses the ``DAGS_FOLDER`` and misses a DAG that it had seen
+before and stored in the database, it will set it as deactivated. The metadata and history of the
+DAG is kept for deactivated DAGs and when the DAG is re-added to the ``DAGS_FOLDER`` it will be reactivated
+and the history will be visible. You cannot activate/deactivate DAGs via the UI or the API.
+It can only be done by removing files from the ``DAGS_FOLDER``. Once again, no data for historical runs of the
+DAG are lost when it is deactivated by the scheduler. Note that the ``Active`` tab in the Airflow UI
+refers to the DAGs that are not both ``Activated`` and ``Not paused``. So this might initially be a
 little confusing.
 
-You can't see the deactivated DAGs in the UI - you can sometimes see the historical runs, but when you try to
-see the information about those you will see the error that the DAG is missing.
+You can't see the deactivated DAGs in the UI. You can sometimes see the historical runs, but when you try to
+see the information about those, you will see an error indicating that the DAG is missing.
 
-You can also delete the DAG metadata from the metadata database using UI or API, but it does not
-always result in disappearing of the DAG from the UI - which might be also initially a bit confusing.
+You can also delete the DAG's metadata from the metadata database using the UI or the API, but it does not
+always result in the DAG disappearing from the UI - which might be a bit confusing at first.
 If the DAG is still in ``DAGS_FOLDER`` when you delete the metadata, the DAG will re-appear as
-Scheduler will parse the folder, only historical runs information for the DAG will be removed.
+the Scheduler will parse the folder. Only the information of historical runs for the DAG will be removed.
 
-This all means that if you want to actually delete a DAG and its all historical metadata, you need to do
+This all means that if you want to actually delete a DAG and all of its historical metadata, you need to do
 it in three steps:
 
-* pause the DAG
-* delete the historical metadata from the database, via UI or API
-* delete the DAG file from the ``DAGS_FOLDER`` and wait until it becomes inactive
+* Pause the DAG
+* Delete the historical metadata from the database, via the UI or the API
+* Delete the DAG's source file from the ``DAGS_FOLDER`` and wait until it becomes inactive

Review Comment:
   Yes, please update this PRđź‘Ť



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #30018: Improve the docs for pausing, deactivating and deleting DAGs

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #30018:
URL: https://github.com/apache/airflow/pull/30018#issuecomment-1536927269

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] BasPH commented on a diff in pull request #30018: Improve the docs for pausing, deactivating and deleting DAGs

Posted by "BasPH (via GitHub)" <gi...@apache.org>.
BasPH commented on code in PR #30018:
URL: https://github.com/apache/airflow/pull/30018#discussion_r1141913598


##########
docs/apache-airflow/core-concepts/dags.rst:
##########
@@ -842,39 +842,39 @@ the dependency graph.
 The dependency detector is configurable, so you can implement your own logic different than the defaults in
 :class:`~airflow.serialization.serialized_objects.DependencyDetector`
 
-DAG pausing, deactivation and deletion
---------------------------------------
+Pause, deactivate and delete DAGs
+---------------------------------
 
 The DAGs have several states when it comes to being "not running". DAGs can be paused, deactivated
-and finally all metadata for the DAG can be deleted.
+and finally all metadata of the DAGs can be deleted.
 
-Dag can be paused via UI when it is present in the ``DAGS_FOLDER``, and scheduler stored it in
-the database, but the user chose to disable it via the UI. The "pause" and "unpause" actions are available
-via UI and API. Paused DAG is not scheduled by the Scheduler, but you can trigger them via UI for
-manual runs. In the UI, you can see Paused DAGs (in ``Paused`` tab). The DAGs that are un-paused
+A DAG can be paused via the UI when it is present in the ``DAGS_FOLDER``, and when the scheduler is stored in
+the database. However, the user can choose to disable this functionality via the UI. The "pause" and "unpause" actions are available
+via the UI and the API. Paused DAGs will not be scheduled by the Scheduler, but you can trigger them via the UI for
+manual runs. In the UI, you can see the Paused DAGs in the ``Paused`` tab. The DAGs that are not in the "paused" state
 can be found in the ``Active`` tab.
 
-Dag can be deactivated (do not confuse it with ``Active`` tag in the UI) by removing them from the
-``DAGS_FOLDER``. When scheduler parses the ``DAGS_FOLDER`` and misses the DAG that it had seen
-before and stored in the database it will set is as deactivated. The metadata and history of the
-DAG` is kept for deactivated DAGs and when the DAG is re-added to the ``DAGS_FOLDER`` it will be again
-activated and history will be visible. You cannot activate/deactivate DAG via UI or API, this
-can only be done by removing files from the ``DAGS_FOLDER``. Once again - no data for historical runs of the
-DAG are lost when it is deactivated by the scheduler. Note that the ``Active`` tab in Airflow UI
-refers to DAGs that are not both ``Activated`` and ``Not paused`` so this might initially be a
+DAGs can be deactivated (do not confuse it with the ``Active`` tag in the UI) by removing them from the
+``DAGS_FOLDER``. When the scheduler parses the ``DAGS_FOLDER`` and misses a DAG that it had seen
+before and stored in the database, it will set it as deactivated. The metadata and history of the
+DAG is kept for deactivated DAGs and when the DAG is re-added to the ``DAGS_FOLDER`` it will be reactivated
+and the history will be visible. You cannot activate/deactivate DAGs via the UI or the API.
+It can only be done by removing files from the ``DAGS_FOLDER``. Once again, no data for historical runs of the
+DAG are lost when it is deactivated by the scheduler. Note that the ``Active`` tab in the Airflow UI
+refers to the DAGs that are not both ``Activated`` and ``Not paused``. So this might initially be a
 little confusing.
 
-You can't see the deactivated DAGs in the UI - you can sometimes see the historical runs, but when you try to
-see the information about those you will see the error that the DAG is missing.
+You can't see the deactivated DAGs in the UI. You can sometimes see the historical runs, but when you try to
+see the information about those, you will see an error indicating that the DAG is missing.
 
-You can also delete the DAG metadata from the metadata database using UI or API, but it does not
-always result in disappearing of the DAG from the UI - which might be also initially a bit confusing.
+You can also delete the DAG's metadata from the metadata database using the UI or the API, but it does not
+always result in the DAG disappearing from the UI - which might be a bit confusing at first.
 If the DAG is still in ``DAGS_FOLDER`` when you delete the metadata, the DAG will re-appear as
-Scheduler will parse the folder, only historical runs information for the DAG will be removed.
+the Scheduler will parse the folder. Only the information of historical runs for the DAG will be removed.
 
-This all means that if you want to actually delete a DAG and its all historical metadata, you need to do
+This all means that if you want to actually delete a DAG and all of its historical metadata, you need to do
 it in three steps:
 
-* pause the DAG
-* delete the historical metadata from the database, via UI or API
-* delete the DAG file from the ``DAGS_FOLDER`` and wait until it becomes inactive
+* Pause the DAG
+* Delete the historical metadata from the database, via the UI or the API
+* Delete the DAG's source file from the ``DAGS_FOLDER`` and wait until it becomes inactive

Review Comment:
   Ooph this is a really dense and incoherent paragraph. I'd rather rewrite the whole thing instead of fixing smaller typos. For example:
   
   ```rst
   Pausing a DAG
   -------------
   
   A DAG can be (un)paused in the main view of Airflow:
   ```
   
   ![image](https://user-images.githubusercontent.com/6249654/226296973-cc48a39d-65cd-45fa-9599-6d827841838e.png)
   
   ```rst
   A paused DAG will not run on its defined schedule, but will run when it is triggered manually. When a paused
   DAG is unpaused, missed scheduled runs during the period in which the DAG was paused will be run when
   ``catchup`` is set to ``True``.
   
   Deleting a DAG
   --------------
   
   A DAG can be removed by deleting the corresponding file. The DAG will disappear from the UI. However, DAG
   state/metadata such as task history is preserved in the database. When you add back the DAG file, all DAG
   state/metadata is available again.
   
   To also remove a DAG's state and metadata, press the "Delete DAG" button in the UI, or call the
   ``DELETE /dags/{dag_id}`` REST API endpoint:
   ```
   
   ![image](https://user-images.githubusercontent.com/6249654/226305506-58851037-1a5d-46a1-a666-96c08f8c687b.png)
   
   ```rst
   Deleting a DAG will remove all DAG state and metadata. Note that task log metadata is always preserved.
   ```
   
   A few reasons why I think the original paragraph should be rewritten:
   
   - The whole paragraph seems written for a developer of the Airflow library, not an average Airflow DAG writer. Docs should focus on the latter.
   - Stating things like "the user can choose to disable this functionality via the UI" should be backed up by a screenshot or explanation of where to find such functionality.
   - The higher the number of concepts introduced, the more difficult a piece of text becomes to understand. Therefore, I'd only use the terminology "pausing" and "deleting" DAGs, and not "deactivating".
   - I suggest splitting the paragraph into smaller paragraphs. Pausing and deleting DAGs are two different activities and mixing those in a single paragraph makes it harder to understand.
   
   @bramvandewalle WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bramvandewalle commented on a diff in pull request #30018: Improve the docs for pausing, deactivating and deleting DAGs

Posted by "bramvandewalle (via GitHub)" <gi...@apache.org>.
bramvandewalle commented on code in PR #30018:
URL: https://github.com/apache/airflow/pull/30018#discussion_r1141948959


##########
docs/apache-airflow/core-concepts/dags.rst:
##########
@@ -842,39 +842,39 @@ the dependency graph.
 The dependency detector is configurable, so you can implement your own logic different than the defaults in
 :class:`~airflow.serialization.serialized_objects.DependencyDetector`
 
-DAG pausing, deactivation and deletion
---------------------------------------
+Pause, deactivate and delete DAGs
+---------------------------------
 
 The DAGs have several states when it comes to being "not running". DAGs can be paused, deactivated
-and finally all metadata for the DAG can be deleted.
+and finally all metadata of the DAGs can be deleted.
 
-Dag can be paused via UI when it is present in the ``DAGS_FOLDER``, and scheduler stored it in
-the database, but the user chose to disable it via the UI. The "pause" and "unpause" actions are available
-via UI and API. Paused DAG is not scheduled by the Scheduler, but you can trigger them via UI for
-manual runs. In the UI, you can see Paused DAGs (in ``Paused`` tab). The DAGs that are un-paused
+A DAG can be paused via the UI when it is present in the ``DAGS_FOLDER``, and when the scheduler is stored in
+the database. However, the user can choose to disable this functionality via the UI. The "pause" and "unpause" actions are available
+via the UI and the API. Paused DAGs will not be scheduled by the Scheduler, but you can trigger them via the UI for
+manual runs. In the UI, you can see the Paused DAGs in the ``Paused`` tab. The DAGs that are not in the "paused" state
 can be found in the ``Active`` tab.
 
-Dag can be deactivated (do not confuse it with ``Active`` tag in the UI) by removing them from the
-``DAGS_FOLDER``. When scheduler parses the ``DAGS_FOLDER`` and misses the DAG that it had seen
-before and stored in the database it will set is as deactivated. The metadata and history of the
-DAG` is kept for deactivated DAGs and when the DAG is re-added to the ``DAGS_FOLDER`` it will be again
-activated and history will be visible. You cannot activate/deactivate DAG via UI or API, this
-can only be done by removing files from the ``DAGS_FOLDER``. Once again - no data for historical runs of the
-DAG are lost when it is deactivated by the scheduler. Note that the ``Active`` tab in Airflow UI
-refers to DAGs that are not both ``Activated`` and ``Not paused`` so this might initially be a
+DAGs can be deactivated (do not confuse it with the ``Active`` tag in the UI) by removing them from the
+``DAGS_FOLDER``. When the scheduler parses the ``DAGS_FOLDER`` and misses a DAG that it had seen
+before and stored in the database, it will set it as deactivated. The metadata and history of the
+DAG is kept for deactivated DAGs and when the DAG is re-added to the ``DAGS_FOLDER`` it will be reactivated
+and the history will be visible. You cannot activate/deactivate DAGs via the UI or the API.
+It can only be done by removing files from the ``DAGS_FOLDER``. Once again, no data for historical runs of the
+DAG are lost when it is deactivated by the scheduler. Note that the ``Active`` tab in the Airflow UI
+refers to the DAGs that are not both ``Activated`` and ``Not paused``. So this might initially be a
 little confusing.
 
-You can't see the deactivated DAGs in the UI - you can sometimes see the historical runs, but when you try to
-see the information about those you will see the error that the DAG is missing.
+You can't see the deactivated DAGs in the UI. You can sometimes see the historical runs, but when you try to
+see the information about those, you will see an error indicating that the DAG is missing.
 
-You can also delete the DAG metadata from the metadata database using UI or API, but it does not
-always result in disappearing of the DAG from the UI - which might be also initially a bit confusing.
+You can also delete the DAG's metadata from the metadata database using the UI or the API, but it does not
+always result in the DAG disappearing from the UI - which might be a bit confusing at first.
 If the DAG is still in ``DAGS_FOLDER`` when you delete the metadata, the DAG will re-appear as
-Scheduler will parse the folder, only historical runs information for the DAG will be removed.
+the Scheduler will parse the folder. Only the information of historical runs for the DAG will be removed.
 
-This all means that if you want to actually delete a DAG and its all historical metadata, you need to do
+This all means that if you want to actually delete a DAG and all of its historical metadata, you need to do
 it in three steps:
 
-* pause the DAG
-* delete the historical metadata from the database, via UI or API
-* delete the DAG file from the ``DAGS_FOLDER`` and wait until it becomes inactive
+* Pause the DAG
+* Delete the historical metadata from the database, via the UI or the API
+* Delete the DAG's source file from the ``DAGS_FOLDER`` and wait until it becomes inactive

Review Comment:
   I think you're completely right. I'm not very familiar (yet) with Airflow and just wanted to make the paragraph more readable. However, what you're proposing is even much better.
   
   To go forward with it. Do I change this PR? Or will you make a separate PR and we close this one?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org