Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/03/02 16:49:32 UTC

[GitHub] [airflow] ephraimbuddy commented on a change in pull request #21879: Add docs re upgrade / downgrade

ephraimbuddy commented on a change in pull request #21879:
URL: https://github.com/apache/airflow/pull/21879#discussion_r817893253



##########
File path: docs/apache-airflow/usage-cli.rst
##########
@@ -199,3 +199,63 @@ Both ``json`` and ``yaml`` formats make it easier to manipulate the data using c
     "sd": "2020-11-29T14:53:56.931243+00:00",
     "ed": "2020-11-29T14:53:57.126306+00:00"
   }
+
+.. _cli-db-clean:
+
+Purge history from metadata database
+------------------------------------
+
+.. note::
+
+  It's strongly recommended that you back up the metadata database before running the ``db clean`` command.
+
+The ``db clean`` command works by deleting from each table the records older than the provided ``--clean-before-timestamp``.
+
+You can optionally provide a list of tables to perform deletes on. If no list of tables is supplied, all tables will be included.
+
+You can use the ``--dry-run`` option to print the row counts in the primary tables to be cleaned.
+
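A typical invocation might look like the following sketch (the timestamp is illustrative; only the flags described above are used, and running the actual purge requires a live Airflow metadata database):

```shell
# Preview first: --dry-run prints row counts in the tables to be
# cleaned without performing any deletes.
airflow db clean --clean-before-timestamp '2022-01-01' --dry-run

# After reviewing the counts (and backing up the database),
# run the actual purge of records older than the timestamp.
airflow db clean --clean-before-timestamp '2022-01-01'
```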
+Beware cascading deletes
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Keep in mind that some tables have foreign key relationships defined with ``ON DELETE CASCADE``, so deletes in one table may trigger deletes in others. For example, the ``task_instance`` table has a foreign key to the ``dag_run`` table, so if a DagRun record is deleted, all of its associated task instances will also be deleted.
+
+Special handling for DAG runs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Commonly, Airflow determines which DagRun to run next by looking up the latest DagRun. If you delete all DAG runs, Airflow may schedule an old DAG run that has already completed, e.g. if you have set ``catchup=True``. For this reason, the ``db clean`` command retains the latest non-manually-triggered DAG run, preserving continuity in scheduling.
+
+Considerations for backfillable DAGs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Not all DAGs are designed for use with Airflow's backfill command. But for those that are, special care is warranted. If you delete DAG runs, and then run backfill over a range of dates that includes the deleted DAG runs, those runs will be recreated and run again. For this reason, if you have DAGs that fall into this category, you may want to refrain from deleting DAG runs and only clean other large tables such as ``task_instance`` and ``log``.
+
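In that case, the purge can be restricted to specific tables. A sketch, assuming ``--tables`` is the option for supplying the list of tables mentioned earlier (verify the flag name against ``airflow db clean --help`` for your version):

```shell
# Purge only task_instance and log history, leaving dag_run rows
# untouched so backfillable DAG runs are not recreated later.
airflow db clean --clean-before-timestamp '2022-01-01' --tables 'task_instance,log'
```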
+.. _cli-db-upgrade:
+
+Upgrading Airflow
+-----------------
+
+Run ``airflow db upgrade --help`` for usage details.
+
+Running migrations manually
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If desired, you can generate the SQL statements for an upgrade and apply each upgrade migration manually, one at a time.  To do so, use the ``--revision-range`` option with ``db upgrade``.  Do *not* skip running the Alembic revision id update commands; these are how Airflow knows where you are upgrading from the next time you upgrade.  See :doc:`/migrations-ref` for a mapping between revision and version.
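A sketch of the workflow described above, using the ``--revision-range`` option with placeholder Alembic revision ids (check ``airflow db upgrade --help`` for the exact syntax in your version):

```shell
# Generate the SQL for the migrations between two Alembic revisions
# instead of applying them, then run each statement manually and
# follow with the corresponding revision id update commands.
airflow db upgrade --revision-range <from_revision>:<to_revision>
```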

Review comment:
       ```suggestion
    If desired, you can generate the SQL statements for an upgrade and apply each upgrade migration manually, one at a time.  To do so, use the migration revision range ``--revision-range`` or version range ``--range`` option with ``db upgrade``.  Do *not* skip running the Alembic revision id update commands; these are how Airflow knows where you are upgrading from the next time you upgrade.  See :doc:`/migrations-ref` for a mapping between revision and version.
   ```



