Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/30 08:20:18 UTC

[GitHub] [airflow] potiuk commented on issue #24745: Airflow DB cleanup is very slow

potiuk commented on issue #24745:
URL: https://github.com/apache/airflow/issues/24745#issuecomment-1170915187

   @rotemseekingalpha - if you feel like it, you can provide a PR for that, but I am afraid this is a very complex change to implement from the UI. The UI is not built to handle ALL cases of all problems for databases of all sizes. That is just impossible, and entirely not needed: it would unnecessarily complicate the architecture of the UI.
   
   If you would like to introduce asynchronous deletion with polling, to handle long-running deletion of a DAG with a REALLY huge history (as is apparently your case), then you have to add a separate process that controls and runs the deletion, with the UI only triggering it and polling for the result. Yes, it is possible. No, we might choose not to do it, because it unnecessarily complicates things and there are other options:
   
   1) you can manually delete the dag history from the DB.
   
   2) (better) in the latest Airflow 2.3 you have `airflow db clean` - a CLI command which will purge historical data (records older than a given date) and is the recommended way to keep your database small and snappy. You should upgrade and start using this command instead of the cleanup DAGs, which (while helpful and useful in the past) have really been replaced by the CLI command. We highly recommend using it. You can run the cleanup command in whatever way you find best (including running it in a BashOperator in Airflow).
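   
   As a rough illustration of option 2, here is a minimal sketch that builds an `airflow db clean` invocation for a given retention period. The function name `build_db_clean_command` and the `retention_days` parameter are hypothetical, made up for this example. The flags `--clean-before-timestamp`, `--dry-run` and `--yes` are assumed to match your Airflow version, so please check `airflow db clean --help` before relying on them.

```python
import shlex
from datetime import datetime, timedelta, timezone


def build_db_clean_command(retention_days, now=None, dry_run=True):
    """Build the argument list for `airflow db clean` (hypothetical helper).

    Records older than `retention_days` days before `now` are targeted.
    The flag names are assumptions; verify with `airflow db clean --help`.
    """
    if retention_days < 0:
        raise ValueError("retention_days must be non-negative")
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    cmd = [
        "airflow", "db", "clean",
        "--clean-before-timestamp", cutoff.isoformat(),
    ]
    if dry_run:
        # Only report what would be deleted; nothing is removed.
        cmd.append("--dry-run")
    else:
        # Skip the interactive confirmation so it can run unattended.
        cmd.append("--yes")
    return cmd


if __name__ == "__main__":
    # Print a shell-safe command line instead of executing it.
    print(shlex.join(build_db_clean_command(retention_days=90)))
```

   The printed command can be pasted into a shell, a cron job, or the `bash_command` of a BashOperator. Starting with a dry run and switching to `dry_run=False` only once the output looks right is probably the safer order.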
   
   If you feel like it and want to become one of the > 2100 contributors (most of them users like you) - you are most welcome to propose such a solution. This is free software, and anyone can do it. You can do it too, if you feel it will help with your problem and you find the other ways of solving it not "enough". Many of our users have contributed something because they felt they should give back for the free software they use, and because their case was specific to them but also replicable by others, so they felt they could help others by contributing a change.
   
   But just be warned - this one might take quite some time, as it might require some architectural decisions and possibly writing an Airflow Improvement Proposal (https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals).
   
   But if you really think it's useful and needed - you are most welcome if you want to join those more than 2100 contributors.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org