Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/14 14:29:20 UTC

[GitHub] [airflow] potiuk commented on issue #7911: Add data retention policy to Airflow

potiuk commented on issue #7911:
URL: https://github.com/apache/airflow/issues/7911#issuecomment-898901416


   > If I get some guidance on how/where to start I could try to do it
   
   I think a good start is to take a look at the quite popular maintenance DAGs here: https://github.com/teamclairvoyant/airflow-maintenance-dags - this is a set of 3rd-party DAGs that people use for maintenance tasks (such as `db-cleanup`). We do not know how "correct" they are or how well they cope with new Airflow versions, but they can give an idea of how users deal with it.
   
   I think it might be a good idea to start from that and work out an approach (other than DAGs) for implementing something like that in Airflow as a periodic job, especially since the long-term plan is to not allow tasks to talk to the DB directly, so the DAG approach would not work in that case.
   
   Personally, I think this should start with at least a discussion on the devlist or (maybe even better) a new AIP (Airflow Improvement Proposal - https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals ) as a result of this discussion.
   
   I think there are many ways it can be done, but it needs a proposal and quite extensive discussion (on the performance consequences, where such a cleanup should run, whether it should be a separate process or run within the scheduler, how to deal with multiple schedulers if we choose a scheduler-embedded solution, etc.). It's actually quite an extensive topic.
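
To make the performance question concrete, here is a minimal sketch of what a periodic retention job could do, independent of where it ends up running. It is not Airflow code and not a proposed design: the table `example_log`, its `created_at` column, the `RETENTION_TARGETS` allowlist, and the function `purge_older_than` are all hypothetical, and SQLite stands in for the metadata database. The one idea it illustrates is deleting in small batches, so that a single cleanup run does not hold one long transaction.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical allowlist: table name -> timestamp column. These names are
# illustrative only and do not correspond to Airflow's real metadata schema.
RETENTION_TARGETS = {"example_log": "created_at"}


def purge_older_than(conn, table, retention_days, now=None, batch_size=1000):
    """Delete rows older than the retention window, one small batch at a time.

    Timestamps are assumed to be stored as ISO-8601 strings in UTC, so that
    string comparison matches chronological order.
    Returns the total number of rows deleted.
    """
    ts_column = RETENTION_TARGETS[table]  # raises KeyError for unknown tables
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    total = 0
    while True:
        # Table and column names come from the allowlist above, never from
        # user input, so interpolating them into the statement is safe here.
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN ("
            f"SELECT rowid FROM {table} WHERE {ts_column} < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit per batch to keep each transaction short
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

A caller would invoke something like `purge_older_than(conn, "example_log", retention_days=30)` on a schedule. With multiple schedulers, an embedded variant would additionally need some form of locking or leader election so that only one instance purges at a time, which this sketch does not attempt.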
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org