Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/17 14:26:00 UTC

[GitHub] [airflow] potiuk commented on issue #18317: Better Backfill User Experience

potiuk commented on issue #18317:
URL: https://github.com/apache/airflow/issues/18317#issuecomment-921840479


   Just one comment on that (and a bit of a warning), and possibly an explanation for your surprise and disbelief, @thejens (which probably comes from not understanding the full scope of that task and the impact it has on the distributed Airflow architecture).
   
   The UI backfill requires a bit more than a "simple implementation". This has been discussed several times on the devlist, and the problem here is not the UI but the "control plane". When you use the CLI, the admin user fully controls and manages the terminal, all the errors, and the potentially long-running process that the backfill might be. If you backfill a lot of data, it can take a lot of time, and backfill generally works by running the historical runs sequentially.
   
   Currently, there is no "long running" process in the web UI. All that the webserver runs are gunicorn processes, which are restarted periodically, so no worker process is guaranteed to survive from one page refresh to the next. The webserver is stateless and keeps all its state in the database, so it can (and will) be restarted at any time.
   
   Backfill is an entirely different thing. It sometimes has to run for hours, and something has to actively monitor the backfill process, react to failures, etc. So running it from the webserver is not the best idea. Ideally this should be another component in the scheduler, or a separate component like the triggerer, that runs the backfill command, and the UI should at most be used to trigger it and display the status. Taking into account that backfill is an "afterthought" (not something that is or should be done on a regular basis), having a separate component to serve that case (where there is a CLI for ad-hoc operations) is not the highest priority.
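
   To make that split a bit more concrete, here is a rough sketch of the idea, not of anything that exists in Airflow. The table `backfill_request`, the functions `request_backfill`, `get_status` and `process_next`, and the `run_one` callback are all made up for illustration, and SQLite stands in for the metadata database. The UI side only writes a row and reads the status back; a separate long-running process does the actual sequential work and records progress in the database.

```python
import sqlite3
from datetime import date, timedelta


def init_db(conn):
    # Hypothetical table; it is NOT part of Airflow's schema.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS backfill_request (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               dag_id TEXT NOT NULL,
               start_date TEXT NOT NULL,
               end_date TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'queued',
               runs_done INTEGER NOT NULL DEFAULT 0,
               error TEXT)"""
    )
    conn.commit()


def request_backfill(conn, dag_id, start, end):
    """What a stateless UI handler would do: record the request and return."""
    cur = conn.execute(
        "INSERT INTO backfill_request (dag_id, start_date, end_date) VALUES (?, ?, ?)",
        (dag_id, start.isoformat(), end.isoformat()),
    )
    conn.commit()
    return cur.lastrowid


def get_status(conn, request_id):
    """What the UI would poll to display progress: (status, runs_done)."""
    return conn.execute(
        "SELECT status, runs_done FROM backfill_request WHERE id = ?",
        (request_id,),
    ).fetchone()


def process_next(conn, run_one):
    """One iteration of the separate long-running component.

    `run_one(dag_id, day)` is a placeholder for executing a single
    historical run; it raises an exception on failure.
    """
    row = conn.execute(
        "SELECT id, dag_id, start_date, end_date FROM backfill_request "
        "WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing to do
    request_id, dag_id, start, end = row
    conn.execute(
        "UPDATE backfill_request SET status = 'running' WHERE id = ?",
        (request_id,),
    )
    conn.commit()

    day, last = date.fromisoformat(start), date.fromisoformat(end)
    try:
        # Historical runs are executed one after another, oldest first.
        while day <= last:
            run_one(dag_id, day)
            conn.execute(
                "UPDATE backfill_request SET runs_done = runs_done + 1 WHERE id = ?",
                (request_id,),
            )
            conn.commit()  # progress survives a restart of any component
            day += timedelta(days=1)
        conn.execute(
            "UPDATE backfill_request SET status = 'success' WHERE id = ?",
            (request_id,),
        )
    except Exception as exc:
        conn.execute(
            "UPDATE backfill_request SET status = 'failed', error = ? WHERE id = ?",
            (str(exc), request_id),
        )
    conn.commit()
    return request_id
```

   The point of the sketch is only that the web handler returns immediately and can be restarted at any time, while the hours-long loop lives in a process that is expected to stay up.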
   
   So, in short, this task is more of a backend/architecture change than a UI one, and I think it's quite a complex piece, especially if you want to make sure that it works with multiple schedulers, for example.
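
   As one small illustration of why multiple schedulers make this harder: if several processes can pick up backfill work, each request has to be claimed atomically so that it is not run twice. Below is a minimal sketch of a generic compare-and-set claim, again with SQLite and a made-up `backfill_request` table with a made-up `owner` column. It is not how Airflow's schedulers coordinate, just the general shape of the problem.

```python
import sqlite3


def make_demo_db():
    # Hypothetical table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE backfill_request (
               id INTEGER PRIMARY KEY,
               status TEXT NOT NULL DEFAULT 'queued',
               owner TEXT)"""
    )
    conn.execute("INSERT INTO backfill_request (id) VALUES (1)")
    conn.commit()
    return conn


def claim_request(conn, request_id, worker_id):
    """Try to take ownership of a queued request.

    The WHERE clause only matches while the row is still 'queued', so
    the check and the update happen in one statement and only one
    worker can succeed.
    """
    cur = conn.execute(
        "UPDATE backfill_request SET status = 'running', owner = ? "
        "WHERE id = ? AND status = 'queued'",
        (worker_id, request_id),
    )
    conn.commit()
    return cur.rowcount == 1


conn = make_demo_db()
first = claim_request(conn, 1, "scheduler-a")
second = claim_request(conn, 1, "scheduler-b")
```

   Here the first call wins and the second one gets `False` and has to move on. A real implementation would also need to handle a worker that dies after claiming a request, which is part of what makes this a backend change rather than a UI one.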
   
   And yeah, I think it's a useful one, though IMHO it's not "critical", and weighing the implementation complexity against its "value" (given that you have the CLI backfill), it's not at all surprising to me that we do not have it yet.
   
   
   

