You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/01 10:06:44 UTC

[GitHub] [airflow] ms32035 commented on pull request #13199: Create dag dependencies view

ms32035 commented on pull request #13199:
URL: https://github.com/apache/airflow/pull/13199#issuecomment-811802961


   @ashb this is not helpful at all.
   
   - You are not providing any (measurable) arguments as to why implementing an additional structure storing the dependencies, which would add to the scheduler overhead regardless if it is used or not, is a better solution than recalculating a clean state  in memory, with some time based caching, when the functionality is actually used. Maintaining an additional database table also adds a number of edge cases that would require handling and are not an issue for the current approach, such as dependencies pointing to invalid dags (these are texts) or tags or tasks being removed. Furthermore, there is no reason the scheduler needs to know/use these dependencies at all. Also a tricky question here. The same supposedly heavy `collect_dags_from_db` function is called in the web server by `/refresh_all` endpoint.
   - requirement 2) defies the purpose of the view, and I’ve written my arguments before
   
   To make it clear, the main structure behind the view is a dictionary which will be no more than kilobytes for deployments with thousands of dags, so storing that in memory is not an issue at all.
   
   One other idea I have is to add a check based on `max(serialized_dag.last_updated)` to check if recalculating is needed.
   
   I’d really appreciate more support to progress this, otherwise I’ll close the PR, leave it us a plugin, and potentially abandon in the future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org