Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/06 00:40:58 UTC

[GitHub] [airflow] jedcunningham commented on pull request #24743: Get dataset-driven scheduling working

jedcunningham commented on PR #24743:
URL: https://github.com/apache/airflow/pull/24743#issuecomment-1175635715

   I think "any" is a more general and simpler first pass. If you go with "all" and you have a really infrequently produced dataset, we are basically forced to provide more functionality initially. For instance, you might have a DAG that produces a company's holidays for the year while the other datasets are produced far more frequently.
   
   This sorta cuts the other way too though, e.g. when depending on 2 daily datasets, one of which is slower/later in the day. However, in that case you could short-circuit if only 1 of the 2 is ready, since you at least have a dagrun. Not great, but doable (and arguably better than rerunning infrequent jobs at the lowest common frequency, if we kept it simple).
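
A minimal sketch of that short-circuit idea, in plain Python rather than Airflow code. The helper `should_proceed`, the dataset names, and the "updated today" freshness rule are all hypothetical, chosen only to illustrate a run that is triggered by "any" update and then bails out early unless both daily datasets have landed:

```python
from datetime import date


def should_proceed(last_updated: dict, required: set, run_date: date) -> bool:
    """Return True only if every required dataset was updated on run_date.

    Hypothetical guard for the first step of a run triggered under "any"
    semantics; nothing here is part of Airflow's API.
    """
    return all(last_updated.get(name) == run_date for name in required)


today = date(2022, 7, 6)
required = {"daily_a", "daily_b"}

# First trigger: only daily_a has landed today, so the run short-circuits.
state = {"daily_a": today, "daily_b": date(2022, 7, 5)}
first_trigger = should_proceed(state, required, today)

# Second trigger: the slower dataset arrives later, so this run does the work.
state["daily_b"] = today
second_trigger = should_proceed(state, required, today)

print(first_trigger, second_trigger)
```

The cost is one extra, mostly empty run per day, which is the "not great, but doable" trade-off.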
   
   > if you have 10 upstream datasets, and each is in its own dag, you could easily get 10 dag runs created.
   
   Yep, exactly. I'd say that's desired for "any".
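
To make the difference concrete, here is a toy model (not the scheduler implementation in this PR) that counts how many runs a stream of dataset updates would create under each policy. The function `count_runs`, the dataset names, and the reading of "all" as "every upstream updated at least once since the last run" are assumptions for illustration only:

```python
def count_runs(events, upstream, mode):
    """Count downstream runs created by a sequence of dataset update events.

    mode="any": each update to an upstream dataset creates a run.
    mode="all": a run is created once every upstream dataset has been
                updated at least once since the previous run (assumed rule).
    """
    upstream = set(upstream)
    pending = set()  # upstream datasets updated since the last run ("all" only)
    runs = 0
    for name in events:
        if name not in upstream:
            continue  # ignore datasets this DAG does not depend on
        if mode == "any":
            runs += 1
        elif mode == "all":
            pending.add(name)
            if pending == upstream:
                runs += 1
                pending.clear()
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return runs


# 10 upstream datasets, each produced once by its own DAG.
upstream = [f"dataset_{i}" for i in range(10)]
events = list(upstream)

any_runs = count_runs(events, upstream, "any")
all_runs = count_runs(events, upstream, "all")
print(any_runs, all_runs)
```

In this model, 10 upstream updates give 10 runs under "any" and a single run under "all".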


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org