
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/07/04 15:39:04 UTC

[GitHub] [airflow] dstandish commented on pull request #24743: Get dataset-driven scheduling working

dstandish commented on PR #24743:
URL: https://github.com/apache/airflow/pull/24743#issuecomment-1173948817

OK, addressed the comments.

> We still need to address the HA/lock issue

As presently constructed, we shouldn't get conflicts, because I think each dagrun should get a unique run id based on its timestamp. But yeah, we do need to think through how dagrun creation should work in this area. Let me know if you have any thoughts on what we should do.
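
To make the timestamp point concrete, here is a minimal sketch (hypothetical helper and prefix, not the code in this PR and not necessarily Airflow's actual API) of a run id derived from a timestamp. It assumes the id is just a prefix joined to the ISO-formatted timestamp:

```python
from datetime import datetime, timezone


def make_run_id(prefix: str, ts: datetime) -> str:
    """Build a run id from a prefix and a timestamp (hypothetical helper)."""
    # The id is only as unique as the timestamp that feeds it.
    return f"{prefix}__{ts.isoformat()}"


# Two timestamps one microsecond apart give distinct ids.
t1 = datetime(2022, 7, 4, 15, 39, 4, 1, tzinfo=timezone.utc)
t2 = datetime(2022, 7, 4, 15, 39, 4, 2, tzinfo=timezone.utc)
id1 = make_run_id("dataset_triggered", t1)
id2 = make_run_id("dataset_triggered", t2)

# The same timestamp always gives the same id, so two creators that
# happen to use an identical timestamp would collide.
id1_again = make_run_id("dataset_triggered", t1)
```

Under this assumption, ids only differ when the timestamps differ, which is why the HA case still needs thinking through.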

> you need to explain/highlight where you've diverged from the AIP (which is from what I can see the behaviour around multiple datasets.)

Yeah, so the AIP does state the intention to support multiple upstream datasets; I guess the change is that we're doing it now rather than later. The other difference is that the AIP shows a single dataset being passed rather than a sequence. We could pretty easily update this code to tolerate either a sequence or a single dataset object, but I kind of like just requiring a sequence better. WDYT?
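
For reference, tolerating either form could look roughly like the sketch below. The `Dataset` class and the `as_dataset_list` helper here are hypothetical stand-ins for illustration, not the classes or names used in this PR:

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a dataset object, identified by a URI."""
    uri: str


def as_dataset_list(value: Union[Dataset, Sequence[Dataset]]) -> List[Dataset]:
    """Accept a single Dataset or a sequence of them; always return a list."""
    if isinstance(value, Dataset):
        # A lone dataset is wrapped so downstream code sees one shape only.
        return [value]
    items = list(value)
    for item in items:
        if not isinstance(item, Dataset):
            raise TypeError(f"expected Dataset, got {type(item).__name__}")
    return items


single = as_dataset_list(Dataset("s3://bucket/a"))
several = as_dataset_list([Dataset("s3://bucket/a"), Dataset("s3://bucket/b")])
```

Requiring a sequence instead would simply drop the first branch, so callers always pass a list even for one dataset.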

There is dagrun creation behavior that isn't addressed in the AIP, and we do need to sort that out. But if there's a need to unblock other work, perhaps we can do that in a follow-up after discussion and debate.
