Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/30 10:19:32 UTC

[GitHub] [airflow] uranusjr commented on pull request #17891: Show error in UI when a DAG with same dag_id as another DAG is present

uranusjr commented on pull request #17891:
URL: https://github.com/apache/airflow/pull/17891#issuecomment-908224414


   This logic feels hacky to me. Even if the numbers of tasks differ, we can’t be sure it’s a duplicated ID; the user might have renamed the file *and* made some modifications. I think this is theoretically impossible to fix in the current structure.
   
   To really resolve this, we need a place to actually aggregate all the known DAG IDs during a DAG-parsing run. One possibility is to implement something like a standalone process exposing a message queue that every DAG-parsing process sends its parsed DAG IDs to, and which raises an error when it sees a duplicate.
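   Roughly what I have in mind, sketched with the stdlib (the names `report_dag_ids` and `collect_dag_ids` are made up for illustration, not real Airflow APIs):

   ```python
   # Sketch of the "central aggregator" idea: each DAG-parsing process
   # pushes (file, dag_id) pairs onto a shared queue, and one collector
   # fails fast on the first dag_id seen from two different files.
   import multiprocessing


   def report_dag_ids(queue, file_path, dag_ids):
       # Called from a parsing process after it finishes parsing a file.
       for dag_id in dag_ids:
           queue.put((file_path, dag_id))


   def collect_dag_ids(queue, expected):
       # Drain `expected` entries, remembering where each dag_id came from.
       seen = {}
       for _ in range(expected):
           file_path, dag_id = queue.get()
           if dag_id in seen and seen[dag_id] != file_path:
               raise ValueError(
                   f"Duplicate dag_id {dag_id!r} in "
                   f"{seen[dag_id]} and {file_path}"
               )
           seen[dag_id] = file_path
       return seen


   if __name__ == "__main__":
       q = multiprocessing.Queue()
       report_dag_ids(q, "dags/a.py", ["etl"])
       report_dag_ids(q, "dags/b.py", ["etl"])  # same dag_id, different file
       try:
           collect_dag_ids(q, expected=2)
       except ValueError as exc:
           print(exc)
   ```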
   
   Another possibility is, instead of storing parsed DAGs directly to `SerializedDagModel`, DAG-parsing processes would save to a different table (say `ParsingDagModel`) while they are running. This table would be empty when a DAG-parsing round starts, so any duplicated IDs are guaranteed to be real duplicates (barring some filesystem race condition edge cases, which we don’t currently cover anyway). After all the parsing processes successfully finish this round (without reporting duplication), `ParsingDagModel` is dumped into `SerializedDagModel` and truncated for the next parsing round.
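   As a rough sketch of that flow with an in-memory SQLite database (`parsing_dag` stands in for the suggested `ParsingDagModel`; table and function names are invented for illustration), a uniqueness constraint on `dag_id` makes a duplicate fail at insert time, during parsing rather than at serialization:

   ```python
   # Staging-table sketch: inserts land in parsing_dag first; duplicates
   # violate the PRIMARY KEY immediately, and only a clean round gets
   # promoted to serialized_dag and truncated for the next round.
   import sqlite3

   conn = sqlite3.connect(":memory:")
   conn.execute(
       "CREATE TABLE parsing_dag (dag_id TEXT PRIMARY KEY, fileloc TEXT)"
   )
   conn.execute(
       "CREATE TABLE serialized_dag (dag_id TEXT PRIMARY KEY, fileloc TEXT)"
   )


   def record_parsed_dag(dag_id, fileloc):
       # Raises sqlite3.IntegrityError on a dag_id already parsed this
       # round -- a guaranteed real duplicate, since the table starts empty.
       conn.execute("INSERT INTO parsing_dag VALUES (?, ?)", (dag_id, fileloc))


   def finish_parsing_round():
       # On success, promote the staged rows and truncate for the next round.
       conn.execute("DELETE FROM serialized_dag")
       conn.execute("INSERT INTO serialized_dag SELECT * FROM parsing_dag")
       conn.execute("DELETE FROM parsing_dag")


   # Example round: two distinct DAGs parse cleanly and get promoted.
   record_parsed_dag("example_dag_a", "dags/a.py")
   record_parsed_dag("example_dag_b", "dags/b.py")
   finish_parsing_round()
   ```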
   
   Both would require some pretty involved changes, though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org