Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/08/26 21:04:13 UTC

[GitHub] [airflow] ben-astro opened a new issue #17861: Print error in Airflow UI when a DAG with duplicate DAG_id as another DAG is present

ben-astro opened a new issue #17861:
URL: https://github.com/apache/airflow/issues/17861


   **Description**
   
   I would like to see an error in the Airflow UI (such as `Broken DAG: duplicate DAG-id`) when a DAG has the same DAG-id as another DAG. Currently, Airflow silently ignores one of the duplicated DAGs.
   
   **Use case / motivation**
   Users may knowingly or unknowingly have multiple DAGs with the same DAG-id, which is especially likely when DAGs are generated dynamically. This can lead to unintended behavior as Airflow tries to figure out which DAG is the right one. It is not clear how Airflow chooses, or whether that choice can change each time it re-parses for DAG changes.
   
   **Additional Info**
   Currently in Airflow, my understanding is that the DAG file processor creates a new process and a new DagBag per file. As a result, each DagBag parses only one DAG script, so DAGs with duplicate ids in different scripts are never detected. https://github.com/apache/airflow/blob/main/airflow/dag_processing/processor.py#L613
   
   Additionally, there is already a test and an exception (`AirflowDagDuplicatedIdException`) in place for this, but they depend on a single DagBag containing multiple DAGs, which, per discussion with @BasPH and @uranusjr, isn't the case when actually running Airflow.
   https://github.com/apache/airflow/blob/2.1.3/airflow/models/dagbag.py#L405-L409
   https://github.com/apache/airflow/blob/main/tests/models/test_dagbag.py#L143-L173
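
To make the gap concrete, here is a minimal sketch in plain Python, not Airflow code. It assumes the dag_ids found in each file have already been collected into a mapping; the names `find_duplicate_dag_ids` and `dag_ids_by_file` are hypothetical. A duplicate only becomes visible once the per-file results are compared against each other, which a DagBag that sees a single file cannot do.

```python
from collections import defaultdict


def find_duplicate_dag_ids(dag_ids_by_file):
    """Return {dag_id: sorted list of files} for dag_ids defined in more than one file.

    dag_ids_by_file is a hypothetical mapping of file path -> dag_ids parsed
    from that file, standing in for the result of one DagBag per file.
    """
    files_by_dag_id = defaultdict(set)
    for fileloc, dag_ids in dag_ids_by_file.items():
        for dag_id in dag_ids:
            files_by_dag_id[dag_id].add(fileloc)
    # Keep only the dag_ids that appear in at least two different files.
    return {
        dag_id: sorted(files)
        for dag_id, files in files_by_dag_id.items()
        if len(files) > 1
    }


# Each file looks fine on its own; the clash shows up only in the combined view.
parsed = {"dags/a.py": ["etl"], "dags/b.py": ["etl", "report"]}
duplicates = find_duplicate_dag_ids(parsed)
```

In a real deployment the combined view would probably have to come from shared state such as the metadata database, since the per-file processes do not share memory.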
   
   **Related Issues**
   
   I did not find other issues related to this, when searching for "duplicate DAG" or "duplicate DAG-id".


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #17861: Print error in Airflow UI when a DAG with duplicate DAG_id as another DAG is present

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #17861:
URL: https://github.com/apache/airflow/issues/17861#issuecomment-906741384


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] potiuk commented on issue #17861: Print error in Airflow UI when a DAG with duplicate DAG_id as another DAG is present

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #17861:
URL: https://github.com/apache/airflow/issues/17861#issuecomment-907190059


   Seems like a good idea. Maybe you would like to try contributing it, @ben-astro? It might be a very useful contribution back.





[GitHub] [airflow] siddharthvp commented on issue #17861: Print error in Airflow UI when a DAG with duplicate DAG_id as another DAG is present

Posted by GitBox <gi...@apache.org>.
siddharthvp commented on issue #17861:
URL: https://github.com/apache/airflow/issues/17861#issuecomment-907398091


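   I was taking a shot at this, but am unsure how it can be implemented *correctly*. We could have a function like:
   ```py
   from airflow.models.serialized_dag import SerializedDagModel
   from airflow.utils.session import provide_session

   # Intended as a method on the class that processes a single DAG file.
   @provide_session
   def _check_if_dupe(self, dag, session=None):
       # Look for a serialized DAG with the same dag_id that was loaded from a different file.
       other_dag = session.query(SerializedDagModel).filter(
           SerializedDagModel.dag_id == dag.dag_id,
           SerializedDagModel.fileloc != dag.fileloc,
       ).first()
       return other_dag is not None
   ```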
   
   but this would give a false-positive alert if a DAG was moved from file A to file B and the events happen to occur in this sequence, because the row stored while processing A still points to A when B is processed:
   
    process A -> DAG moved to B -> process B
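
The sequence can be reproduced with a toy model. This is a sketch under simplifying assumptions: a plain dict stands in for the serialized DAG table, and `is_dupe` and `process_file` are hypothetical helpers that mirror the proposed check rather than any Airflow API.

```python
# Toy stand-in for the serialized DAG table: dag_id -> fileloc that last stored it.
serialized = {}


def is_dupe(dag_id, fileloc):
    """Mirror of the proposed check: same dag_id recorded under a different file."""
    other = serialized.get(dag_id)
    return other is not None and other != fileloc


def process_file(fileloc, dag_ids):
    """Parse one file: collect apparent duplicates, then record the file's DAGs."""
    flagged = [dag_id for dag_id in dag_ids if is_dupe(dag_id, fileloc)]
    for dag_id in dag_ids:
        serialized[dag_id] = fileloc
    return flagged


first = process_file("a.py", ["etl"])   # process A: nothing recorded yet
# The DAG is now moved from a.py to b.py, but a.py has not been re-parsed,
# so the stale record for "etl" still points at a.py.
second = process_file("b.py", ["etl"])  # process B: "etl" is flagged although only one copy exists
```

Here `second` contains `"etl"` even though the DAG exists in only one file at that point, which is the false positive described above.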
   

