Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/04/09 11:19:01 UTC

[GitHub] [airflow] uranusjr opened a new pull request #15302: Emit error on duplicated DAG ID

uranusjr opened a new pull request #15302:
URL: https://github.com/apache/airflow/pull/15302


   The error will be shown in the logs on initialization, and flashed in the UI on later
   scheduled refreshes.
   
   Close #15248.
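   A minimal sketch of the behaviour described above, assuming a dict-backed bag. Adding a DAG whose ID is already registered from another file raises. The caller catches the error, logs it, and records the message per file so it could be surfaced later. `DuplicatedIdError`, `TinyDagBag`, `add`, `process_file` and `import_errors` are hypothetical stand-ins, not Airflow's actual `DagBag` API.

```python
import logging
from typing import Dict, Iterable

log = logging.getLogger(__name__)


class DuplicatedIdError(Exception):
    """Hypothetical stand-in for the PR's AirflowDagDuplicatedIdException."""

    def __init__(self, dag_id: str, incoming: str, existing: str) -> None:
        super().__init__(f"DAG ID {dag_id!r} from {incoming} is already defined in {existing}")
        self.dag_id = dag_id


class TinyDagBag:
    """Toy bag mapping dag_id -> defining file path (not Airflow's DagBag)."""

    def __init__(self) -> None:
        self.dags: Dict[str, str] = {}           # dag_id -> file that defined it
        self.import_errors: Dict[str, str] = {}  # file path -> error message

    def add(self, dag_id: str, filepath: str) -> None:
        existing = self.dags.get(dag_id)
        # Re-processing the same file is fine; a second file with the same ID is not.
        if existing is not None and existing != filepath:
            raise DuplicatedIdError(dag_id, filepath, existing)
        self.dags[dag_id] = filepath

    def process_file(self, filepath: str, dag_ids: Iterable[str]) -> None:
        # Record the error per file instead of aborting the whole load,
        # so it can be logged now and shown to users later.
        for dag_id in dag_ids:
            try:
                self.add(dag_id, filepath)
            except DuplicatedIdError as e:
                log.error("%s", e)
                self.import_errors[filepath] = str(e)
```

   In this sketch the first file to claim an ID wins; later files keep loading their other DAGs, and only the colliding one is reported.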
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#discussion_r611199277



##########
File path: airflow/models/dagbag.py
##########
@@ -195,6 +200,11 @@ def get_dag(self, dag_id, session: Session = None):
                     session=session,
                 )
                 if sd_last_updated_datetime and sd_last_updated_datetime > self.dags_last_fetched[dag_id]:
+                    self.dags = {  # Remove associated dags to re-add them.
+                        key: dag
+                        for key, dag in self.dags.items()
+                        if dag_id not in (key, dag.parent_dag.dag_id)

Review comment:
       Before anyone suggests `key != dag_id and dag.parent_dag.dag_id != dag_id`, which is both logically easier to understand and probably more performant (short-circuiting): pylint forces me to write it this way, and I can’t figure out a way to turn that check off.
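       To make the trade-off concrete, here is a small sketch comparing the two spellings of the filter. `FakeDag` is a hypothetical stand-in for `DAG`. `parent_id()` adds a `None` guard for top-level DAGs that the quoted diff does not have, which assumes a DAG without a parent stores `None` there. Both filters keep the same entries. The tuple form evaluates the parent lookup for every item, while the `and` chain skips it once `key == dag_id`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

CALLS: List[str] = []  # records every parent lookup, to show evaluation order


@dataclass
class FakeDag:
    """Hypothetical stand-in for airflow.models.DAG with only the fields used here."""

    dag_id: str
    parent_dag: Optional["FakeDag"] = None


def parent_id(dag: FakeDag) -> Optional[str]:
    CALLS.append(dag.dag_id)
    # Guard for top-level DAGs, which have no parent in this sketch.
    return dag.parent_dag.dag_id if dag.parent_dag is not None else None


def drop_with_in(dags: Dict[str, FakeDag], dag_id: str) -> Dict[str, FakeDag]:
    # The spelling pylint prefers: the tuple is built first, so parent_id()
    # runs for every entry.
    return {k: d for k, d in dags.items() if dag_id not in (k, parent_id(d))}


def drop_with_and(dags: Dict[str, FakeDag], dag_id: str) -> Dict[str, FakeDag]:
    # Equivalent chained comparison: short-circuits when k == dag_id.
    return {k: d for k, d in dags.items() if k != dag_id and parent_id(d) != dag_id}
```

       If the `and` spelling is preferred, the pylint message behind this is, to my knowledge, `consider-using-in` (R1714). Like the `no-member` disable used elsewhere in this PR, it can usually be silenced for a single line with a trailing `# pylint: disable=consider-using-in` comment.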











[GitHub] [airflow] uranusjr commented on pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#issuecomment-817344650


   Alright, I think I’ve managed to fix all the issues in the test suite, and added a test for the new exception. Not sure why some jobs are erroring out.





[GitHub] [airflow] kaxil commented on a change in pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#discussion_r626659477



##########
File path: airflow/models/dagbag.py
##########
@@ -390,7 +406,17 @@ def _process_modules(self, filepath, mods, file_last_changed_on_disk):
     def bag_dag(self, dag, root_dag):
         """
         Adds the DAG into the bag, recurses into sub dags.
-        Throws AirflowDagCycleException if a cycle is detected in this dag or its subdags
+
+        :throws: AirflowDagCycleException if a cycle is detected in this dag or its subdags.
+        :throws: AirflowDagDuplicatedIdException if this dag or its subdags already exists in the bag.

Review comment:
       Shouldn't this be `:raises:` ?
   
   ```suggestion
           :raises: AirflowDagCycleException if a cycle is detected in this dag or its subdags.
           :raises: AirflowDagDuplicatedIdException if this dag or its subdags already exists in the bag.
   ```
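   For reference, Sphinx's reST field lists usually put the exception type inside the field marker, as in `:raises ExceptionType: description`. The sketch below shows that form on a simplified, hypothetical `bag_dag`-style function that adds a DAG and its sub-DAGs and rejects duplicate IDs. `DagDuplicatedIdError`, `Dag` and the dict-based bag are stand-ins rather than Airflow's implementation, and cycle detection is left out.

```python
from typing import Dict, Iterator, List, Optional


class DagDuplicatedIdError(Exception):
    """Hypothetical stand-in for AirflowDagDuplicatedIdException."""


class Dag:
    """Hypothetical minimal DAG: an ID plus a list of sub-DAGs."""

    def __init__(self, dag_id: str, subdags: Optional[List["Dag"]] = None) -> None:
        self.dag_id = dag_id
        self.subdags = subdags or []


def _walk(dag: Dag) -> Iterator[Dag]:
    """Yield the DAG followed by all of its sub-DAGs, depth first."""
    yield dag
    for sub in dag.subdags:
        yield from _walk(sub)


def bag_dag(bag: Dict[str, Dag], dag: Dag) -> None:
    """
    Add the DAG into the bag, together with all of its sub-DAGs.

    :param bag: mapping of DAG ID to DAG, updated in place.
    :param dag: the DAG to add.
    :raises DagDuplicatedIdError: if this DAG or one of its sub-DAGs is
        already in the bag, or the same ID appears twice in its tree.
    """
    candidates = list(_walk(dag))
    seen = set()
    # Validate everything first so a failure leaves the bag untouched.
    for candidate in candidates:
        if candidate.dag_id in bag or candidate.dag_id in seen:
            raise DagDuplicatedIdError(f"DAG ID {candidate.dag_id!r} is already in the bag")
        seen.add(candidate.dag_id)
    for candidate in candidates:
        bag[candidate.dag_id] = candidate
```

   Checking every ID before inserting anything is a design choice of this sketch. It avoids leaving a parent DAG in the bag when one of its sub-DAGs is rejected.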







[GitHub] [airflow] kaxil merged pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #15302:
URL: https://github.com/apache/airflow/pull/15302


   





[GitHub] [airflow] uranusjr commented on pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#issuecomment-816873816


   I need to figure out why *existing* tests are failing first.





[GitHub] [airflow] github-actions[bot] commented on pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#issuecomment-816627918









[GitHub] [airflow] uranusjr commented on pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#issuecomment-817238451


   Turns out the test suite actually contains quite a few DAGs with duplicated IDs 🙂
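   One way to find such collisions without importing every file is a static scan. The following is a rough sketch that assumes DAGs are created by calling something named `DAG` with a literal `dag_id=` keyword or a literal first positional argument. IDs built dynamically, or DAGs created through other names, will be missed. `find_duplicate_dag_ids` is a hypothetical helper, not part of Airflow.

```python
import ast
from collections import defaultdict
from typing import Dict, Iterable, List


def _literal_dag_ids(source: str) -> Iterable[str]:
    """Yield string dag_ids passed literally to calls of something named ``DAG``."""
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handles both ``DAG(...)`` and ``models.DAG(...)``.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "DAG":
            continue
        candidates = [kw.value for kw in node.keywords if kw.arg == "dag_id"]
        if not candidates and node.args:
            candidates = [node.args[0]]
        for value in candidates:
            if isinstance(value, ast.Constant) and isinstance(value.value, str):
                yield value.value


def find_duplicate_dag_ids(sources: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each dag_id defined more than once to the files that define it.

    ``sources`` maps a file name to its Python source text.
    """
    seen: Dict[str, List[str]] = defaultdict(list)
    for filename, source in sources.items():
        for dag_id in _literal_dag_ids(source):
            seen[dag_id].append(filename)
    return {dag_id: files for dag_id, files in seen.items() if len(files) > 1}
```

   To scan a real folder, build `sources` with something like `{str(p): p.read_text() for p in Path("tests/dags").glob("*.py")}`. The folder path is only an example.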





[GitHub] [airflow] github-actions[bot] commented on pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#issuecomment-833232657


   [The Workflow run](https://github.com/apache/airflow/actions/runs/815683407) is cancelling this PR. Building images for the PR has failed. Follow the workflow link to check the reason.





[GitHub] [airflow] kaxil commented on pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#issuecomment-817291151


   Whoops yeah -- a lot more failing tests





[GitHub] [airflow] uranusjr commented on a change in pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#discussion_r611222598



##########
File path: tests/dags/test_dag_with_no_tags.py
##########
@@ -27,5 +27,5 @@
     "start_date": DEFAULT_DATE,
 }
 
-with DAG(dag_id="test_only_dummy_tasks", default_args=default_args, schedule_interval='@once') as dag:
+with DAG(dag_id="test_dag_with_no_tags", default_args=default_args, schedule_interval='@once') as dag:

Review comment:
       There is a `test_only_dummy_tasks` DAG in `test_only_dummy_tasks.py`. Probably a copypasta error. DAG ID is not relevant here; the test this DAG serves never checks it, only the DAG’s content.

##########
File path: tests/core/test_impersonation_tests.py
##########
@@ -114,17 +114,19 @@ def create_user():
 
 @pytest.mark.quarantined
 class TestImpersonation(unittest.TestCase):
-    def setUp(self):
-        check_original_docker_image()
-        grant_permissions()
-        add_default_pool_if_not_exists()
-        self.dagbag = models.DagBag(
+    @classmethod
+    def setUpClass(cls):
+        cls.dagbag = models.DagBag(

Review comment:
       Not a required change, but sharing the DAG bag across tests gives a slight performance gain. Most test case classes already use this pattern; not sure why it’s not applied here.
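       The pattern can be sketched with a stand-in for the expensive object. `setUpClass` builds it once per class, and every test method reuses it through `self`, whereas `setUp` would rebuild it before each test. `ExpensiveBag` and its build counter are hypothetical. The trade-off is that tests must not mutate the shared object, or must undo their changes.

```python
import unittest


class ExpensiveBag:
    """Hypothetical stand-in for an object that is slow to build, such as a DagBag."""

    builds = 0  # counts how many times the constructor ran

    def __init__(self) -> None:
        ExpensiveBag.builds += 1
        self.dags = {"example": object()}


class TestSharedBag(unittest.TestCase):
    @classmethod
    def setUpClass(cls) -> None:
        # Built once for the whole class instead of once per test method.
        cls.dagbag = ExpensiveBag()

    def test_first(self) -> None:
        self.assertIn("example", self.dagbag.dags)

    def test_second(self) -> None:
        self.assertIn("example", self.dagbag.dags)
```

       Run it with `python -m unittest` or pytest as usual. Both tests see the same `dagbag` instance.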

##########
File path: tests/api_connexion/endpoints/test_log_endpoint.py
##########
@@ -113,6 +113,7 @@ def _prepare_db(self):
         dagbag = self.app.dag_bag  # pylint: disable=no-member
         dag = DAG(self.DAG_ID, start_date=timezone.parse(self.default_time))
         dag.sync_to_db()
+        dagbag.dags.pop(self.DAG_ID, None)

Review comment:
       These `dict.pop()` calls reset `app.dag_bag`’s state across tests so it can be re-populated for new things to test. An alternative fix is to use a different DAG ID for each test, which would generate a larger diff and probably be slightly slower (DAGs created here need to be synced into the db, so more DAG IDs = more db inserts; I didn’t actually benchmark this though).
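       The reset works because `dict.pop(key, default)` removes the entry if it is present and returns `default` instead of raising `KeyError` if it is absent. That makes it safe to call whether or not an earlier test populated the shared bag. The following is a minimal sketch with a hypothetical module-level dict standing in for the shared `dags` mapping. It shows why a cache-like lookup would otherwise keep serving a stale entry.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the app-wide DAG mapping shared across tests.
SHARED_DAGS: Dict[str, str] = {}


def get_dag(dag_id: str, load: Callable[[], str]) -> str:
    """Cache-like lookup: reuse a stored entry if present, else load and store one."""
    if dag_id not in SHARED_DAGS:
        SHARED_DAGS[dag_id] = load()
    return SHARED_DAGS[dag_id]


def reset(dag_id: str) -> None:
    """Forget any entry an earlier test left behind; a no-op if there is none."""
    SHARED_DAGS.pop(dag_id, None)  # the None default avoids KeyError when absent
```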

##########
File path: tests/dags/test_backfill_pooled_tasks.py
##########
@@ -1,37 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-
-"""
-DAG designed to test what happens when a DAG with pooled tasks is run
-by a BackfillJob.
-Addresses issue #1225.
-"""
-from datetime import datetime
-
-from airflow.models import DAG
-from airflow.operators.dummy import DummyOperator
-
-dag = DAG(dag_id='test_backfill_pooled_task_dag')
-task = DummyOperator(
-    task_id='test_backfill_pooled_task',
-    dag=dag,
-    pool='test_backfill_pooled_task_pool',
-    owner='airflow',
-    start_date=datetime(2016, 2, 1),
-)

Review comment:
       This file is a strict subset of `test_issue_1225.py`, and the two `test_backfill_pooled_task_dag` DAGs were entirely equivalent.







[GitHub] [airflow] github-actions[bot] commented on pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#issuecomment-817129706


   The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it onto the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] uranusjr commented on a change in pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#discussion_r611223539



##########
File path: tests/api_connexion/endpoints/test_log_endpoint.py
##########
@@ -113,6 +113,7 @@ def _prepare_db(self):
         dagbag = self.app.dag_bag  # pylint: disable=no-member
         dag = DAG(self.DAG_ID, start_date=timezone.parse(self.default_time))
         dag.sync_to_db()
+        dagbag.dags.pop(self.DAG_ID, None)

Review comment:
       These `dict.pop()` calls reset `app.dag_bag`’s state across tests so it can be re-populated for new things to test. An alternative fix is to use a different DAG ID for each test, which would generate a larger diff and probably be slightly slower (DAGs created here need to be synced into the db, so more DAG IDs = more db inserts; I didn’t actually benchmark this though).







[GitHub] [airflow] uranusjr commented on a change in pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#discussion_r627056163



##########
File path: airflow/models/dagbag.py
##########
@@ -390,7 +406,17 @@ def _process_modules(self, filepath, mods, file_last_changed_on_disk):
     def bag_dag(self, dag, root_dag):
         """
         Adds the DAG into the bag, recurses into sub dags.
-        Throws AirflowDagCycleException if a cycle is detected in this dag or its subdags
+
+        :throws: AirflowDagCycleException if a cycle is detected in this dag or its subdags.
+        :throws: AirflowDagDuplicatedIdException if this dag or its subdags already exists in the bag.

Review comment:
       You're right, I don't know why I wrote this.







[GitHub] [airflow] uranusjr commented on pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#issuecomment-833233746


   Grrrrrr come ON. It’s midnight in the US and nobody is committing things but me.





[GitHub] [airflow] uranusjr closed pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
uranusjr closed pull request #15302:
URL: https://github.com/apache/airflow/pull/15302


   





[GitHub] [airflow] github-actions[bot] commented on pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#issuecomment-816628245


   [The Workflow run](https://github.com/apache/airflow/actions/runs/732912379) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] github-actions[bot] commented on pull request #15302: Emit error on duplicated DAG ID

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #15302:
URL: https://github.com/apache/airflow/pull/15302#issuecomment-833209250


   [The Workflow run](https://github.com/apache/airflow/actions/runs/815599416) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.

