Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/04/09 12:37:06 UTC

[GitHub] [airflow] zhongjiajie opened a new pull request #8231: Dag bulk_sync_to_db dag_tag only remove not exists

zhongjiajie opened a new pull request #8231: Dag bulk_sync_to_db dag_tag only remove not exists
URL: https://github.com/apache/airflow/pull/8231
 
 
   Currently we remove all records in dag_tag, but we only
   need to delete the tags that no longer exist in the DAG
   file.
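
   As a rough illustration of the idea (not the code in this PR), the tags to delete and the tags to insert can be computed with set differences. The function name `diff_tags` and the parameter names `stored` and `declared` are hypothetical and only used for this sketch.

```python
from typing import Iterable, Set, Tuple


def diff_tags(stored: Iterable[str], declared: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_remove, to_add) for one DAG's tag names.

    `stored` stands for the tag names currently in dag_tag and `declared`
    for the tags set in the DAG file; both names are hypothetical.
    """
    stored_set, declared_set = set(stored), set(declared)
    to_remove = stored_set - declared_set  # in the table, gone from the file
    to_add = declared_set - stored_set     # in the file, not yet in the table
    return to_remove, to_add


# Tags present on both sides ("etl" here) are left untouched.
to_remove, to_add = diff_tags(["test-dag", "etl"], ["etl", "test-dag2"])
```

   Only the rows in `to_remove` would need a DELETE and only those in `to_add` an INSERT.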
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [airflow] potiuk commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists
URL: https://github.com/apache/airflow/pull/8231#issuecomment-612601704
 
 
   Looks good to me - but I'd love others to take a look. @mik-laj has some intrinsic knowledge about saving the models in bulk particularly :)


[GitHub] [airflow] mik-laj commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists
URL: https://github.com/apache/airflow/pull/8231#issuecomment-612602205
 
 
   I will look at this next week. I would like to use [perf-kit](https://github.com/apache/airflow/pull/7650) to check exactly what queries are executed here.
   
   


[GitHub] [airflow] zhongjiajie commented on a change in pull request #8231: Dag bulk_sync_to_db dag_tag only remove not exists

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #8231: Dag bulk_sync_to_db dag_tag only remove not exists
URL: https://github.com/apache/airflow/pull/8231#discussion_r407281289
 
 

 ##########
 File path: airflow/models/dag.py
 ##########
 @@ -1524,12 +1524,12 @@ def bulk_sync_to_db(cls, dags: Collection["DAG"], sync_time=None, session=None):
             orm_dag.description = dag.description
             orm_dag.schedule_interval = dag.schedule_interval
             for orm_tag in list(orm_dag.tags):
-                if orm_tag.name not in orm_dag.tags:
+                if orm_tag.name not in set(dag.tags):
 
 Review comment:
   Changed the Iterable to a set, as discussed in https://github.com/apache/airflow/pull/8233#issuecomment-612643623
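
   A minimal standalone sketch of the same membership check, using plain lists of tag names instead of ORM objects. The helper name `prune_tags` is hypothetical. Unlike the diff above, which calls `set(dag.tags)` inside the loop, this sketch builds the set once before the loop; that is a choice made for this example, not something the PR does.

```python
from typing import List


def prune_tags(orm_tags: List[str], dag_tags: List[str]) -> List[str]:
    """Remove, in place, every name in orm_tags that is not in dag_tags."""
    wanted = set(dag_tags)        # built once; average O(1) membership tests
    for name in list(orm_tags):   # iterate over a copy because orm_tags is mutated
        if name not in wanted:
            orm_tags.remove(name)
    return orm_tags
```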


[GitHub] [airflow] mik-laj commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists
URL: https://github.com/apache/airflow/pull/8231#issuecomment-613515485
 
 
   I looked and your change is correct, but I wonder if we lack tests to automatically detect similar situations?


[GitHub] [airflow] zhongjiajie commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists
URL: https://github.com/apache/airflow/pull/8231#issuecomment-613439829
 
 
   @mik-laj I'm not sure about it. In both approaches, don't we just delete/add objects on `orm_dags` without querying the database, and then send everything to the database in bulk with the `session.commit()` statement? So whether we remove all tags and add them all back, or remove only the ones that no longer exist and add only the new ones, wouldn't the number of queries be the same?


[GitHub] [airflow] mik-laj commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists
URL: https://github.com/apache/airflow/pull/8231#issuecomment-613511771
 
 
   @zhongjiajie 
   Here are the SQL statements with parameters for your changes.
   ```
   root@eeefbb427a9c:/opt/airflow# pytest tests/models/test_dag.py -k test_bulk_sync_to_db -s
   =========================================================================================== test session starts ============================================================================================
   platform linux -- Python 3.6.10, pytest-5.4.1, py-1.8.1, pluggy-0.13.1 -- /usr/local/bin/python
   cachedir: .pytest_cache
   rootdir: /opt/airflow, inifile: pytest.ini
   plugins: flaky-3.6.1, instafail-0.4.1.post0, requests-mock-1.7.0, celery-4.4.2, cov-2.8.1
   collected 63 items / 62 deselected / 1 selected
   
   tests/models/test_dag.py::TestDag::test_bulk_sync_to_db ========================= AIRFLOW ==========================
   Home of the user: /root
   Airflow home /root/airflow
   Skipping initializing of the DB as it was initialized already.
   You can re-initialize the database by adding --with-db-init flag when running tests.
   DELETE FROM dag_tag | {}
   DELETE FROM dag | {}
   [2020-04-14 15:26:58,412] {dag.py:1511} INFO - Sync 4 DAGs
   SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag_tag_1.name AS dag_tag_1_name, dag_tag_1.dag_id AS dag_tag_1_dag_id  FROM dag LEFT OUTER JOIN dag_tag AS dag_tag_1 ON dag.dag_id = dag_tag_1.dag_id  WHERE dag.dag_id IN (%(dag_id_1)s, %(dag_id_2)s, %(dag_id_3)s, %(dag_id_4)s) FOR UPDATE OF dag | {'dag_id_1': 'dag-bulk-sync-2', 'dag_id_2': 'dag-bulk-sync-1', 'dag_id_3': 'dag-bulk-sync-0', 'dag_id_4': 'dag-bulk-sync-3'}
   [2020-04-14 15:26:58,474] {dag.py:1532} INFO - Creating ORM DAG for dag-bulk-sync-2
   [2020-04-14 15:26:58,475] {dag.py:1532} INFO - Creating ORM DAG for dag-bulk-sync-1
   [2020-04-14 15:26:58,476] {dag.py:1532} INFO - Creating ORM DAG for dag-bulk-sync-0
   [2020-04-14 15:26:58,476] {dag.py:1532} INFO - Creating ORM DAG for dag-bulk-sync-3
   INSERT INTO dag (dag_id, root_dag_id, is_paused, is_subdag, is_active, last_scheduler_run, last_pickled, last_expired, scheduler_lock, pickle_id, fileloc, owners, description, default_view, schedule_interval) VALUES (%(dag_id)s, %(root_dag_id)s, %(is_paused)s, %(is_subdag)s, %(is_active)s, %(last_scheduler_run)s, %(last_pickled)s, %(last_expired)s, %(scheduler_lock)s, %(pickle_id)s, %(fileloc)s, %(owners)s, %(description)s, %(default_view)s, %(schedule_interval)s) | ({'dag_id': 'dag-bulk-sync-2', 'root_dag_id': None, 'is_paused': False, 'is_subdag': False, 'is_active': True, 'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 412818, tzinfo=<Timezone [UTC]>), 'last_pickled': None, 'last_expired': None, 'scheduler_lock': None, 'pickle_id': None, 'fileloc': '/opt/airflow/tests/models/test_dag.py', 'owners': '', 'description': None, 'default_view': 'tree', 'schedule_interval': '{"type": "timedelta", "attrs": {"days": 1, "seconds": 0, "microseconds": 0}}'}, {'dag_id': 'dag-bulk-sync-1', 'root_dag_id': None, 'is_paused': False, 'is_subdag': False, 'is_active': True, 'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 412818, tzinfo=<Timezone [UTC]>), 'last_pickled': None, 'last_expired': None, 'scheduler_lock': None, 'pickle_id': None, 'fileloc': '/opt/airflow/tests/models/test_dag.py', 'owners': '', 'description': None, 'default_view': 'tree', 'schedule_interval': '{"type": "timedelta", "attrs": {"days": 1, "seconds": 0, "microseconds": 0}}'}, {'dag_id': 'dag-bulk-sync-0', 'root_dag_id': None, 'is_paused': False, 'is_subdag': False, 'is_active': True, 'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 412818, tzinfo=<Timezone [UTC]>), 'last_pickled': None, 'last_expired': None, 'scheduler_lock': None, 'pickle_id': None, 'fileloc': '/opt/airflow/tests/models/test_dag.py', 'owners': '', 'description': None, 'default_view': 'tree', 'schedule_interval': '{"type": "timedelta", "attrs": {"days": 1, "seconds": 0, "microseconds": 
0}}'}, {'dag_id': 'dag-bulk-sync-3', 'root_dag_id': None, 'is_paused': False, 'is_subdag': False, 'is_active': True, 'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 412818, tzinfo=<Timezone [UTC]>), 'last_pickled': None, 'last_expired': None, 'scheduler_lock': None, 'pickle_id': None, 'fileloc': '/opt/airflow/tests/models/test_dag.py', 'owners': '', 'description': None, 'default_view': 'tree', 'schedule_interval': '{"type": "timedelta", "attrs": {"days": 1, "seconds": 0, "microseconds": 0}}'})
   INSERT INTO dag_tag (name, dag_id) VALUES (%(name)s, %(dag_id)s) | ({'name': 'test-dag', 'dag_id': 'dag-bulk-sync-0'}, {'name': 'test-dag', 'dag_id': 'dag-bulk-sync-1'}, {'name': 'test-dag', 'dag_id': 'dag-bulk-sync-2'}, {'name': 'test-dag', 'dag_id': 'dag-bulk-sync-3'})
   SELECT dag.dag_id AS dag_dag_id  FROM dag | {}
   SELECT dag_tag.dag_id AS dag_tag_dag_id, dag_tag.name AS dag_tag_name  FROM dag_tag | {}
   [2020-04-14 15:26:58,497] {dag.py:1511} INFO - Sync 4 DAGs
   SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag_tag_1.name AS dag_tag_1_name, dag_tag_1.dag_id AS dag_tag_1_dag_id  FROM dag LEFT OUTER JOIN dag_tag AS dag_tag_1 ON dag.dag_id = dag_tag_1.dag_id  WHERE dag.dag_id IN (%(dag_id_1)s, %(dag_id_2)s, %(dag_id_3)s, %(dag_id_4)s) FOR UPDATE OF dag | {'dag_id_1': 'dag-bulk-sync-2', 'dag_id_2': 'dag-bulk-sync-1', 'dag_id_3': 'dag-bulk-sync-0', 'dag_id_4': 'dag-bulk-sync-3'}
   UPDATE dag SET last_scheduler_run=%(last_scheduler_run)s WHERE dag.dag_id = %(dag_dag_id)s | ({'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 497087, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-0'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 497087, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-1'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 497087, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-2'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 497087, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-3'})
   [2020-04-14 15:26:58,511] {dag.py:1511} INFO - Sync 4 DAGs
   SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag_tag_1.name AS dag_tag_1_name, dag_tag_1.dag_id AS dag_tag_1_dag_id  FROM dag LEFT OUTER JOIN dag_tag AS dag_tag_1 ON dag.dag_id = dag_tag_1.dag_id  WHERE dag.dag_id IN (%(dag_id_1)s, %(dag_id_2)s, %(dag_id_3)s, %(dag_id_4)s) FOR UPDATE OF dag | {'dag_id_1': 'dag-bulk-sync-2', 'dag_id_2': 'dag-bulk-sync-1', 'dag_id_3': 'dag-bulk-sync-0', 'dag_id_4': 'dag-bulk-sync-3'}
   UPDATE dag SET last_scheduler_run=%(last_scheduler_run)s WHERE dag.dag_id = %(dag_dag_id)s | ({'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 511952, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-0'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 511952, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-1'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 511952, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-2'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 511952, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-3'})
   [2020-04-14 15:26:58,527] {dag.py:1511} INFO - Sync 4 DAGs
   SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag_tag_1.name AS dag_tag_1_name, dag_tag_1.dag_id AS dag_tag_1_dag_id  FROM dag LEFT OUTER JOIN dag_tag AS dag_tag_1 ON dag.dag_id = dag_tag_1.dag_id  WHERE dag.dag_id IN (%(dag_id_1)s, %(dag_id_2)s, %(dag_id_3)s, %(dag_id_4)s) FOR UPDATE OF dag | {'dag_id_1': 'dag-bulk-sync-2', 'dag_id_2': 'dag-bulk-sync-1', 'dag_id_3': 'dag-bulk-sync-0', 'dag_id_4': 'dag-bulk-sync-3'}
   UPDATE dag SET last_scheduler_run=%(last_scheduler_run)s WHERE dag.dag_id = %(dag_dag_id)s | ({'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 527839, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-0'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 527839, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-1'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 527839, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-2'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 527839, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-3'})
   INSERT INTO dag_tag (name, dag_id) VALUES (%(name)s, %(dag_id)s) | ({'name': 'test-dag2', 'dag_id': 'dag-bulk-sync-0'}, {'name': 'test-dag2', 'dag_id': 'dag-bulk-sync-1'}, {'name': 'test-dag2', 'dag_id': 'dag-bulk-sync-2'}, {'name': 'test-dag2', 'dag_id': 'dag-bulk-sync-3'})
   SELECT dag.dag_id AS dag_dag_id  FROM dag | {}
   SELECT dag_tag.dag_id AS dag_tag_dag_id, dag_tag.name AS dag_tag_name  FROM dag_tag | {}
   [2020-04-14 15:26:58,551] {dag.py:1511} INFO - Sync 4 DAGs
   SELECT dag.dag_id AS dag_dag_id, dag.root_dag_id AS dag_root_dag_id, dag.is_paused AS dag_is_paused, dag.is_subdag AS dag_is_subdag, dag.is_active AS dag_is_active, dag.last_scheduler_run AS dag_last_scheduler_run, dag.last_pickled AS dag_last_pickled, dag.last_expired AS dag_last_expired, dag.scheduler_lock AS dag_scheduler_lock, dag.pickle_id AS dag_pickle_id, dag.fileloc AS dag_fileloc, dag.owners AS dag_owners, dag.description AS dag_description, dag.default_view AS dag_default_view, dag.schedule_interval AS dag_schedule_interval, dag_tag_1.name AS dag_tag_1_name, dag_tag_1.dag_id AS dag_tag_1_dag_id  FROM dag LEFT OUTER JOIN dag_tag AS dag_tag_1 ON dag.dag_id = dag_tag_1.dag_id  WHERE dag.dag_id IN (%(dag_id_1)s, %(dag_id_2)s, %(dag_id_3)s, %(dag_id_4)s) FOR UPDATE OF dag | {'dag_id_1': 'dag-bulk-sync-2', 'dag_id_2': 'dag-bulk-sync-1', 'dag_id_3': 'dag-bulk-sync-0', 'dag_id_4': 'dag-bulk-sync-3'}
   UPDATE dag SET last_scheduler_run=%(last_scheduler_run)s WHERE dag.dag_id = %(dag_dag_id)s | ({'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 551565, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-0'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 551565, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-1'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 551565, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-2'}, {'last_scheduler_run': datetime.datetime(2020, 4, 14, 15, 26, 58, 551565, tzinfo=<Timezone [UTC]>), 'dag_dag_id': 'dag-bulk-sync-3'})
   DELETE FROM dag_tag WHERE dag_tag.name = %(name)s AND dag_tag.dag_id = %(dag_id)s | ({'name': 'test-dag', 'dag_id': 'dag-bulk-sync-0'}, {'name': 'test-dag', 'dag_id': 'dag-bulk-sync-1'}, {'name': 'test-dag', 'dag_id': 'dag-bulk-sync-2'}, {'name': 'test-dag', 'dag_id': 'dag-bulk-sync-3'})
   SELECT dag.dag_id AS dag_dag_id  FROM dag | {}
   SELECT dag_tag.dag_id AS dag_tag_dag_id, dag_tag.name AS dag_tag_name  FROM dag_tag | {}
   ```
   I hope they will be helpful. I will try to finish the documentation for perf-kit today so that you too can easily experiment without copying the code.
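
   For readers without perf-kit, a similar kind of statement trace can be sketched with the standard library alone. This is not perf-kit and not how the log above was produced; it uses an in-memory SQLite database and `sqlite3.Connection.set_trace_callback`, and the table contents are made up for illustration.

```python
import sqlite3
from typing import List

# Every SQL statement the connection executes is appended to this list.
statements: List[str] = []

conn = sqlite3.connect(":memory:")
conn.set_trace_callback(statements.append)

conn.execute("CREATE TABLE dag_tag (name TEXT, dag_id TEXT)")
conn.execute("INSERT INTO dag_tag VALUES ('test-dag', 'dag-bulk-sync-0')")
conn.execute(
    "DELETE FROM dag_tag WHERE name = 'test-dag' AND dag_id = 'dag-bulk-sync-0'"
)
conn.commit()

# The trace may also contain transaction statements such as BEGIN and COMMIT,
# so filter for the statement type of interest instead of counting everything.
deletes = [s for s in statements if s.startswith("DELETE")]
```

   Comparing such a filtered list before and after a change is one way to check whether the number of DELETE statements actually differs.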
   
   


[GitHub] [airflow] mik-laj commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists

Posted by GitBox <gi...@apache.org>.
mik-laj commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists
URL: https://github.com/apache/airflow/pull/8231#issuecomment-613395942
 
 
   I ran the `tests.models.test_dag.TestDag.test_bulk_sync_to_db` test with query tracking.
   ![Screenshot 2020-04-14 at 13 48 34](https://user-images.githubusercontent.com/12058428/79221767-b1976e00-7e56-11ea-8de3-7862b83ef131.png)
   On the left, we have the database queries for this change. On the right, we have master.
   I don't see any difference.
   Did I miss something?


[GitHub] [airflow] zhongjiajie commented on a change in pull request #8231: Dag bulk_sync_to_db dag_tag only remove not exists

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on a change in pull request #8231: Dag bulk_sync_to_db dag_tag only remove not exists
URL: https://github.com/apache/airflow/pull/8231#discussion_r407281068
 
 

 ##########
 File path: airflow/models/dag.py
 ##########
 @@ -322,7 +322,7 @@ def __init__(
         self.is_paused_upon_creation = is_paused_upon_creation
 
         self.jinja_environment_kwargs = jinja_environment_kwargs
-        self.tags = tags
+        self.tags = tags or []
 
 Review comment:
   Added this to pass the mypy check.
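
   One reading of why this helps, sketched with a hypothetical stand-in class rather than Airflow's `DAG`: if `tags` may be `None`, an expression like `set(dag.tags)` is flagged by a type checker and raises `TypeError` at runtime, whereas normalising to an empty list keeps the attribute a plain list.

```python
from typing import List, Optional


class Dag:
    """Hypothetical stand-in for a DAG-like class; not Airflow's DAG."""

    def __init__(self, dag_id: str, tags: Optional[List[str]] = None) -> None:
        self.dag_id = dag_id
        # Normalise None to an empty list so the attribute is always List[str].
        self.tags: List[str] = tags or []


untagged = Dag("example")
tagged = Dag("example", tags=["etl"])
```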


[GitHub] [airflow] potiuk commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists
URL: https://github.com/apache/airflow/pull/8231#issuecomment-613398809
 
 
   > I ran `tests.models.test_dag.TestDag.test_bulk_sync_to_db` test with query tracking.
   
   I love that we have this tool, @mik-laj. It's super useful for any refactoring!


[GitHub] [airflow] zhongjiajie commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists

Posted by GitBox <gi...@apache.org>.
zhongjiajie commented on issue #8231: Dag bulk_sync_to_db dag_tag only remove not exists
URL: https://github.com/apache/airflow/pull/8231#issuecomment-613543650
 
 
   > I looked and your change is correct, but I wonder if we lack tests to automatically detect similar situations?
   
   OK, then we should not merge this until we find what we are missing in the config/tests. I will try to help tomorrow, because it is midnight in my city.
