You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/21 05:44:39 UTC

[GitHub] [airflow] houqp opened a new pull request #12530: fix dag serialization crash caused by preset DagContext

houqp opened a new pull request #12530:
URL: https://github.com/apache/airflow/pull/12530


   DagContext should not cause any side effect for `BaseOperator.get_serialized_fields`. This prevents serialization crash caused from use of `DagContext.push_context_managed_dag` in DAG definition.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12530: fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r528835294



##########
File path: tests/serialization/test_dag_serialization.py
##########
@@ -448,6 +448,14 @@ def test_deserialization_start_date(self, dag_start_date, task_start_date, expec
         simple_task = dag.task_dict["simple_task"]
         self.assertEqual(simple_task.start_date, expected_task_start_date)
 
+    def test_deserialization_with_dag_context(self):
+        dag = DAG(dag_id='simple_dag', start_date=datetime(2019, 8, 1, tzinfo=timezone.utc))
+        DagContext.push_context_managed_dag(dag)
+        BaseOperator(task_id='simple_task')
+        # should not raise RuntimeError: dictionary changed size during iteration
+        SerializedDAG.to_dict(dag)
+        DagContext.pop_context_managed_dag()

Review comment:
       ```suggestion
           with DAG(dag_id='simple_dag', start_date=datetime(2019, 8, 1, tzinfo=timezone.utc)) as dag:
               BaseOperator(task_id='simple_task')
               # should not raise RuntimeError: dictionary changed size during iteration
               SerializedDAG.to_dict(dag)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on pull request #12530: fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#issuecomment-731944017


   @turbaszek @kaxil i think this is worth getting included into 2.0 rc release because it's breaking some of our DAGs that works prior to turn on dag serialization.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #12530: Fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r528982217



##########
File path: airflow/models/baseoperator.py
##########
@@ -1336,6 +1336,9 @@ def get_extra_links(self, dttm: datetime, link_name: str) -> Optional[Dict[str,
     def get_serialized_fields(cls):
         """Stringified DAGs and operators contain exactly these fields."""
         if not cls.__serialized_fields:
+            from airflow.models.dag import DagContext
+
+            DagContext.push_context_managed_dag(None)

Review comment:
       Depends on how you use the context manager. `with DAG():` works as expected, but we have huge DAGs (> 9K loc) in production that would be a lot easier to maintain without that extra indentation introduced with the `with` block. So we are using `DagContext.push_context_managed_dag(dag)` directly in the DAG definition. This is what triggered the crash.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #12530: fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r528710120



##########
File path: airflow/models/baseoperator.py
##########
@@ -1336,6 +1336,9 @@ def get_extra_links(self, dttm: datetime, link_name: str) -> Optional[Dict[str,
     def get_serialized_fields(cls):
         """Stringified DAGs and operators contain exactly these fields."""
         if not cls.__serialized_fields:
+            from airflow.models.dag import DagContext
+
+            DagContext.push_context_managed_dag(None)

Review comment:
       ok, maybe it is not required, since we don't add any dummy task.
   
   Although I am pretty sure we were able to use Context Manager with DAG Serialization.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12530: fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#issuecomment-732316432


   [The Workflow run](https://github.com/apache/airflow/actions/runs/379456551) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #12530: Fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r528983616



##########
File path: airflow/models/baseoperator.py
##########
@@ -1336,6 +1336,9 @@ def get_extra_links(self, dttm: datetime, link_name: str) -> Optional[Dict[str,
     def get_serialized_fields(cls):
         """Stringified DAGs and operators contain exactly these fields."""
         if not cls.__serialized_fields:
+            from airflow.models.dag import DagContext
+
+            DagContext.push_context_managed_dag(None)

Review comment:
       Got it, yeah :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12530: fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#issuecomment-732316185


   [The Workflow run](https://github.com/apache/airflow/actions/runs/379456105) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12530: Fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r529380391



##########
File path: tests/serialization/test_dag_serialization.py
##########
@@ -448,6 +447,12 @@ def test_deserialization_start_date(self, dag_start_date, task_start_date, expec
         simple_task = dag.task_dict["simple_task"]
         self.assertEqual(simple_task.start_date, expected_task_start_date)
 
+    def test_deserialization_with_dag_context(self):
+        with DAG(dag_id='simple_dag', start_date=datetime(2019, 8, 1, tzinfo=timezone.utc)) as dag:

Review comment:
       Was my suggestion here not valid then @houqp, given you said:
   
   
   > Depends on how you use the context manager. with DAG(): works as expected, but we have huge DAGs (> 9K loc) in production that would be a lot easier to maintain without that extra indentation introduced with the with block. So we are using DagContext.push_context_managed_dag(dag) directly in the DAG definition. This is what triggered the crash.
   
   I.e. does this actually test the code you need it to?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #12530: Fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r528982217



##########
File path: airflow/models/baseoperator.py
##########
@@ -1336,6 +1336,9 @@ def get_extra_links(self, dttm: datetime, link_name: str) -> Optional[Dict[str,
     def get_serialized_fields(cls):
         """Stringified DAGs and operators contain exactly these fields."""
         if not cls.__serialized_fields:
+            from airflow.models.dag import DagContext
+
+            DagContext.push_context_managed_dag(None)

Review comment:
       Depends on how you use the context manager. `with DAG():` works as expected, but we have huge DAGs in production that would be a lot easier to maintain without that extra indentation introduced with the `with` block. So we are using `DagContext.push_context_managed_dag(dag)` directly in the DAG definition. This is what triggered the crash.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #12530: Fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #12530:
URL: https://github.com/apache/airflow/pull/12530


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #12530: fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r528254977



##########
File path: airflow/models/baseoperator.py
##########
@@ -1336,6 +1336,9 @@ def get_extra_links(self, dttm: datetime, link_name: str) -> Optional[Dict[str,
     def get_serialized_fields(cls):
         """Stringified DAGs and operators contain exactly these fields."""
         if not cls.__serialized_fields:
+            from airflow.models.dag import DagContext
+
+            DagContext.push_context_managed_dag(None)

Review comment:
       @turbaszek added more comment in my commit. you should be able to reproduce the crash with the added test case as well. let me know if you would like me to add more clarification.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12530: Fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#issuecomment-732328415


   The PR needs to run all tests because it modifies core of Airflow! Please rebase it to latest master or ask committer to re-run it!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #12530: fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r528183318



##########
File path: airflow/models/baseoperator.py
##########
@@ -1336,6 +1336,9 @@ def get_extra_links(self, dttm: datetime, link_name: str) -> Optional[Dict[str,
     def get_serialized_fields(cls):
         """Stringified DAGs and operators contain exactly these fields."""
         if not cls.__serialized_fields:
+            from airflow.models.dag import DagContext
+
+            DagContext.push_context_managed_dag(None)

Review comment:
       @houqp can you tell more about why we need it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #12530: Fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r530029641



##########
File path: tests/serialization/test_dag_serialization.py
##########
@@ -448,6 +447,12 @@ def test_deserialization_start_date(self, dag_start_date, task_start_date, expec
         simple_task = dag.task_dict["simple_task"]
         self.assertEqual(simple_task.start_date, expected_task_start_date)
 
+    def test_deserialization_with_dag_context(self):
+        with DAG(dag_id='simple_dag', start_date=datetime(2019, 8, 1, tzinfo=timezone.utc)) as dag:

Review comment:
       your suggestion is still valid, calling serialization within the with block means the dummy task will be created with a DAG pushed to the top of dag context. it covers the same edge case, it's just not how some of our DAGs are written :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #12530: fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r528707175



##########
File path: airflow/models/baseoperator.py
##########
@@ -1336,6 +1336,9 @@ def get_extra_links(self, dttm: datetime, link_name: str) -> Optional[Dict[str,
     def get_serialized_fields(cls):
         """Stringified DAGs and operators contain exactly these fields."""
         if not cls.__serialized_fields:
+            from airflow.models.dag import DagContext
+
+            DagContext.push_context_managed_dag(None)

Review comment:
       Do we need to do this for the following places too?
   
   ```
   airflow/sensors/external_task_sensor.py:306:    def get_serialized_fields(cls):
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #12530: fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r528869389



##########
File path: airflow/models/baseoperator.py
##########
@@ -1336,6 +1336,9 @@ def get_extra_links(self, dttm: datetime, link_name: str) -> Optional[Dict[str,
     def get_serialized_fields(cls):
         """Stringified DAGs and operators contain exactly these fields."""
         if not cls.__serialized_fields:
+            from airflow.models.dag import DagContext
+
+            DagContext.push_context_managed_dag(None)

Review comment:
       oh yeah realized after it and added another comment :) 
   
   >ok, maybe it is not required, since we don't add any dummy task.
   
   But I am still a bit surprised that we didn't catch or found this error before as I am pretty sure I have used Serialized DAG with `DAG context Manager`
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] houqp commented on a change in pull request #12530: fix dag serialization crash caused by preset DagContext

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #12530:
URL: https://github.com/apache/airflow/pull/12530#discussion_r528867848



##########
File path: airflow/models/baseoperator.py
##########
@@ -1336,6 +1336,9 @@ def get_extra_links(self, dttm: datetime, link_name: str) -> Optional[Dict[str,
     def get_serialized_fields(cls):
         """Stringified DAGs and operators contain exactly these fields."""
         if not cls.__serialized_fields:
+            from airflow.models.dag import DagContext
+
+            DagContext.push_context_managed_dag(None)

Review comment:
       looks like it's calling `super().get_serialized_fields()`, so we should be good for ExternalTaskSensor




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org