You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/22 13:41:21 UTC

[GitHub] [airflow] ephraimbuddy opened a new pull request #18434: Add cascade to DagRun/TaskInstance relationship

ephraimbuddy opened a new pull request #18434:
URL: https://github.com/apache/airflow/pull/18434


   We currently have an issue where deleting dagruns causes a dependency error in Sqlalchemy because
   the session doesn't know what to do with the related taskinstances.
       
   This PR adds cascade so that when a dagrun is marked for deletion, the related taskinstances
   are also deleted
   
   ```
   Traceback (most recent call last):
     File "/usr/local/lib/python3.6/site-packages/flask_appbuilder/models/sqla/interface.py", line 698, in delete
       self.session.commit()
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/scoping.py", line 163, in do
       return getattr(self.registry(), name)(*args, **kwargs)
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 1046, in commit
       self.transaction.commit()
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 504, in commit
       self._prepare_impl()
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 483, in _prepare_impl
       self.session.flush()
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2540, in flush
       self._flush(objects)
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2682, in _flush
       transaction.rollback(_capture_exception=True)
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
       with_traceback=exc_tb,
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
       raise exception
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 2642, in _flush
       flush_context.execute()
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 422, in execute
       rec.execute(self)
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py", line 538, in execute
       self.dependency_processor.process_deletes(uow, states)
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/dependency.py", line 547, in process_deletes
       state, child, None, True, uowcommit, False
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/dependency.py", line 604, in _synchronize
       sync.clear(dest, self.mapper, self.prop.synchronize_pairs)
     File "/usr/local/lib/python3.6/site-packages/sqlalchemy/orm/sync.py", line 88, in clear
       "column '%s' on instance '%s'" % (r, orm_util.state_str(dest))
   AssertionError: Dependency rule tried to blank-out primary key column 'task_instance.dag_id' on instance '<TaskInstance at 0x7fbdca3bc3c8>
   ```
   
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #18434:
URL: https://github.com/apache/airflow/pull/18434#discussion_r714029068



##########
File path: airflow/models/dagrun.py
##########
@@ -124,7 +124,9 @@ class DagRun(Base, LoggingMixin):
         ),
     )
 
-    task_instances = relationship(TI, back_populates="dag_run")
+    task_instances = relationship(
+        TI, back_populates="dag_run", cascade='save-update, merge, delete, delete-orphan'

Review comment:
       expunging the deleted rows from the session is fine -- it's been deleted from the DB anyway.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #18434:
URL: https://github.com/apache/airflow/pull/18434


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #18434:
URL: https://github.com/apache/airflow/pull/18434#issuecomment-924976941


   When did this start failing? Cos it has been passing for a while.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #18434:
URL: https://github.com/apache/airflow/pull/18434#discussion_r713958582



##########
File path: airflow/models/dagrun.py
##########
@@ -124,7 +124,7 @@ class DagRun(Base, LoggingMixin):
         ),
     )
 
-    task_instances = relationship(TI, back_populates="dag_run")
+    task_instances = relationship(TI, back_populates="dag_run", cascade='all, delete, delete-orphan')

Review comment:
       Not sure the best option for us:
   @uranusjr @ashb @kaxil 
   https://docs.sqlalchemy.org/en/14/orm/cascades.html#unitofwork-cascades




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #18434:
URL: https://github.com/apache/airflow/pull/18434#issuecomment-925108373


   > Oh new test added in #16634 that passedon PR but never worked on main :(
   
   Sorry again - my fault in #17883 -  missed image tagging. This has been fixed in #18433 . 
   Also protection for the future added in #18435 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #18434:
URL: https://github.com/apache/airflow/pull/18434#discussion_r714003363



##########
File path: airflow/models/dagrun.py
##########
@@ -124,7 +124,9 @@ class DagRun(Base, LoggingMixin):
         ),
     )
 
-    task_instances = relationship(TI, back_populates="dag_run")
+    task_instances = relationship(
+        TI, back_populates="dag_run", cascade='save-update, merge, delete, delete-orphan'

Review comment:
       ```suggestion
           TI, back_populates="dag_run", cascade='all, delete-orphan'
   ```
   I think? That seems to be what the docs recommend as "typical"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #18434:
URL: https://github.com/apache/airflow/pull/18434#discussion_r714019112



##########
File path: airflow/models/dagrun.py
##########
@@ -124,7 +124,9 @@ class DagRun(Base, LoggingMixin):
         ),
     )
 
-    task_instances = relationship(TI, back_populates="dag_run")
+    task_instances = relationship(
+        TI, back_populates="dag_run", cascade='save-update, merge, delete, delete-orphan'

Review comment:
       Hmm. `all` is short for `save-update, merge, refresh-expire, expunge, delete`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #18434:
URL: https://github.com/apache/airflow/pull/18434#discussion_r714007962



##########
File path: airflow/models/dagrun.py
##########
@@ -124,7 +124,9 @@ class DagRun(Base, LoggingMixin):
         ),
     )
 
-    task_instances = relationship(TI, back_populates="dag_run")
+    task_instances = relationship(
+        TI, back_populates="dag_run", cascade='save-update, merge, delete, delete-orphan'

Review comment:
       Was confused about the many options. :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #18434:
URL: https://github.com/apache/airflow/pull/18434#discussion_r714015864



##########
File path: airflow/models/dagrun.py
##########
@@ -124,7 +124,9 @@ class DagRun(Base, LoggingMixin):
         ),
     )
 
-    task_instances = relationship(TI, back_populates="dag_run")
+    task_instances = relationship(
+        TI, back_populates="dag_run", cascade='save-update, merge, delete, delete-orphan'

Review comment:
       See https://github.com/apache/airflow/pull/18434#discussion_r713958136 (not sure, but the `explunge` part feels undesired)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #18434:
URL: https://github.com/apache/airflow/pull/18434#issuecomment-925013465


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb edited a comment on pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
ashb edited a comment on pull request #18434:
URL: https://github.com/apache/airflow/pull/18434#issuecomment-924976941


   When did this start failing? Cos it has been passing for a while.
   
   Oh new test added in https://github.com/apache/airflow/pull/16634 that passedon PR but never worked on main :(


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #18434: Add cascade to DagRun/TaskInstance relationship

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #18434:
URL: https://github.com/apache/airflow/pull/18434#discussion_r713958136



##########
File path: airflow/models/dagrun.py
##########
@@ -124,7 +124,7 @@ class DagRun(Base, LoggingMixin):
         ),
     )
 
-    task_instances = relationship(TI, back_populates="dag_run")
+    task_instances = relationship(TI, back_populates="dag_run", cascade='all, delete, delete-orphan')

Review comment:
       According to documentation `all` implies `delete` so the latter is not needed. However, I don’t think we should do `expunge` (also implied by `all`). I’m also not sure about `refresh-expire`. Maybe a safer route would be `save-update, merge, delete, delete-orphan` (the first two are the default value).
   
   https://docs.sqlalchemy.org/en/14/orm/relationship_api.html#sqlalchemy.orm.relationship.params.cascade




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org