You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/18 21:23:57 UTC

[GitHub] [airflow] ManiBharataraju opened a new pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

ManiBharataraju opened a new pull request #16508:
URL: https://github.com/apache/airflow/pull/16508


   closes: #16507
   In this PR I'm checking the pause status of the parent and assigning it to the subdag if they are newly added.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ManiBharataraju commented on a change in pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
ManiBharataraju commented on a change in pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#discussion_r670194063



##########
File path: airflow/models/dag.py
##########
@@ -1847,7 +1847,10 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=None):
         for missing_dag_id in missing_dag_ids:
             orm_dag = DagModel(dag_id=missing_dag_id)
             dag = dag_by_ids[missing_dag_id]
-            if dag.is_paused_upon_creation is not None:
+
+            if dag.is_subdag:
+                orm_dag.is_paused = dag.parent_dag.get_is_paused()
+            elif dag.is_paused_upon_creation is not None:

Review comment:
       @uranusjr - Okay. So what is happening is, let's say you create a dag and deploy it and it is running. At a later point, you want to add an enhancement to the dag which is a subdag. You would expect the subdag to run in the next schedule. However, what is happening is, the newly added subdag is created in off state because `is_paused_upon_creation` by default is False. My PR fixes this issue by checking the status of the parent dag for such cases only. 
   
   The workaround for this currently is, turn off the parent dag and turn it back on so that both the parent and subdag is in the same state.
   
   > When a parent DAG is set to pause, its subdags should also be paused?
   
   This is currently working fine and this PR does not change anything wrt this.
   If you still have questions, I can ping you over the slack channel. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ManiBharataraju commented on pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
ManiBharataraju commented on pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#issuecomment-1072990248


   @potiuk - Apologies. Completely missed this. Should I just do a git pull --rebase on my branch? Could you help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#discussion_r670205197



##########
File path: airflow/models/dag.py
##########
@@ -1847,7 +1847,10 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=None):
         for missing_dag_id in missing_dag_ids:
             orm_dag = DagModel(dag_id=missing_dag_id)
             dag = dag_by_ids[missing_dag_id]
-            if dag.is_paused_upon_creation is not None:
+
+            if dag.is_subdag:
+                orm_dag.is_paused = dag.parent_dag.get_is_paused()
+            elif dag.is_paused_upon_creation is not None:

Review comment:
       I’m still not following so please bear with me…
   
   > However, what is happening is, the newly added subdag is created in off state because is_paused_upon_creation by default is False.
   
   If the default is False (i.e. not paused), why is the subdag created in the off state?
   
   ----
   
   The default `dags_are_paused_at_creation` value is actually True, so by default, dags are created in in the paused state. So when the parent DAG is unpaused, the newly-created subdag would indeed be in a different state. Is this what you’re trying to say? (but not describing the true/false on/off values correctly… which is understandable, the various negative value names are difficult to keep straight)
   
   ----
   
   Assuming the above is the case, setting a subdag to unpaused by default if the parent dag is unpaused does sound like a reasonable thing to do, and the patch is also right (from what I can tell). But that would arguably be a backward incompatibility behavioural change I’m not sure we should just do. I hope another maintainer can provide second opinion on this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ManiBharataraju commented on a change in pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
ManiBharataraju commented on a change in pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#discussion_r670215236



##########
File path: airflow/models/dag.py
##########
@@ -1847,7 +1847,10 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=None):
         for missing_dag_id in missing_dag_ids:
             orm_dag = DagModel(dag_id=missing_dag_id)
             dag = dag_by_ids[missing_dag_id]
-            if dag.is_paused_upon_creation is not None:
+
+            if dag.is_subdag:
+                orm_dag.is_paused = dag.parent_dag.get_is_paused()
+            elif dag.is_paused_upon_creation is not None:

Review comment:
       > If the default is False (i.e. not paused), why is the subdag created in the off state?
   
   My bad, I just fixed my comment. It is true by default.
   
   > The default dags_are_paused_at_creation value is actually True, so by default, dags are created in in the paused state. So when the parent DAG is unpaused, the newly-created subdag would indeed be in a different state. Is this what you’re trying to say? (but not describing the true/false on/off values correctly… which is understandable, the various negative value names are difficult to keep straight)
   
   Yes right. 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#issuecomment-863446729


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/main/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/main/docs/apache-airflow/howto/custom-operator.rst) Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/main/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #16508:
URL: https://github.com/apache/airflow/pull/16508


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ManiBharataraju commented on a change in pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
ManiBharataraju commented on a change in pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#discussion_r670215236



##########
File path: airflow/models/dag.py
##########
@@ -1847,7 +1847,10 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=None):
         for missing_dag_id in missing_dag_ids:
             orm_dag = DagModel(dag_id=missing_dag_id)
             dag = dag_by_ids[missing_dag_id]
-            if dag.is_paused_upon_creation is not None:
+
+            if dag.is_subdag:
+                orm_dag.is_paused = dag.parent_dag.get_is_paused()
+            elif dag.is_paused_upon_creation is not None:

Review comment:
       > If the default is False (i.e. not paused), why is the subdag created in the off state?
   
   My bad, Sorry. I just fixed my comment. It is true by default.
   
   > The default dags_are_paused_at_creation value is actually True, so by default, dags are created in in the paused state. So when the parent DAG is unpaused, the newly-created subdag would indeed be in a different state. Is this what you’re trying to say? (but not describing the true/false on/off values correctly… which is understandable, the various negative value names are difficult to keep straight)
   
   Yes right. 
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ManiBharataraju commented on a change in pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
ManiBharataraju commented on a change in pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#discussion_r669837961



##########
File path: airflow/models/dag.py
##########
@@ -1847,7 +1847,10 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=None):
         for missing_dag_id in missing_dag_ids:
             orm_dag = DagModel(dag_id=missing_dag_id)
             dag = dag_by_ids[missing_dag_id]
-            if dag.is_paused_upon_creation is not None:
+
+            if dag.is_subdag:
+                orm_dag.is_paused = dag.parent_dag.get_is_paused()
+            elif dag.is_paused_upon_creation is not None:

Review comment:
       @uranusjr - any thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ManiBharataraju commented on a change in pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
ManiBharataraju commented on a change in pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#discussion_r670194063



##########
File path: airflow/models/dag.py
##########
@@ -1847,7 +1847,10 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=None):
         for missing_dag_id in missing_dag_ids:
             orm_dag = DagModel(dag_id=missing_dag_id)
             dag = dag_by_ids[missing_dag_id]
-            if dag.is_paused_upon_creation is not None:
+
+            if dag.is_subdag:
+                orm_dag.is_paused = dag.parent_dag.get_is_paused()
+            elif dag.is_paused_upon_creation is not None:

Review comment:
       @uranusjr - Okay. So what is happening is, let's say you create a dag and deploy it and it is running. At a later point, you want to add an enhancement to the dag which is a subdag. You would expect the subdag to run in the next schedule. However, what is happening is, the newly added subdag is created in off state because `is_paused_upon_creation` by default is True. My PR fixes this issue by checking the status of the parent dag for such cases only. 
   
   The workaround for this currently is, turn off the parent dag and turn it back on so that both the parent and subdag is in the same state.
   
   > When a parent DAG is set to pause, its subdags should also be paused?
   
   This is currently working fine and this PR does not change anything wrt this.
   If you still have questions, I can ping you over the slack channel. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#discussion_r670144247



##########
File path: airflow/models/dag.py
##########
@@ -1847,7 +1847,10 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=None):
         for missing_dag_id in missing_dag_ids:
             orm_dag = DagModel(dag_id=missing_dag_id)
             dag = dag_by_ids[missing_dag_id]
-            if dag.is_paused_upon_creation is not None:
+
+            if dag.is_subdag:
+                orm_dag.is_paused = dag.parent_dag.get_is_paused()
+            elif dag.is_paused_upon_creation is not None:

Review comment:
       > setting is_paused_upon_creation there will impact existing subdags as well (don't think that will negatively impact but still).
   
   I’m definitely missing something, because I thought this is what this PR is for? When a parent DAG is set to pause, its subdags should also be paused?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#issuecomment-1019918357


   Can you plese rebase this one @ManiBharataraju ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ManiBharataraju commented on a change in pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
ManiBharataraju commented on a change in pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#discussion_r656103344



##########
File path: airflow/models/dag.py
##########
@@ -1847,7 +1847,10 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=None):
         for missing_dag_id in missing_dag_ids:
             orm_dag = DagModel(dag_id=missing_dag_id)
             dag = dag_by_ids[missing_dag_id]
-            if dag.is_paused_upon_creation is not None:
+
+            if dag.is_subdag:
+                orm_dag.is_paused = dag.parent_dag.get_is_paused()
+            elif dag.is_paused_upon_creation is not None:

Review comment:
       @uranusjr Per my understanding, the `is_subdag` parameter is set to true only in the `_bag_dag` method, and setting `is_paused_upon_creation` there will impact existing subdags as well(don't think that will negatively impact but still). Changing is_paused there will be overwritten by this method for newly added ones. So, I chose to make the changes here. Please correct me if my understanding is wrong.
   https://github.com/apache/airflow/blob/28e285ef9a4702b3babf6ed3c094af07c017581f/airflow/models/dagbag.py#L447-L450




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#issuecomment-886145443


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#discussion_r654856286



##########
File path: airflow/models/dag.py
##########
@@ -1847,7 +1847,10 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=None):
         for missing_dag_id in missing_dag_ids:
             orm_dag = DagModel(dag_id=missing_dag_id)
             dag = dag_by_ids[missing_dag_id]
-            if dag.is_paused_upon_creation is not None:
+
+            if dag.is_subdag:
+                orm_dag.is_paused = dag.parent_dag.get_is_paused()
+            elif dag.is_paused_upon_creation is not None:

Review comment:
       This feels a bit hacky to me. Can the subdag not be created with `is_paused_upon_creation` correctly set to True? Why not?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#issuecomment-915654182


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on a change in pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
uranusjr commented on a change in pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#discussion_r670205197



##########
File path: airflow/models/dag.py
##########
@@ -1847,7 +1847,10 @@ def bulk_write_to_db(cls, dags: Collection["DAG"], session=None):
         for missing_dag_id in missing_dag_ids:
             orm_dag = DagModel(dag_id=missing_dag_id)
             dag = dag_by_ids[missing_dag_id]
-            if dag.is_paused_upon_creation is not None:
+
+            if dag.is_subdag:
+                orm_dag.is_paused = dag.parent_dag.get_is_paused()
+            elif dag.is_paused_upon_creation is not None:

Review comment:
       I’m still not following so please bear with me…
   
   > However, what is happening is, the newly added subdag is created in off state because is_paused_upon_creation by default is False.
   
   If the default is False (i.e. not paused), why is the subdag created in the off state?
   
   The default `dags_are_paused_at_creation` value is actually True, so by default, dags are created in in the paused state. So when the parent DAG is unpaused, the newly-created subdag would indeed be in a different state. Is this what you’re trying to say? (but not describing the true/false on/off values correctly… which is understandable, the various negative value names are difficult to keep straight)
   
   ----
   
   Assuming that’s the case, setting a subdag to unpaused by default if the parent dag is unpaused does sound like a reasonable thing to do, and the patch is also right (from what I can tell). But that would arguably be a backward incompatibility behavioural change I’m not sure we should just do. I hope another maintainer can provide second opinion on this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16508: Fixes #16507 Subdags are now created with pause status based on their parent.

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16508:
URL: https://github.com/apache/airflow/pull/16508#issuecomment-1072995362


   see the [docs](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#id14)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org