You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/06/27 10:54:21 UTC

[GitHub] [airflow] yuqian90 opened a new pull request #16681: Introduce CLEARED_WHEN_RUNNING state

yuqian90 opened a new pull request #16681:
URL: https://github.com/apache/airflow/pull/16681


   closes: #16680
   
   This PR makes sure that when a user clears a running task, the task does not fail. Instead it is killed and retried gracefully.
   
   This is done by introducing a new State called `CLEARED_WHEN_RUNNING`. As the name suggests, a `TaskInstance` is set to this state when it's cleared while running. Most of the places handles `CLEARED_WHEN_RUNNING` the same way `SHUTDOWN` is handled, except in `TaskInstance.is_eligible_to_retry`, where it is always be treated as eligible for retry.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #16681: Introduce CLEARED_WHEN_RUNNING state

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r659976072



##########
File path: airflow/models/taskinstance.py
##########
@@ -1503,6 +1503,10 @@ def handle_failure_with_callback(
 
     def is_eligible_to_retry(self):
         """Is task instance is eligible for retry"""
+        if self.state == State.CLEARED_WHEN_RUNNING:
+            # If a task was cleared when running, it is always eligible for retry

Review comment:
       I am not sure you'd is the case. If it's already run its maximal number of tries out shouldn't run again should it?

##########
File path: airflow/models/taskinstance.py
##########
@@ -1503,6 +1503,10 @@ def handle_failure_with_callback(
 
     def is_eligible_to_retry(self):
         """Is task instance is eligible for retry"""
+        if self.state == State.CLEARED_WHEN_RUNNING:
+            # If a task was cleared when running, it is always eligible for retry

Review comment:
       I am not sure ~you'd~ this is the case. If it's already run its maximal number of tries out shouldn't run again should it?

##########
File path: airflow/models/taskinstance.py
##########
@@ -1503,6 +1503,10 @@ def handle_failure_with_callback(
 
     def is_eligible_to_retry(self):
         """Is task instance is eligible for retry"""
+        if self.state == State.CLEARED_WHEN_RUNNING:
+            # If a task was cleared when running, it is always eligible for retry

Review comment:
       Okay yes, that is a compelling argument.
   
   How about `RESTARTING` as the state name instead? It's slightly clearer what the behaviour will be (to me at least)
   
   Though I guess this isn't quite accurate either, as it would go back via the scheduler loop, rather than a direct restart in the worker.)
   
   Hmm




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-879093095


   Thanks everyone for the suggestions in #6762. @kaxil @potiuk @mik-laj @kaverisharma09 I re-created the diagram with draw.io and here's a draft. Some modifications I made:
   - Added "restarting" state introduced in #16681 
   - Named the state according to their definitions in state.py
   - Added a choice node to represent the alternative state transition
   - Moved the standard lifecycle to the top of the graph. I.e. "none, to scheduled, to queued, to running, and finally to success" is illustrated at the top of the diagram.
   
   Please let me know what you think. I'll include the diagram in my PR and upload to  https://cwiki.apache.org/confluence/display/AIRFLOW/File+lists if you think it works.
   
   ![task_states](https://user-images.githubusercontent.com/6637585/125461038-0d369e18-adc3-4320-b1da-8d146972662c.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr commented on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-872797863


   Are we having a service interruption? Triggering a rebuild…


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r665299592



##########
File path: airflow/models/dagrun.py
##########
@@ -487,7 +487,9 @@ def task_instance_scheduling_decisions(self, session: Session = None) -> TISched
         schedulable_tis: List[TI] = []
         changed_tis = False
 
-        tis = list(self.get_task_instances(session=session, state=State.task_states + (State.SHUTDOWN,)))
+        tis = list(
+            self.get_task_instances(session=session, state=State.task_states + tuple(State.terminated_states))
+        )

Review comment:
       It's not clear to me why `State.SHUTDOWN` wasn't in task_states before, but it looks like Andew fixed that in #15285, so this can just become
   
   ```suggestion
           tis = list(self.get_task_instances(session=session, state=State.task_states))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-886764485


   Hey @yuqian90 , can you please rebase? We had some master issues recently, but they should be solved already :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #16681: Introduce CLEARED_WHEN_RUNNING state

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r660507542



##########
File path: airflow/models/taskinstance.py
##########
@@ -1503,6 +1503,10 @@ def handle_failure_with_callback(
 
     def is_eligible_to_retry(self):
         """Is task instance is eligible for retry"""
+        if self.state == State.CLEARED_WHEN_RUNNING:
+            # If a task was cleared when running, it is always eligible for retry

Review comment:
       Okay yes, that is a compelling argument.
   
   How about `RESTARTING` as the state name instead? It's slightly clearer what the behaviour will be (to me at least)
   
   Though I guess this isn't quite accurate either, as it would go back via the scheduler loop, rather than a direct restart in the worker.)
   
   Hmm




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-886718405


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on pull request #16681: Introduce CLEARED_WHEN_RUNNING state

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-870493349


   > Is a new state strictly required for this? Cos each state we add needs to be shown in the UI and increases the possible places we need to deal with the new state
   
   I think another way to implement this is to make `clear_task_instances` increase the `max_tries` of the running `TaskInstance` by 1. However, when I tried to do that I realized it's easy. There are more places that need to be changed to do the right thing. One example is `is_eligible_to_retry()`. It first checks if `self.task.retries` is zero. If it is, it short-circuits and returns False. But `self.task.retries` isn't something that can be incremented by  `clear_task_instances` because it's not stored in the db.
   
   ```
       def is_eligible_to_retry(self):
           """Is task instance is eligible for retry"""
           return self.task.retries and self.try_number <= self.max_tries
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-872249071


   Ready for review. The failing [Helm Chart; KubernetesExecutor (pull_request)](https://github.com/apache/airflow/pull/16681/checks?check_run_id=2958003473) looks unrelated to this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r679566959



##########
File path: airflow/www/static/css/tree.css
##########
@@ -67,6 +67,10 @@ rect.shutdown {
   fill: blue;
 }
 
+rect.restarting {
+  fill: violet;

Review comment:
       We should add this state in https://github.com/apache/airflow/blob/c384f9b0f509bab704a70380465be18754800a52/airflow/settings.py#L83-L95




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on a change in pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r665419327



##########
File path: airflow/models/dagrun.py
##########
@@ -487,7 +487,9 @@ def task_instance_scheduling_decisions(self, session: Session = None) -> TISched
         schedulable_tis: List[TI] = []
         changed_tis = False
 
-        tis = list(self.get_task_instances(session=session, state=State.task_states + (State.SHUTDOWN,)))
+        tis = list(
+            self.get_task_instances(session=session, state=State.task_states + tuple(State.terminated_states))
+        )

Review comment:
       Thanks. This is great.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r665497589



##########
File path: airflow/utils/state.py
##########
@@ -180,6 +184,11 @@ def color_fg(cls, state):
     A list of states indicating that a task or dag is a success state.
     """
 
+    terminated_states = frozenset([TaskInstanceState.SHUTDOWN, TaskInstanceState.RESTARTING])

Review comment:
       ```suggestion
       terminating_states = frozenset([TaskInstanceState.SHUTDOWN, TaskInstanceState.RESTARTING])
   ```
   
   (update docs too etc please)

##########
File path: airflow/utils/state.py
##########
@@ -180,6 +184,11 @@ def color_fg(cls, state):
     A list of states indicating that a task or dag is a success state.
     """
 
+    terminated_states = frozenset([TaskInstanceState.SHUTDOWN, TaskInstanceState.RESTARTING])

Review comment:
       ```suggestion
       terminating_states = frozenset([TaskInstanceState.SHUTDOWN, TaskInstanceState.RESTARTING])
   ```
   
   (update docs/comments too etc please)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #16681: Introduce CLEARED_WHEN_RUNNING state

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-869867833


   Is a new state strictly required for this? Cos each state we add needs to be shown in the UI and increases the possible places we need to deal with the new state


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr closed pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
uranusjr closed pull request #16681:
URL: https://github.com/apache/airflow/pull/16681


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #16681: Introduce CLEARED_WHEN_RUNNING state

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r659976072



##########
File path: airflow/models/taskinstance.py
##########
@@ -1503,6 +1503,10 @@ def handle_failure_with_callback(
 
     def is_eligible_to_retry(self):
         """Is task instance is eligible for retry"""
+        if self.state == State.CLEARED_WHEN_RUNNING:
+            # If a task was cleared when running, it is always eligible for retry

Review comment:
       I am not sure you'd is the case. If it's already run its maximal number of tries out shouldn't run again should it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on pull request #16681: Introduce CLEARED_WHEN_RUNNING state

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-870493349


   > Is a new state strictly required for this? Cos each state we add needs to be shown in the UI and increases the possible places we need to deal with the new state
   
   I think another way to implement this is to make `clear_task_instances` increase the `max_tries` of the running `TaskInstance` by 1. However, when I tried to do that I realized it's easy. There are more places that need to be changed to do the right thing. One example is `is_eligible_to_retry()`. It first checks if `self.task.retries` is zero. If it is, it short-circuits and returns False. But `self.task.retries` isn't something that can be incremented by  `clear_task_instances` because it's not stored in the db.
   
   ```
       def is_eligible_to_retry(self):
           """Is task instance is eligible for retry"""
           return self.task.retries and self.try_number <= self.max_tries
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on a change in pull request #16681: Introduce CLEARED_WHEN_RUNNING state

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r660500931



##########
File path: airflow/models/taskinstance.py
##########
@@ -1503,6 +1503,10 @@ def handle_failure_with_callback(
 
     def is_eligible_to_retry(self):
         """Is task instance is eligible for retry"""
+        if self.state == State.CLEARED_WHEN_RUNNING:
+            # If a task was cleared when running, it is always eligible for retry

Review comment:
       Hi @ashb , I think we should run it again. For example, if a task is configured with `max_retries=0`, and it's now running with current `try_number` 0. Then the user clears it (e.g. to pick up a code change), should we kill the job? Yes. Should it retry? I think the answer is Yes too. Intuitively, the Clear operation in Airflow is always associated with clearing the current state of the taskinstance and get it ready to run again once its dependencies are met. The only exception is if the task is currently running. If a running task is cleared, it currently fails, not because of the clearing, but because it gets a SIGTERM after its state is set to SHUTDOWN. So the failure and the `on_failure_callback` received when a running task is cleared is counter-intuitive to many users.
   
   If the user insists he wants to see a failure, he can still use Mark Failed which does exactly that. So in order to make Clear consistently clear the state of the task, I think it makes more sense to quietly kill the task and let `is_eligible_to_retry` return True here. Let me know what you think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] bbovenzi commented on a change in pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
bbovenzi commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r678227175



##########
File path: airflow/utils/state.py
##########
@@ -105,6 +107,7 @@ class State:
         TaskInstanceState.RUNNING: 'lime',
         TaskInstanceState.SUCCESS: 'green',
         TaskInstanceState.SHUTDOWN: 'blue',
+        TaskInstanceState.RESTARTING: 'goldenrod',

Review comment:
       Can we do a different color? `goldenrod` is too similar `gold`, `orange` and `tan`
   
   We don't use many blues, violets, or browns. [Here](https://www.w3schools.com/cssref/css_colors.asp) are browser supported colors to choose from.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-875712476


   I mostly also thinking somewhere in the published docs tree.
   
   Ah https://airflow.apache.org/docs/apache-airflow/stable/concepts/tasks.html#task-instances
   
   That diagram needs updating :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-872386267


   > Ready for review. The failing [Helm Chart; KubernetesExecutor (pull_request)](https://github.com/apache/airflow/pull/16681/checks?check_run_id=2958003473) looks unrelated to this PR.
   
   Yep hopefully already fixed by #16750 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-875648056


   > Code and name looks good now, though this is probably worth documenting _somewhere_ so users can know about this behaviour.
   > 
   > (And you'll have to resolve conflicts obviously.)
   
   Thanks. That's done. I added a paragraph in UPDATING.md about this new behaviour. Please see if that helps.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 merged pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
yuqian90 merged pull request #16681:
URL: https://github.com/apache/airflow/pull/16681


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on a change in pull request #16681: Introduce CLEARED_WHEN_RUNNING state

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r660500931



##########
File path: airflow/models/taskinstance.py
##########
@@ -1503,6 +1503,10 @@ def handle_failure_with_callback(
 
     def is_eligible_to_retry(self):
         """Is task instance is eligible for retry"""
+        if self.state == State.CLEARED_WHEN_RUNNING:
+            # If a task was cleared when running, it is always eligible for retry

Review comment:
       Hi @ashb , I think we should run it again. For example, if a task is configured with `max_retries=0`, and it's now running with current `try_number` 0. Then the user clears it (e.g. to pick up a code change), should we kill the job? Yes. Should it retry? I think the answer is Yes too. Intuitively, the Clear operation in Airflow is always associated with clearing the current state of the taskinstance and get it ready to run again once its dependencies are met. The only exception is if the task is currently running. If a running task is cleared, it currently fails, not because of the clearing, but because it gets a SIGTERM after its state is set to SHUTDOWN. So the failure and the `on_failure_callback` received when a running task is cleared is counter-intuitive to many users.
   
   If the user insists he wants to see a failure, he can still use Mark Failed which does exactly that. So in order to make Clear consistently clear the state of the task, I think it makes more sense to quietly kill the task and let `is_eligible_to_retry` return True here. Let me know what you think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #16681: Introduce CLEARED_WHEN_RUNNING state

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-869867833


   Is a new state strictly required for this? Cos each state we add needs to be shown in the UI and increases the possible places we need to deal with the new state


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on a change in pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r680296647



##########
File path: airflow/www/static/css/tree.css
##########
@@ -67,6 +67,10 @@ rect.shutdown {
   fill: blue;
 }
 
+rect.restarting {
+  fill: violet;

Review comment:
       Thanks @kaxil. I don't think that's necessary. Actually, the color is already hardcode in `state_color`, and then `STATE_COLORS` is used to override the values in `state_color` on this line:
   https://github.com/apache/airflow/blob/c384f9b0f509bab704a70380465be18754800a52/airflow/utils/state.py#L117
   Some states are already missing from `STATE_COLORS`, such as `NONE` and `SHUTDOWN`. I think we should either deprecate `STATE_COLORS` or auto-populate it from `state_color`. But maybe that's for a different PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on a change in pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r679030916



##########
File path: airflow/utils/state.py
##########
@@ -105,6 +107,7 @@ class State:
         TaskInstanceState.RUNNING: 'lime',
         TaskInstanceState.SUCCESS: 'green',
         TaskInstanceState.SHUTDOWN: 'blue',
+        TaskInstanceState.RESTARTING: 'goldenrod',

Review comment:
       Thanks. Changed to `violet`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #16681: Introduce CLEARED_WHEN_RUNNING state

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r659976072



##########
File path: airflow/models/taskinstance.py
##########
@@ -1503,6 +1503,10 @@ def handle_failure_with_callback(
 
     def is_eligible_to_retry(self):
         """Is task instance is eligible for retry"""
+        if self.state == State.CLEARED_WHEN_RUNNING:
+            # If a task was cleared when running, it is always eligible for retry

Review comment:
       I am not sure ~you'd~ this is the case. If it's already run its maximal number of tries out shouldn't run again should it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuqian90 commented on a change in pull request #16681: Introduce CLEARED_WHEN_RUNNING state

Posted by GitBox <gi...@apache.org>.
yuqian90 commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r660698247



##########
File path: airflow/models/taskinstance.py
##########
@@ -1503,6 +1503,10 @@ def handle_failure_with_callback(
 
     def is_eligible_to_retry(self):
         """Is task instance is eligible for retry"""
+        if self.state == State.CLEARED_WHEN_RUNNING:
+            # If a task was cleared when running, it is always eligible for retry

Review comment:
       Thanks @ashb. I agree `RESTARTING` is a more concise and easier to understand name. So I'm changing to that. 
   Regarding "it would go back via the scheduler loop", I think that's actually a better behaviour. TaskInstances that are cleared should go back to the scheduler loop instead of being restarted by the worker immediately. One example scenario is a DAG like this:
   ```
   A >> B
   ```
   If A is in success state, and B is a long running task that's currently running. And some external conditions requires a user to re-run both A and B. He can do this by clearing A. This sets the state of A to NONE and state of B to RESTARTING. B is then sent a SIGTERM and goes into UP_FOR_RETRY. While that's happening, the scheduler starts to run A first. Once A finishes, B can then run. 
   
   However, if we allow the worker to restart B immediately after it is cleared when running, it would have violated the dependency graph because A might not done yet when the worker restarts B. 
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
kaxil commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-879907345


   > Thanks everyone for the suggestions in #6762. @kaxil @potiuk @mik-laj @kaverisharma09 I re-created the diagram with draw.io and here's a draft. Some modifications I made:
   > 
   > * Added "restarting" state introduced in [Introduce RESTARTING state #16681](https://github.com/apache/airflow/pull/16681)
   > * Named the state according to their definitions in state.py
   > * Added a choice node to represent the alternative state transition
   > * Moved the standard lifecycle to the top of the graph. I.e. "none, to scheduled, to queued, to running, and finally to success" is illustrated at the top of the diagram.
   > 
   > Please let me know what you think. I'll include the diagram in my PR and upload to https://cwiki.apache.org/confluence/display/AIRFLOW/File+lists if you think it works.
   > 
   > ![task_states](https://user-images.githubusercontent.com/6637585/125461038-0d369e18-adc3-4320-b1da-8d146972662c.png)
   
   Sounds good


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-886610089


   Hey @ashb  - any more comments ? I think I would love to merge that one :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-880013298


   Looks good to me :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#discussion_r679567231



##########
File path: UPDATING.md
##########
@@ -127,6 +127,10 @@ If you are using DAGs Details API endpoint, use `max_active_tasks` instead of `c
 
 When marking a task success/failed in Graph View, its downstream tasks that are in failed/upstream_failed state are automatically cleared.
 
+### Clearing a running task sets its state to RESTARTING
+
+Previously, clearing a running task sets its state to SHUTDOWN. The task gets killed and goes into FAILED state. After [#16681](https://github.com/apache/airflow/pull/16681), clearing a running task sets its state to RESTARTING. The task is eligible for retry without going into FAILED state.
+

Review comment:
       ```suggestion
   ### Clearing a running task sets its state to `RESTARTING`
   
   Previously, clearing a running task sets its state to `SHUTDOWN`. The task gets killed and goes into `FAILED` state. After [#16681](https://github.com/apache/airflow/pull/16681), clearing a running task sets its state to `RESTARTING`. The task is eligible for retry without going into `FAILED` state.
   
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] uranusjr edited a comment on pull request #16681: Introduce RESTARTING state

Posted by GitBox <gi...@apache.org>.
uranusjr edited a comment on pull request #16681:
URL: https://github.com/apache/airflow/pull/16681#issuecomment-872797863


   Are we having a service interruption? Triggering a rebuild… (Edit: That seems to work)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org