You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/27 08:28:30 UTC

[GitHub] [airflow] vemikhaylov opened a new pull request #14500: Clear tasks by task ids in REST API

vemikhaylov opened a new pull request #14500:
URL: https://github.com/apache/airflow/pull/14500


   closes: #13225


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kurtqq commented on pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
kurtqq commented on pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#issuecomment-813639446


   @kaxil  @ephraimbuddy  gentle ping


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on a change in pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on a change in pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#discussion_r586901576



##########
File path: airflow/api_connexion/openapi/v1.yaml
##########
@@ -2218,6 +2218,13 @@ components:
           type: boolean
           default: true
 
+        task_ids:
+          description: A list of task ids to clear.
+          type: array
+          items:
+            type: string
+          minItems: 1

Review comment:
       Does this not imply that task_ids must not be empty?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] vemikhaylov commented on a change in pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
vemikhaylov commented on a change in pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#discussion_r587250624



##########
File path: airflow/api_connexion/openapi/v1.yaml
##########
@@ -2218,6 +2218,13 @@ components:
           type: boolean
           default: true
 
+        task_ids:
+          description: A list of task ids to clear.
+          type: array
+          items:
+            type: string
+          minItems: 1

Review comment:
       Yes, actually I have two points there:
   
   1) Empty list for `task_ids` can look a little ambiguous. Should it be treated as `null` and no filters should be applied in the case? Either should no tasks be cleared?
   
   2) If no tasks should be cleared in the case of empty list, what is the purpose to call the endpoint with it?
   
   What do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#issuecomment-792567986


   Please rebase land push


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy edited a comment on pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
ephraimbuddy edited a comment on pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#issuecomment-792567986


   Please rebase and push


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] vemikhaylov commented on a change in pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
vemikhaylov commented on a change in pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#discussion_r584191749



##########
File path: airflow/models/dag.py
##########
@@ -1227,6 +1230,8 @@ def clear(
             tis = tis.filter(or_(TI.state == State.FAILED, TI.state == State.UPSTREAM_FAILED))
         if only_running:
             tis = tis.filter(TI.state == State.RUNNING)
+        if task_ids:
+            tis = tis.filter(TI.task_id.in_(task_ids))

Review comment:
       Actually the conditions are just added with conjunction:
   
   ```python
   # tst_double_in_query.py
   from sqlalchemy.orm import Session
   
   from airflow.models import TaskInstance
   
   session = Session()
   query = session.query(TaskInstance.task_id).filter(TaskInstance.task_id.in_(["foo"]))
   print(str(query.statement))
   query = query.filter(TaskInstance.task_id.in_(["bar"]))
   print(str(query.statement))
   ```
   
   ```
   $ python tst_double_in_query.py
   First query:
   SELECT task_instance.task_id
   FROM task_instance
   WHERE task_instance.task_id IN (:task_id_1)
   
   Second query:
   SELECT task_instance.task_id
   FROM task_instance
   WHERE task_instance.task_id IN (:task_id_1) AND task_instance.task_id IN (:task_id_2)
   ```
   
   So the second filter narrows down the search space if `task_ids` are provided.
   
   Naturally we can intersect the sets preliminary and apply the filter once, it can make the generated SQL code a little more efficient. Would it be better, how do you feel?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] vemikhaylov commented on a change in pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
vemikhaylov commented on a change in pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#discussion_r584191749



##########
File path: airflow/models/dag.py
##########
@@ -1227,6 +1230,8 @@ def clear(
             tis = tis.filter(or_(TI.state == State.FAILED, TI.state == State.UPSTREAM_FAILED))
         if only_running:
             tis = tis.filter(TI.state == State.RUNNING)
+        if task_ids:
+            tis = tis.filter(TI.task_id.in_(task_ids))

Review comment:
       Actually the conditions are just added with conjunction:
   
   ```python
   # tst_double_in_query.py
   from sqlalchemy.orm import Session
   
   from airflow.models import TaskInstance
   
   session = Session()
   query = session.query(TaskInstance.task_id).filter(TaskInstance.task_id.in_(["foo"]))
   print(str(query.statement))
   query = query.filter(TaskInstance.task_id.in_(["bar"]))
   print(str(query.statement))
   ```
   
   ```
   $ python tst_double_in_query.py
   First query:
   SELECT task_instance.task_id
   FROM task_instance
   WHERE task_instance.task_id IN (:task_id_1)
   
   Second query:
   SELECT task_instance.task_id
   FROM task_instance
   WHERE task_instance.task_id IN (:task_id_1) AND task_instance.task_id IN (:task_id_2)
   ```
   
   So the second filter narrows down the search space if `task_ids` are provided.
   
   Naturally we can intersect the sets preliminary and apply the filter once, it can make the generated SQL code a little more efficient.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] vemikhaylov commented on pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
vemikhaylov commented on pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#issuecomment-787150128


   I'm not sure whether we should check provided `task_ids` and raise the `400` or the `404` error if any of them doesn't exist for the given DAG or just apply the filter blindly.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy commented on pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
ephraimbuddy commented on pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#issuecomment-813689725


   Closing and reopening to trigger test runs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy closed pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
ephraimbuddy closed pull request #14500:
URL: https://github.com/apache/airflow/pull/14500


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ephraimbuddy merged pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
ephraimbuddy merged pull request #14500:
URL: https://github.com/apache/airflow/pull/14500


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#issuecomment-792072533


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] turbaszek commented on a change in pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#discussion_r584109095



##########
File path: airflow/models/dag.py
##########
@@ -1227,6 +1230,8 @@ def clear(
             tis = tis.filter(or_(TI.state == State.FAILED, TI.state == State.UPSTREAM_FAILED))
         if only_running:
             tis = tis.filter(TI.state == State.RUNNING)
+        if task_ids:
+            tis = tis.filter(TI.task_id.in_(task_ids))

Review comment:
       How this will work with 
   ```
    tis = tis.filter(TI.task_id.in_(self.task_ids))
   ```
   from L1197?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] vemikhaylov commented on a change in pull request #14500: Clear tasks by task ids in REST API

Posted by GitBox <gi...@apache.org>.
vemikhaylov commented on a change in pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#discussion_r587250624



##########
File path: airflow/api_connexion/openapi/v1.yaml
##########
@@ -2218,6 +2218,13 @@ components:
           type: boolean
           default: true
 
+        task_ids:
+          description: A list of task ids to clear.
+          type: array
+          items:
+            type: string
+          minItems: 1

Review comment:
       Yes, actually I have two points there:
   
   1) Empty list for `task_ids` can look a little ambiguous. Should it be treated as `null` and no filters should be applied in the case? Either should no tasks be cleared?
   
   2) If no tasks should be cleared in the case of empty list, what is the purpose to call the endpoint?
   
   What do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org