You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/25 09:44:38 UTC

[GitHub] [airflow] RikHeijdens opened a new pull request #14452: Handle missing start dates

RikHeijdens opened a new pull request #14452:
URL: https://github.com/apache/airflow/pull/14452


   closes: https://github.com/apache/airflow/issues/14384
   
   This PR fixes two issues:
   1. A `TypeError` that would be raised from `_emit_duration_stats_for_finished_state()` when the scheduler transitions a `DagRun` from a `running` state into a `success` or `failed` state if the `DagRun` did not have a `start_date` or `end_date` set.
   2. An issue with the `DagRunEditForm`, which would clear the `start_date` and `end_date` for a DagRun, if the form was used to transition a `DagRun` from a `failed` state back into a `running` state (or any other state). In the event where the scheduler would determine the `DagRun` should've been in the `failed` or `success` state (e.g. because the task instances weren't cleared), then this would lead to a scheduler crash.
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG commented on a change in pull request #14452: Gracefully handle missing start_date and end_date for DagRun

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #14452:
URL: https://github.com/apache/airflow/pull/14452#discussion_r582800690



##########
File path: airflow/www/views.py
##########
@@ -3302,7 +3302,7 @@ class DagRunModelView(AirflowModelView):
         'external_trigger',
         'conf',
     ]
-    edit_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'conf']
+    edit_columns = ['state', 'dag_id', 'execution_date', 'start_date', 'end_date', 'run_id', 'conf']

Review comment:
       May you clarify a bit more about this change? Thanks.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] RikHeijdens commented on a change in pull request #14452: Gracefully handle missing start_date and end_date for DagRun

Posted by GitBox <gi...@apache.org>.
RikHeijdens commented on a change in pull request #14452:
URL: https://github.com/apache/airflow/pull/14452#discussion_r582829692



##########
File path: airflow/www/views.py
##########
@@ -3302,7 +3302,7 @@ class DagRunModelView(AirflowModelView):
         'external_trigger',
         'conf',
     ]
-    edit_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'conf']
+    edit_columns = ['state', 'dag_id', 'execution_date', 'start_date', 'end_date', 'run_id', 'conf']

Review comment:
       Sure! Happy to clarify why I choose to venture down this route.
   
   The [`DagRunEditForm`](https://github.com/apache/airflow/blob/master/airflow/www/forms.py#L172-L173) contains the `start_date` and `end_date` fields, and these are supposed to be "read-only" as they use [`AirflowDateTimePickerROWidget`](https://github.com/apache/airflow/blob/45e72ca83049a7db526b1f0fbd94c75f5f92cc75/airflow/www/widgets.py#L51).
   
   However, what would happen when I would use the `DagRunEditForm` through the Web UI is that the `start_date` and `end_date` would be set to `NULL` when submitting the form (even if they were set previously). This happened because `start_date` and `end_date` are not contained in the `edit_columns` list, and thus the form would not contain any data for them and overwrite old values.
   
   By adding `start_date` and `end_date` as entries to `edit_columns` columns, the widget displays them as "read-only" in the form, and no longer nulls them out when the form is submitted.
   
   Note that any DagRun that is in the `running` state, but does not have a `start_date` set may cause a scheduler crash. This crash is avoided by https://github.com/apache/airflow/pull/14452/commits/a780e8a029c3801cd60da784fcf0b7af2c58bf13.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG commented on a change in pull request #14452: Gracefully handle missing start_date and end_date for DagRun

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #14452:
URL: https://github.com/apache/airflow/pull/14452#discussion_r582871243



##########
File path: airflow/www/views.py
##########
@@ -3302,7 +3302,7 @@ class DagRunModelView(AirflowModelView):
         'external_trigger',
         'conf',
     ]
-    edit_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'conf']
+    edit_columns = ['state', 'dag_id', 'execution_date', 'start_date', 'end_date', 'run_id', 'conf']

Review comment:
       Thanks @RikHeijdens for the details.
   
   Makes sense to me, and also looks good to me after I tested in a local minimized setup.
   
   I will approve. Meanwhile @kaxil may you also share your inputs or any question to further clarify?  




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #14452: Gracefully handle missing start_date and end_date for DagRun

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #14452:
URL: https://github.com/apache/airflow/pull/14452#discussion_r582885703



##########
File path: airflow/www/views.py
##########
@@ -3302,7 +3302,7 @@ class DagRunModelView(AirflowModelView):
         'external_trigger',
         'conf',
     ]
-    edit_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'conf']
+    edit_columns = ['state', 'dag_id', 'execution_date', 'start_date', 'end_date', 'run_id', 'conf']

Review comment:
       ![image](https://user-images.githubusercontent.com/8811558/100955885-4991b700-350f-11eb-9a32-ce199f530d6e.gif)
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #14452: Gracefully handle missing start_date and end_date for DagRun

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #14452:
URL: https://github.com/apache/airflow/pull/14452#discussion_r582808739



##########
File path: airflow/www/views.py
##########
@@ -3302,7 +3302,7 @@ class DagRunModelView(AirflowModelView):
         'external_trigger',
         'conf',
     ]
-    edit_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'conf']
+    edit_columns = ['state', 'dag_id', 'execution_date', 'start_date', 'end_date', 'run_id', 'conf']

Review comment:
       I see this in the PR description but not sure if I understand it -- when you mark a DagRun to state the Scheduler should set the appropriate date instead of the user
   
   >An issue with the DagRunEditForm, which would clear the start_date and end_date for a DagRun, if the form was used to transition a DagRun from a failed state back into a running state (or any other state). In the event where the scheduler would determine the DagRun should've been in the failed or success state (e.g. because the task instances weren't cleared), then this would lead to a scheduler crash.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil merged pull request #14452: Gracefully handle missing start_date and end_date for DagRun

Posted by GitBox <gi...@apache.org>.
kaxil merged pull request #14452:
URL: https://github.com/apache/airflow/pull/14452


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #14452: Gracefully handle missing start_date and end_date for DagRun

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #14452:
URL: https://github.com/apache/airflow/pull/14452#discussion_r582885147



##########
File path: airflow/www/views.py
##########
@@ -3302,7 +3302,7 @@ class DagRunModelView(AirflowModelView):
         'external_trigger',
         'conf',
     ]
-    edit_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'conf']
+    edit_columns = ['state', 'dag_id', 'execution_date', 'start_date', 'end_date', 'run_id', 'conf']

Review comment:
       Oh yes I forgot -- I had added the Read-only Widgets for TI and DagRuns in https://github.com/apache/airflow/pull/12770 to fix another bug




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] XD-DENG commented on a change in pull request #14452: Gracefully handle missing start_date and end_date for DagRun

Posted by GitBox <gi...@apache.org>.
XD-DENG commented on a change in pull request #14452:
URL: https://github.com/apache/airflow/pull/14452#discussion_r582802881



##########
File path: airflow/www/views.py
##########
@@ -3302,7 +3302,7 @@ class DagRunModelView(AirflowModelView):
         'external_trigger',
         'conf',
     ]
-    edit_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'conf']
+    edit_columns = ['state', 'dag_id', 'execution_date', 'start_date', 'end_date', 'run_id', 'conf']

Review comment:
       I'm not sure why we need this change to address the issue.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] RikHeijdens commented on a change in pull request #14452: Gracefully handle missing start_date and end_date for DagRun

Posted by GitBox <gi...@apache.org>.
RikHeijdens commented on a change in pull request #14452:
URL: https://github.com/apache/airflow/pull/14452#discussion_r582829692



##########
File path: airflow/www/views.py
##########
@@ -3302,7 +3302,7 @@ class DagRunModelView(AirflowModelView):
         'external_trigger',
         'conf',
     ]
-    edit_columns = ['state', 'dag_id', 'execution_date', 'run_id', 'conf']
+    edit_columns = ['state', 'dag_id', 'execution_date', 'start_date', 'end_date', 'run_id', 'conf']

Review comment:
       Sure! Happy to clarify why I chose to venture down this route.
   
   The [`DagRunEditForm`](https://github.com/apache/airflow/blob/master/airflow/www/forms.py#L172-L173) contains the `start_date` and `end_date` fields, and these are supposed to be "read-only" as they use [`AirflowDateTimePickerROWidget`](https://github.com/apache/airflow/blob/45e72ca83049a7db526b1f0fbd94c75f5f92cc75/airflow/www/widgets.py#L51).
   
   However, what would happen when I would use the `DagRunEditForm` through the Web UI is that the `start_date` and `end_date` would be set to `NULL` when submitting the form (even if they were set previously). This happened because `start_date` and `end_date` are not contained in the `edit_columns` list, and thus the form would not contain any data for them and overwrite old values.
   
   By adding `start_date` and `end_date` as entries to `edit_columns` columns, the widget displays them as "read-only" in the form, and no longer nulls them out when the form is submitted.
   
   Note that any DagRun that is in the `running` state, but does not have a `start_date` set may cause a scheduler crash. This crash is avoided by https://github.com/apache/airflow/pull/14452/commits/a780e8a029c3801cd60da784fcf0b7af2c58bf13.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org