Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/23 09:37:36 UTC

[GitHub] [airflow] kosteev opened a new issue, #26627: "task_fail" contains duplicates for FK to "task_instance" table

kosteev opened a new issue, #26627:
URL: https://github.com/apache/airflow/issues/26627

   ### Apache Airflow version
   
   Other Airflow 2 version
   
   ### What happened
   
   Airflow 2.3.3.
   Task instance failures produce duplicate rows in the "task_fail" table for the key (dag_id, task_id, run_id, map_index).
   
   ### What you think should happen instead
   
   An FK constraint between the task_fail and task_instance tables was recently introduced:
   https://github.com/apache/airflow/pull/22260
   A later change then purged rows that are duplicates for this constraint from the task_fail table (on db upgrade):
   https://github.com/apache/airflow/pull/22769
   
   Removing duplicates on db upgrade to Airflow 2.3+ before establishing the FK between the tables makes sense; however, these duplicates can still occur in a running Airflow 2.3+ instance (see "How to reproduce").
   
   What is the rationale for removing duplicates once, on upgrading to Airflow 2.3+, while keeping the possibility of generating duplicates again? Isn’t that going to break the foreign key and the integrity of these two tables?
   
   ### How to reproduce
   
   Trigger a DAG with a task that fails on multiple tries; each failed try adds a row with the same key to the "task_fail" table.
   Example (two tries with a 5-minute retry interval):
   ```
   id | task_id |    dag_id    |          start_date           |           end_date            | duration | map_index |                run_id
   1  | task    | dag1_failing | 2022-09-23 09:11:44.102894+00 | 2022-09-23 09:11:44.469007+00 |        0 |        -1 | scheduled__2022-09-22T00:00:00+00:00
   3  | task    | dag1_failing | 2022-09-23 09:16:44.995269+00 | 2022-09-23 09:16:45.310398+00 |        0 |        -1 | scheduled__2022-09-22T00:00:00+00:00
   ```
   
   ### Operating System
   
   Linux
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Composer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] kosteev commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table

Posted by GitBox <gi...@apache.org>.
kosteev commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1257540348

   Sorry, I didn't state it clearly: basically, multiple failures of the same task instance produce duplicates in the task_fail table.
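
A minimal sketch of how the repeated rows described above could be detected, assuming a simplified stand-in for the "task_fail" table in an in-memory SQLite database. The table definition, the helper name `find_repeated_failures`, and the use of SQLite are assumptions for illustration only; the two rows mirror the example in the issue.

```python
import sqlite3

# Hypothetical, simplified stand-in for "task_fail"; only the key columns
# shown in the issue are modelled here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE task_fail (
        id INTEGER PRIMARY KEY,
        task_id TEXT NOT NULL,
        dag_id TEXT NOT NULL,
        run_id TEXT NOT NULL,
        map_index INTEGER NOT NULL
    )
    """
)

run_id = "scheduled__2022-09-22T00:00:00+00:00"
conn.executemany(
    "INSERT INTO task_fail (id, task_id, dag_id, run_id, map_index) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "task", "dag1_failing", run_id, -1),  # first failed try
        (3, "task", "dag1_failing", run_id, -1),  # second failed try
    ],
)


def find_repeated_failures(connection):
    """Return (dag_id, task_id, run_id, map_index, n) for keys with n > 1 rows."""
    query = """
        SELECT dag_id, task_id, run_id, map_index, COUNT(*) AS n
        FROM task_fail
        GROUP BY dag_id, task_id, run_id, map_index
        HAVING COUNT(*) > 1
    """
    return connection.execute(query).fetchall()


repeated = find_repeated_failures(conn)
print(repeated)
```

With the two rows from the issue, the query reports one key that occurs twice, that is, one task instance with two recorded failures.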


[GitHub] [airflow] dstandish commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table

Posted by GitBox <gi...@apache.org>.
dstandish commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1258336098

   I'm taking a look at how we use taskfail though... if the effort is not super great, it would be best if we can figure out what is actually the key of this table and enforce it.  I know it's used in the `dags/./duration` view ... not sure what the consequence is of having the duplicates.
   
   But from a referential integrity perspective, there's no issue, because the FK is TF -> TI and not the other way around.
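
The point about the FK direction can be sketched as follows, assuming simplified stand-ins for both tables in an in-memory SQLite database (the column lists, the `missing_task` name, and the choice of SQLite are assumptions, not Airflow's actual schema). Because the reference goes from task_fail to task_instance, several task_fail rows may point at the same task_instance row; only a task_fail row with no matching task_instance is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical parent table: one row per task instance.
conn.execute(
    """
    CREATE TABLE task_instance (
        dag_id TEXT NOT NULL,
        task_id TEXT NOT NULL,
        run_id TEXT NOT NULL,
        map_index INTEGER NOT NULL,
        PRIMARY KEY (dag_id, task_id, run_id, map_index)
    )
    """
)

# Hypothetical child table: the FK points from task_fail (TF) to task_instance (TI).
conn.execute(
    """
    CREATE TABLE task_fail (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        task_id TEXT NOT NULL,
        run_id TEXT NOT NULL,
        map_index INTEGER NOT NULL,
        FOREIGN KEY (dag_id, task_id, run_id, map_index)
            REFERENCES task_instance (dag_id, task_id, run_id, map_index)
    )
    """
)

key = ("dag1_failing", "task", "scheduled__2022-09-22T00:00:00+00:00", -1)
conn.execute("INSERT INTO task_instance VALUES (?, ?, ?, ?)", key)

insert_fail = (
    "INSERT INTO task_fail (dag_id, task_id, run_id, map_index) "
    "VALUES (?, ?, ?, ?)"
)
conn.execute(insert_fail, key)  # first failure: accepted
conn.execute(insert_fail, key)  # second failure, same TI: also accepted

try:
    # No task_instance row exists for this key, so the FK rejects it.
    conn.execute(insert_fail, ("dag1_failing", "missing_task", key[2], -1))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

fail_count = conn.execute("SELECT COUNT(*) FROM task_fail").fetchone()[0]
print(fail_count, orphan_rejected)
```

Deduplication would only be required if the reference ran the other way (TI -> TF), since the referenced columns would then have to be unique.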


[GitHub] [airflow] uranusjr commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1258925356

   From a “recording the history” perspective, I’d expect repeated TaskFail events to all be kept, and if the UI can only show one, it should have logic to pull in only the last failure for a ti.
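
One way the "only pull in the last failure for a ti" logic could look, as a rough sketch against a simplified stand-in table in SQLite. The helper name `latest_failures` and the table definition are hypothetical, and the query assumes that the row with the highest `id` per key is the most recent failure, which is an assumption rather than something the thread confirms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for "task_fail".
conn.execute(
    """
    CREATE TABLE task_fail (
        id INTEGER PRIMARY KEY,
        dag_id TEXT NOT NULL,
        task_id TEXT NOT NULL,
        run_id TEXT NOT NULL,
        map_index INTEGER NOT NULL,
        end_date TEXT
    )
    """
)

run_id = "scheduled__2022-09-22T00:00:00+00:00"
conn.executemany(
    "INSERT INTO task_fail VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "dag1_failing", "task", run_id, -1, "2022-09-23 09:11:44.469007+00"),
        (3, "dag1_failing", "task", run_id, -1, "2022-09-23 09:16:45.310398+00"),
    ],
)


def latest_failures(connection):
    """Keep every row in the table, but return only the newest one per task instance.

    Assumption: a larger id means a later failure.
    """
    query = """
        SELECT id, dag_id, task_id, run_id, map_index, end_date
        FROM task_fail
        WHERE id IN (
            SELECT MAX(id)
            FROM task_fail
            GROUP BY dag_id, task_id, run_id, map_index
        )
        ORDER BY id
    """
    return connection.execute(query).fetchall()


latest = latest_failures(conn)
print(latest)
```

The full history stays in the table; only the read path narrows it to one row per (dag_id, task_id, run_id, map_index).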


[GitHub] [airflow] dstandish commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table

Posted by GitBox <gi...@apache.org>.
dstandish commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1259784044

   Yeah, sigh... PR to remove the check:  https://github.com/apache/airflow/pull/26714
   
   I think in my head the FK ref direction was flipped so in my mind the TF records had to be deduped before adding the key.  My mistake.  Fortunately, it seems of little consequence.
   
   I looked at the code in the task duration and gantt views, the two locations where some processing of taskfail records is done.  It doesn't seem to work quite correctly.  
   
   For the task duration view, task fails are only accounted for in the cumulative view (they are ignored in the non-cumulative view), and the cumulative view seems broken because it can go down with increasing time.
   
   For gantt view, it seems that taskfail records do not have an effect on the chart.


[GitHub] [airflow] uranusjr commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1257448199

   I tend to believe this is not the intention, and feel free to submit fixes where duplicates can be created. (Can you provide examples? You didn’t explain _when_ duplicates happen.)


[GitHub] [airflow] dstandish commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table

Posted by GitBox <gi...@apache.org>.
dstandish commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1258304326

   Yes, I think that removing the duplicates is not necessary.


[GitHub] [airflow] uranusjr commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1257544581

   Hmm yeah that makes sense. I’m guessing the duplicated entries should be expected, and deleting them on upgrade could be an accident…? Not sure. cc @dstandish 

