Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/09/23 09:37:36 UTC
[GitHub] [airflow] kosteev opened a new issue, #26627: "task_fail" contains duplicates for FK to "task_instance" table
kosteev opened a new issue, #26627:
URL: https://github.com/apache/airflow/issues/26627
### Apache Airflow version
Other Airflow 2 version
### What happened
Airflow 2.3.3.
Task instance failures produce duplicate rows in the "task_fail" table for the foreign key columns (dag_id, task_id, run_id, map_index).
### What you think should happen instead
Recently, an FK constraint between the task_fail and task_instance tables was introduced:
https://github.com/apache/airflow/pull/22260
Then there was a change to purge duplicates for this constraint from the task_fail table (on db upgrade):
https://github.com/apache/airflow/pull/22769
Removing duplicates on db upgrade to Airflow 2.3+ before establishing the FK between the tables makes sense. However, these duplicates can also occur in a running Airflow 2.3+ instance (see "How to reproduce").
What is the rationale for removing duplicates once, on upgrade to Airflow 2.3+, while still allowing duplicates to be generated again? Isn't that going to break the foreign key and the integrity of these two tables?
### How to reproduce
Trigger a DAG with a task that fails on several tries. Each failed try adds another row to the "task_fail" table with the same (dag_id, task_id, run_id, map_index).
Example (two tries with a 5-minute retry interval):
```
id | task_id | dag_id | start_date | end_date | duration | map_index | run_id
1 | task | dag1_failing | 2022-09-23 09:11:44.102894+00 | 2022-09-23 09:11:44.469007+00 | 0 | -1 | scheduled__2022-09-22T00:00:00+00:00
3 | task | dag1_failing | 2022-09-23 09:16:44.995269+00 | 2022-09-23 09:16:45.310398+00 | 0 | -1 | scheduled__2022-09-22T00:00:00+00:00
```
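As a rough illustration (not part of the original report), the duplicate check can be sketched in plain Python. The rows are copied from the example above; the helper name `duplicated_keys` is hypothetical and nothing here is Airflow code.

```python
from collections import Counter

# Rows copied from the example above: (id, task_id, dag_id, map_index, run_id).
rows = [
    (1, "task", "dag1_failing", -1, "scheduled__2022-09-22T00:00:00+00:00"),
    (3, "task", "dag1_failing", -1, "scheduled__2022-09-22T00:00:00+00:00"),
]


def duplicated_keys(rows):
    """Return {key: count} for each (dag_id, task_id, run_id, map_index) seen more than once.

    Hypothetical helper for illustration only.
    """
    counts = Counter(
        (dag_id, task_id, run_id, map_index)
        for _id, task_id, dag_id, map_index, run_id in rows
    )
    return {key: n for key, n in counts.items() if n > 1}


dupes = duplicated_keys(rows)
print(dupes)
```

Both rows share the same key, so the key is reported once with a count of 2.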
### Operating System
Linux
### Versions of Apache Airflow Providers
_No response_
### Deployment
Composer
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] kosteev commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table
Posted by GitBox <gi...@apache.org>.
kosteev commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1257540348
Sorry, I didn't state it clearly. Basically, multiple failures of the same task instance produce duplicates in the task_fail table.
[GitHub] [airflow] dstandish commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table
Posted by GitBox <gi...@apache.org>.
dstandish commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1258336098
I'm taking a look at how we use taskfail though... if the effort is not super great, it would be best if we can figure out what the key of this table actually is and enforce it. I know it's used in the `dags/./duration` view ... not sure what the consequence of having the duplicates is.
But from a referential integrity perspective, there's no issue, because the reference is TF -> TI and not the other way around.
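A minimal sketch of that point, assuming a heavily simplified schema and using SQLite as a stand-in for the real metadata database: because the foreign key points from task_fail to task_instance, several task_fail rows may reference the same task_instance row, and only a row with no matching task_instance is rejected. The column set below is an illustrative assumption, not the actual Airflow table definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless enabled

# Simplified, assumed schema: only the key columns are modelled.
conn.executescript(
    """
    CREATE TABLE task_instance (
        dag_id TEXT, task_id TEXT, run_id TEXT, map_index INTEGER,
        PRIMARY KEY (dag_id, task_id, run_id, map_index)
    );
    CREATE TABLE task_fail (
        id INTEGER PRIMARY KEY,
        dag_id TEXT, task_id TEXT, run_id TEXT, map_index INTEGER,
        FOREIGN KEY (dag_id, task_id, run_id, map_index)
            REFERENCES task_instance (dag_id, task_id, run_id, map_index)
    );
    """
)

insert_fail = (
    "INSERT INTO task_fail (dag_id, task_id, run_id, map_index) VALUES (?, ?, ?, ?)"
)
key = ("dag1_failing", "task", "scheduled__2022-09-22T00:00:00+00:00", -1)

conn.execute("INSERT INTO task_instance VALUES (?, ?, ?, ?)", key)
# Two failures of the same task instance: both rows are accepted.
conn.execute(insert_fail, key)
conn.execute(insert_fail, key)
count = conn.execute("SELECT COUNT(*) FROM task_fail").fetchone()[0]

# A failure row without a matching task instance violates the FK.
try:
    conn.execute(insert_fail, ("no_such_dag", "task", "run", -1))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(count, orphan_rejected)
```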
[GitHub] [airflow] uranusjr commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table
Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1258925356
From a “recording the history” perspective, I’d expect repeated TaskFail events to all be kept, and if the UI can only show one, it should have logic to pull in only the last failure for a ti.
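One way such "last failure per ti" logic might look, as a sketch only: the records are plain dicts modelled on the example rows in the issue, and `latest_failures` is a hypothetical helper, not an existing Airflow function. "Last" is assumed here to mean the greatest end_date.

```python
from datetime import datetime, timezone

# Failure records modelled on the example rows from the issue.
fails = [
    {
        "id": 1, "dag_id": "dag1_failing", "task_id": "task", "map_index": -1,
        "run_id": "scheduled__2022-09-22T00:00:00+00:00",
        "end_date": datetime(2022, 9, 23, 9, 11, 44, 469007, tzinfo=timezone.utc),
    },
    {
        "id": 3, "dag_id": "dag1_failing", "task_id": "task", "map_index": -1,
        "run_id": "scheduled__2022-09-22T00:00:00+00:00",
        "end_date": datetime(2022, 9, 23, 9, 16, 45, 310398, tzinfo=timezone.utc),
    },
]


def latest_failures(fails):
    """Keep only the most recent failure (by end_date) for each task instance key."""
    latest = {}
    for fail in fails:
        key = (fail["dag_id"], fail["task_id"], fail["run_id"], fail["map_index"])
        if key not in latest or fail["end_date"] > latest[key]["end_date"]:
            latest[key] = fail
    return list(latest.values())


result = latest_failures(fails)
print([fail["id"] for fail in result])
```

All rows stay in the table as history; only the reader of the table narrows them down to one per task instance.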
[GitHub] [airflow] dstandish commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table
Posted by GitBox <gi...@apache.org>.
dstandish commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1259784044
Yeah, sigh... PR to remove the check: https://github.com/apache/airflow/pull/26714
I think in my head the FK ref direction was flipped so in my mind the TF records had to be deduped before adding the key. My mistake. Fortunately, it seems of little consequence.
I looked at the code in the task duration and gantt views, the two locations where some processing of taskfail records is done. It doesn't seem to work quite correctly.
For the task duration view, task fails are only accounted for in the cumulative view (they are ignored in the non-cumulative view), and the cumulative view seems broken because it can go down with increasing time.
For gantt view, it seems that taskfail records do not have an effect on the chart.
[GitHub] [airflow] uranusjr commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table
Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1257448199
I tend to believe this is not the intention, and feel free to submit fixes where duplicates can be created. (Can you provide examples? You didn’t explain _when_ duplicates happen.)
[GitHub] [airflow] dstandish commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table
Posted by GitBox <gi...@apache.org>.
dstandish commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1258304326
Yes, I think that removing the duplicates is not necessary.
[GitHub] [airflow] uranusjr commented on issue #26627: "task_fail" contains duplicates for FK to "task_instance" table
Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #26627:
URL: https://github.com/apache/airflow/issues/26627#issuecomment-1257544581
Hmm yeah that makes sense. I’m guessing the duplicated entries should be expected, and deleting them on upgrade could be an accident…? Not sure. cc @dstandish