Posted to commits@airflow.apache.org by "set92 (via GitHub)" <gi...@apache.org> on 2023/03/02 11:22:53 UTC

[GitHub] [airflow] set92 opened a new issue, #29872: Task running again after finishing getting a SUCCESS

set92 opened a new issue, #29872:
URL: https://github.com/apache/airflow/issues/29872

   ### Apache Airflow version
   
   Other Airflow 2 version (please specify below)
   
   ### What happened
   
   Airflow version: 2.4.2
   
   It looks like it is related to #27614, but I am not sure how to reproduce it. It happened a couple of times last week, when we ran a task that only logs the current execution to a Postgres DB through a PythonOperator, and today it has happened again 3 times.
   
   The thing is that the task runs once and finishes perfectly, but then it runs again for no apparent reason (it doesn't have retries), and the second run fails because of the PK (`Key (execution_id, taskgroup_id)=(2573, task_name) already exists.`). We think it only happens in this task, but since most of the other tasks are inserts into BigQuery, which doesn't have PKs, they could be appending the data twice without us knowing.
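
   For context, the task is roughly like the sketch below (`insert_execution_log`, the `logging_db` connection id and the `execution_log` table are placeholders, not our exact code):

   ```python
   from airflow.providers.postgres.hooks.postgres import PostgresHook


   def insert_execution_log(execution_id: int, taskgroup_id: str) -> None:
       """Insert one audit row; (execution_id, taskgroup_id) is the primary key."""
       hook = PostgresHook(postgres_conn_id="logging_db")  # placeholder connection id
       hook.run(
           "INSERT INTO execution_log (execution_id, taskgroup_id) VALUES (%s, %s)",
           parameters=(execution_id, taskgroup_id),
       )
   ```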
   
   To give you a visual representation of the taskgroup itself:
   
                                      Few PythonOperators & BranchOperator
                                    /                                      \
   Whitelisting: BranchOperator                                              -- task_insert_pg_log: PythonOperator
                                    \                                      /
                                      --------------------------------------
   Every time we got the error it was after some BranchOperator (maybe this operator triggers twice even when it shouldn't?). The trigger_rule of task_insert_pg_log is `none_failed_min_one_success`. But even if that were the case, the whitelisting task is basically like a ShortCircuitOperator that stops the taskgroup from running, and the upper path is the one that got executed, so the lower path shouldn't have run. It could have happened in other tasks too, but since they didn't return an error, we think the problem is only here.
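
   In case the wiring matters, this is a stripped-down sketch of the shape above (task names, the branch callable and the DAG arguments are placeholders; only the structure and the trigger_rule match our real DAG):

   ```python
   import pendulum

   from airflow import DAG
   from airflow.operators.python import BranchPythonOperator, PythonOperator
   from airflow.utils.task_group import TaskGroup
   from airflow.utils.trigger_rule import TriggerRule

   with DAG(
       dag_id="generate_database_dag_sketch",
       start_date=pendulum.datetime(2023, 2, 27, tz="UTC"),
       schedule=None,
   ):
       with TaskGroup(group_id="taskgroup_name"):
           whitelisting = BranchPythonOperator(
               task_id="whitelisting",
               # Placeholder: the real callable decides whether the rest of the group runs.
               python_callable=lambda: "taskgroup_name.upper_path",
           )
           upper_path = PythonOperator(
               task_id="upper_path",  # stands in for the "few PythonOperators & BranchOperator"
               python_callable=lambda: None,
           )
           task_insert_pg_log = PythonOperator(
               task_id="task_insert_pg_log",
               python_callable=lambda: None,  # the Postgres insert described above
               trigger_rule=TriggerRule.NONE_FAILED_MIN_ONE_SUCCESS,
           )
           # Upper path goes through the intermediate tasks; the lower edge joins directly.
           whitelisting >> upper_path >> task_insert_pg_log
           whitelisting >> task_insert_pg_log
   ```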
   
   I was checking the audit log, but I can only see that the DAG tried to run this task 4 times, 2 per scheduler? I don't know why that happens. I checked other tasks, and they mostly have 2 runs, 1 per scheduler, although some have 3.
   
   ```
   | Id      | Dttm                 | Dag Id                | Task Id                           | Event        | Logical Date         | Owner | Extra |
   |---------|----------------------|-----------------------|-----------------------------------|--------------|----------------------|-------|-------|
   | 2327881 | 2023-03-01, 22:52:16 | generate_database_dag | taskgroup_name.task_insert_pg_log | cli_task_run |                      | root  | {"host_name": "generatedatabasedagtaskgroup-549cbe4e79214e0f91827164fc4657c6", "full_command": "['/opt/username/.venv/bin/airflow', 'tasks', 'run', 'generate_database_dag', 'taskgroup_name.task_insert_pg_log', 'scheduled__2023-02-27T22:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/master_dag_factory_generate_database_dag.py']"} |
   | 2327880 | 2023-03-01, 22:52:16 | generate_database_dag | taskgroup_name.task_insert_pg_log | running      | 2023-02-27, 22:00:00 | admin | |
   | 2327662 | 2023-03-01, 22:50:13 | generate_database_dag | taskgroup_name.task_insert_pg_log | cli_task_run |                      | root  | {"host_name": "generatedatabasedagtaskgroup-549cbe4e79214e0f91827164fc4657c6", "full_command": "['/opt/username/.venv/bin/airflow', 'tasks', 'run', 'generate_database_dag', 'taskgroup_name.task_insert_pg_log', 'scheduled__2023-02-27T22:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/master_dag_factory_generate_database_dag.py']"} |
   | 2327641 | 2023-03-01, 22:50:02 | generate_database_dag | taskgroup_name.task_insert_pg_log | success      | 2023-02-27, 22:00:00 | admin | |
   | 2327633 | 2023-03-01, 22:50:00 | generate_database_dag | taskgroup_name.task_insert_pg_log | cli_task_run |                      | root  | {"host_name": "generatedatabasedagtaskgroup-0c7227b07ac04dadbaae3df6f58b6edb", "full_command": "['/opt/username/.venv/bin/airflow', 'tasks', 'run', 'generate_database_dag', 'taskgroup_name.task_insert_pg_log', 'scheduled__2023-02-27T22:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/master_dag_factory_generate_database_dag.py']"} |
   | 2327631 | 2023-03-01, 22:50:00 | generate_database_dag | taskgroup_name.task_insert_pg_log | running      | 2023-02-27, 22:00:00 | admin | |
   | 2327417 | 2023-03-01, 22:47:44 | generate_database_dag | taskgroup_name.task_insert_pg_log | cli_task_run |                      | root  | {"host_name": "generatedatabasedagtaskgroup-0c7227b07ac04dadbaae3df6f58b6edb", "full_command": "['/opt/username/.venv/bin/airflow', 'tasks', 'run', 'generate_database_dag', 'taskgroup_name.task_insert_pg_log', 'scheduled__2023-02-27T22:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/master_dag_factory_generate_database_dag.py']"} |
   ```
   
   So I am not sure where else I could look for more logs or more information to try to find out why those tasks got triggered again.
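
   For completeness, this is roughly how I pulled the audit-log rows above straight from the metadata DB (a sketch using the internal `Log` model and session helpers; the dag/task ids are from our deployment):

   ```python
   from airflow.models.log import Log
   from airflow.utils.session import create_session

   # List every audit-log event recorded for the suspicious task, newest first.
   with create_session() as session:
       events = (
           session.query(Log)
           .filter(
               Log.dag_id == "generate_database_dag",
               Log.task_id == "taskgroup_name.task_insert_pg_log",
           )
           .order_by(Log.dttm.desc())
           .all()
       )
       for event in events:
           print(event.dttm, event.event, event.owner, event.extra)
   ```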
   
   ### What you think should happen instead
   
   I thought that once a task has a SUCCESS it doesn't try to run again. The worst part is that I don't know where to look when this happens again, nor whether what you mentioned in #27614 about the listener API will fix things without us doing anything, in which case it would be best to upgrade to 2.5.0 (we were waiting for 2.6.0 to upgrade to the new interface), or whether we will need to start adding some extra control at the start of each Operator.
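
   If it comes to the latter, the extra control I have in mind is simply making the insert idempotent so a duplicate run becomes a no-op (a rough sketch, assuming Postgres and the same placeholder table as above; not what we run today):

   ```python
   from airflow.providers.postgres.hooks.postgres import PostgresHook


   def insert_execution_log_idempotent(execution_id: int, taskgroup_id: str) -> None:
       """Same insert as before, but a second run of the task no longer blows up on the PK."""
       hook = PostgresHook(postgres_conn_id="logging_db")  # placeholder connection id
       hook.run(
           """
           INSERT INTO execution_log (execution_id, taskgroup_id)
           VALUES (%s, %s)
           ON CONFLICT (execution_id, taskgroup_id) DO NOTHING
           """,
           parameters=(execution_id, taskgroup_id),
       )
   ```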
   
   ### How to reproduce
   
   I don't know. I would love to get to the root of the problem, to be sure what it is and how I can avoid it, but I don't know which logs to check or where to look for more information.
   
   ### Operating System
   
   ubuntu 20.04
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon==6.0.0
   apache-airflow-providers-cncf-kubernetes==4.4.0
   apache-airflow-providers-common-sql==1.2.0
   apache-airflow-providers-ftp==3.1.0
   apache-airflow-providers-google==6.8.0
   apache-airflow-providers-http==4.0.0
   apache-airflow-providers-imap==3.0.0
   apache-airflow-providers-postgres==5.2.2
   apache-airflow-providers-sendgrid==3.0.0
   apache-airflow-providers-slack==6.0.0
   apache-airflow-providers-sqlite==3.2.1
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   




[GitHub] [airflow] set92 commented on issue #29872: Task running again after finishing getting a SUCCESS

Posted by "set92 (via GitHub)" <gi...@apache.org>.
set92 commented on issue #29872:
URL: https://github.com/apache/airflow/issues/29872#issuecomment-1451821475

   Tracking the error and reading the other issue, I think it may have to do with splitting the DAG processor into its own pod. We thought that this way it would parse and update the DAGs faster, but if the dag processor can make tasks run, that could be what caused this error.
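
   For reference, this is how I checked that the standalone DAG processor is actually enabled on our deployment (just reading the config programmatically; I am assuming the option is `standalone_dag_processor` under `[scheduler]`):

   ```python
   from airflow.configuration import conf

   # True when the DAG processor runs as its own component/pod instead of inside the scheduler.
   standalone = conf.getboolean("scheduler", "standalone_dag_processor")
   print(f"standalone_dag_processor = {standalone}")
   ```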




[GitHub] [airflow] potiuk closed issue #29872: Task running again after finishing getting a SUCCESS

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #29872: Task running again after finishing getting a SUCCESS
URL: https://github.com/apache/airflow/issues/29872

