Posted to commits@airflow.apache.org by "deanmorin (via GitHub)" <gi...@apache.org> on 2023/02/17 23:15:15 UTC

[GitHub] [airflow] deanmorin opened a new issue, #29601: EcsRunTaskOperator reattach does not work

deanmorin opened a new issue, #29601:
URL: https://github.com/apache/airflow/issues/29601

   ### Apache Airflow Provider(s)
   
   amazon
   
   ### Versions of Apache Airflow Providers
   
   `apache-airflow-providers-amazon==7.1.0`
   
   ### Apache Airflow version
   
   2.5.1
   
   ### Operating System
   
   Docker on ECS (apache/airflow:2.5.1-python3.9)
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   The `EcsRunTaskOperator` has a [`reattach` option](https://github.com/apache/airflow/blob/2.5.1/airflow/providers/amazon/aws/operators/ecs.py#L310-L313). The idea is that when a task is launched on ECS, its `arn` is saved in the `xcom` table so that, if Airflow restarts (for example during a redeploy), the operator can reattach to the still-running ECS task rather than launching a new one.
   
   However, it always fails to retrieve the `arn` of the running task from the `xcom` table.
   
   ### What you think should happen instead
   
   When Airflow restarts and retries an `EcsRunTaskOperator` task that was killed by the restart, it should find the `arn` of the still-running ECS task and keep waiting for that task to finish instead of starting a new one.
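   
   A toy model of that expected behaviour, for illustration only: `FakeXComStore` and `run_or_reattach` are hypothetical names, not Airflow APIs, and the `ecs_task_arn` key name is an assumption. The model simply assumes the saved ARN survives the restart, which is exactly what does not happen today.
   
   ```python
   from typing import Callable, Dict, Optional, Tuple

   # (dag_id, task_id, run_id, key), mirroring how XCom rows are identified.
   XComKey = Tuple[str, str, str, str]
   ARN_KEY = "ecs_task_arn"  # assumed key name, for illustration only


   class FakeXComStore:
       """Hypothetical in-memory stand-in for the xcom table (not an Airflow API)."""

       def __init__(self) -> None:
           self.rows: Dict[XComKey, str] = {}

       def push(self, dag_id: str, task_id: str, run_id: str, value: str) -> None:
           self.rows[(dag_id, task_id, run_id, ARN_KEY)] = value

       def pull(self, dag_id: str, task_id: str, run_id: str) -> Optional[str]:
           return self.rows.get((dag_id, task_id, run_id, ARN_KEY))


   def run_or_reattach(
       store: FakeXComStore, dag_id: str, task_id: str, run_id: str, launch: Callable[[], str]
   ) -> Tuple[str, bool]:
       """Return (arn, reattached): reuse a saved ARN if present, else launch and save one."""
       arn = store.pull(dag_id, task_id, run_id)
       if arn is not None:
           return arn, True
       arn = launch()
       # Push and pull use the same identifiers, so a later attempt can find this row.
       store.push(dag_id, task_id, run_id, arn)
       return arn, False


   store = FakeXComStore()
   first = run_or_reattach(store, "my_dag", "my_task", "run_1", lambda: "fake-arn-1")
   # Simulated restart: the retry finds the saved ARN instead of launching a second task.
   second = run_or_reattach(store, "my_dag", "my_task", "run_1", lambda: "fake-arn-2")
   print(first, second)  # ('fake-arn-1', False) ('fake-arn-1', True)
   ```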
   
   ### How to reproduce
   
   1. Create a DAG containing an `EcsRunTaskOperator` task which, for testing purposes, takes at least a few minutes to complete
   2. Trigger a DAG run
   3. While the task is running, redeploy airflow
   
   Check the task logs after the task restarts; they show "No active previously launched task found to reattach".
   
   
   ### Anything else
   
   From what I can tell, there are two problems.
   
   ### Problem 1
   
   [When it pushes the xcom data](https://github.com/apache/airflow/blob/2.5.1/airflow/providers/amazon/aws/operators/ecs.py#L496), it uses the task's own `task_id`.
   
   [When it tries to retrieve the data](https://github.com/apache/airflow/blob/2.5.1/airflow/providers/amazon/aws/operators/ecs.py#L508-L512), it uses a made-up `task_id`, so it never finds the row saved earlier.
   
   [It also uses the same made-up `task_id`](https://github.com/apache/airflow/blob/2.5.1/airflow/providers/amazon/aws/operators/ecs.py#L422) when it later tries to delete the `xcom` data.
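   
   A stripped-down illustration of the mismatch, using a plain dict instead of the real `xcom` table and dropping `dag_id`/`run_id` for brevity. The `{task_id}_task_arn` template is inferred from the key shown in the foreign-key error quoted in this issue; the `ecs_task_arn` key name is an assumption, and `push`/`pull` are hypothetical helpers.
   
   ```python
   from typing import Dict, Optional, Tuple

   # Inferred from the error message in this issue:
   # distribution_to_snowflake -> distribution_to_snowflake_task_arn
   REATTACH_XCOM_TASK_ID_TEMPLATE = "{task_id}_task_arn"
   REATTACH_XCOM_KEY = "ecs_task_arn"  # assumed key name, for illustration only

   # Plain dict standing in for the xcom table, keyed by (task_id, key) only.
   xcom_rows: Dict[Tuple[str, str], str] = {}


   def push(task_id: str, key: str, value: str) -> None:
       xcom_rows[(task_id, key)] = value


   def pull(task_id: str, key: str) -> Optional[str]:
       return xcom_rows.get((task_id, key))


   task_id = "distribution_to_snowflake"
   # Push side: keyed by the operator's real task_id.
   push(task_id, REATTACH_XCOM_KEY, "fake-arn-1")
   # Pull side: keyed by the templated task_id, which never matches the pushed row.
   looked_up = pull(REATTACH_XCOM_TASK_ID_TEMPLATE.format(task_id=task_id), REATTACH_XCOM_KEY)
   print(looked_up)  # None
   ```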
   
   ### Problem 2
   
   Switching from the made-up `task_id` to the normal `task_id` during retrieval doesn't help, since all `xcom` rows for the task's DAG ID, task ID, and run ID are deleted when the task restarts. The `arn` saved on the previous attempt is therefore never available.
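   
   A toy model of that clearing behaviour, assuming (as reported here) that each new attempt deletes every XCom row for its DAG/task/run before the operator runs. `start_attempt` and `attempt` are hypothetical helpers, not Airflow code, and the `ecs_task_arn` key name is an assumption.
   
   ```python
   from typing import Dict, Optional, Tuple

   ARN_KEY = "ecs_task_arn"  # assumed key name, for illustration only
   # Toy xcom table: (dag_id, task_id, run_id, key) -> value.
   xcom: Dict[Tuple[str, str, str, str], str] = {}


   def start_attempt(dag_id: str, task_id: str, run_id: str) -> None:
       """Hypothetical model of a new attempt clearing all XCom rows for its dag/task/run."""
       for row in [r for r in xcom if r[:3] == (dag_id, task_id, run_id)]:
           del xcom[row]


   def attempt(dag_id: str, task_id: str, run_id: str) -> Optional[str]:
       """Start an attempt, then look up the ARN using the *correct* task_id."""
       start_attempt(dag_id, task_id, run_id)
       return xcom.get((dag_id, task_id, run_id, ARN_KEY))


   ids = ("my_dag", "my_task", "run_1")
   print(attempt(*ids))  # None: first attempt, nothing saved yet
   xcom[(*ids, ARN_KEY)] = "fake-arn-1"  # first attempt saves its ARN, then Airflow restarts
   print(attempt(*ids))  # None: the retry cleared the row before the lookup
   ```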
   
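   I tried replacing the `xcom_push` call with this:
   ```python
   from airflow.models import XCom

   # Inside EcsRunTaskOperator: store the ARN under the made-up task_id that the
   # reattach lookup uses, instead of pushing it under the operator's own task_id.
   XCom.set(
       task_id=self.REATTACH_XCOM_TASK_ID_TEMPLATE.format(task_id=self.task_id),
       key=self.REATTACH_XCOM_KEY,
       value=self.arn,
       dag_id=self.dag_id,
       run_id=context["ti"].run_id,
   )
   ```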
   This raises the following error:
   ```
   psycopg2.errors.ForeignKeyViolation: insert or update on table "xcom" violates foreign key constraint "xcom_task_instance_fkey"
   DETAIL:  Key (dag_id, task_id, run_id, map_index)=(meltano_distribution, distribution_to_snowflake_task_arn, manual__2023-02-17T19:08:34.015748+00:00, -1) is not present in table "task_instance".
   ```
   
   You used to be able to make up a `task_id` as a hack to save things in the `xcom` table, but that foreign key constraint must have been added at some point.
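   
   The constraint can be reproduced outside Airflow with an in-memory SQLite analogue. This is a simplified schema for illustration, not Airflow's actual table definitions (which here live in Postgres and have more columns); the `dag_id`, `task_id`, and `run_id` values are taken from the error above, and `insert_xcom` is a hypothetical helper.
   
   ```python
   import sqlite3

   DAG_ID = "meltano_distribution"
   RUN_ID = "manual__2023-02-17T19:08:34.015748+00:00"

   conn = sqlite3.connect(":memory:")
   conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
   conn.executescript(
       """
       CREATE TABLE task_instance (
           dag_id TEXT, task_id TEXT, run_id TEXT, map_index INTEGER,
           PRIMARY KEY (dag_id, task_id, run_id, map_index)
       );
       CREATE TABLE xcom (
           dag_id TEXT, task_id TEXT, run_id TEXT, map_index INTEGER, key TEXT, value TEXT,
           FOREIGN KEY (dag_id, task_id, run_id, map_index)
               REFERENCES task_instance (dag_id, task_id, run_id, map_index)
       );
       """
   )
   # The only task instance that actually exists for this run.
   conn.execute(
       "INSERT INTO task_instance VALUES (?, ?, ?, -1)",
       (DAG_ID, "distribution_to_snowflake", RUN_ID),
   )


   def insert_xcom(task_id: str) -> bool:
       """Try to store an ARN under task_id; return False if the foreign key rejects it."""
       try:
           conn.execute(
               "INSERT INTO xcom VALUES (?, ?, ?, -1, 'ecs_task_arn', 'fake-arn-1')",
               (DAG_ID, task_id, RUN_ID),
           )
           return True
       except sqlite3.IntegrityError:
           return False


   print(insert_xcom("distribution_to_snowflake"))           # True: matches a task_instance row
   print(insert_xcom("distribution_to_snowflake_task_arn"))  # False: made-up task_id
   ```
   
   SQLite needs `PRAGMA foreign_keys = ON` to enforce the constraint, whereas Postgres enforces foreign keys by default.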
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] Taragolis commented on issue #29601: EcsRunTaskOperator reattach does not work

Posted by "Taragolis (via GitHub)" <gi...@apache.org>.
Taragolis commented on issue #29601:
URL: https://github.com/apache/airflow/issues/29601#issuecomment-1435402446

   https://github.com/apache/airflow/pull/29447


[GitHub] [airflow] potiuk closed issue #29601: EcsRunTaskOperator reattach does not work

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk closed issue #29601: EcsRunTaskOperator reattach does not work
URL: https://github.com/apache/airflow/issues/29601

