Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/12/02 17:46:42 UTC

[GitHub] [airflow] hterik opened a new issue, #28071: Kubernetes logging errors - attempting to adopt taskinstance which was not specified by database

hterik opened a new issue, #28071:
URL: https://github.com/apache/airflow/issues/28071

   ### Apache Airflow version
   
   2.4.3
   
   ### What happened
   
   Using the following config:
   ```
   [core]
   executor = CeleryKubernetesExecutor
   [kubernetes]
   delete_worker_pods = False
   ```
   
   1. Start a few DAGs running in Kubernetes and wait for them to complete.
   2. Restart the scheduler.
   3. Logs are flooded with hundreds of errors like `ERROR - attempting to adopt taskinstance which was not specified by database: TaskInstanceKey(dag_id='xxx', task_id='yyy', run_id='zzz', try_number=1, map_index=-1)`
   
   This is problematic because:
   * Our installation has thousands of DAGs and pods, so this becomes very noisy, and the adoption process adds excessive startup time to the scheduler, sometimes up to a minute.
   * It hides actual errors from resetting orphaned tasks, which also happens for inexplicable reasons on scheduler restart, with the following log: `Reset the following 6 orphaned TaskInstances`. This makes those resets much harder to debug, because their cause cannot easily be correlated with the task instances reported as not specified by database.
   
   
   These logs occur because the Kubernetes executor loads all pods on startup (`try_adopt_task_instances`) and then cross-references them with all `RUNNING` TaskInstances loaded via `scheduler_job.adopt_or_reset_orphaned_tasks`.
   For every pod where a running TI cannot be found, it logs the error above. But for TIs that have already completed this is not an error, and the pods should not have to be loaded at all.
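
The cross-reference described above can be sketched roughly as follows. This is a simplified illustration, not Airflow's actual code: the `TaskInstanceKey` class here is a stand-in that only mirrors the fields in the log line, and `split_adoptable` is a hypothetical helper name.

```python
from typing import NamedTuple


class TaskInstanceKey(NamedTuple):
    # Stand-in mirroring the fields shown in the log line; not Airflow's class.
    dag_id: str
    task_id: str
    run_id: str
    try_number: int = 1
    map_index: int = -1


def split_adoptable(pod_keys, running_ti_keys):
    """Split pod keys into those matching a RUNNING TI and those that do not."""
    running = set(running_ti_keys)
    adopted = [key for key in pod_keys if key in running]
    unmatched = [key for key in pod_keys if key not in running]
    return adopted, unmatched


# One pod still belongs to a running TI; the other belongs to a finished task.
pods = [
    TaskInstanceKey("xxx", "yyy", "zzz"),
    TaskInstanceKey("xxx", "done", "zzz"),
]
running = [TaskInstanceKey("xxx", "yyy", "zzz")]

adopted, unmatched = split_adoptable(pods, running)
for key in unmatched:
    # Every leftover pod produces one error line, even if its task completed normally.
    print(f"ERROR - attempting to adopt taskinstance which was not specified by database: {key}")
```

With `delete_worker_pods = False`, completed pods accumulate, so the `unmatched` list grows with every finished task.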
   
   I have an idea: add some code to the kubernetes_executor that patches in something like a `completion-acknowledged` label whenever a pod completes (unless `delete_worker_pods` is set). Then, on startup, all pods with this label can be excluded. Is this a good idea, or do you see other potential solutions?
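
A minimal sketch of how such a label could work, assuming standard Kubernetes label-selector syntax (`!key` selects objects that do not have the label). The label name, the helper functions, and the choice to treat both `Succeeded` and `Failed` as completed are assumptions for illustration; none of this is existing Airflow code.

```python
ACK_LABEL = "completion-acknowledged"  # hypothetical label name from the proposal


def ack_patch_body():
    """Patch body that would add the label to a completed pod's metadata."""
    return {"metadata": {"labels": {ACK_LABEL: "true"}}}


def should_label(pod_phase, delete_worker_pods):
    """Label only pods in a terminal phase, and only if pods are kept around."""
    # Assumption: "completed" means the Kubernetes phases Succeeded or Failed.
    return (not delete_worker_pods) and pod_phase in {"Succeeded", "Failed"}


def adoption_label_selector(base_selector=""):
    """Extend a label selector so that acknowledged pods are skipped on startup."""
    parts = [part for part in (base_selector, f"!{ACK_LABEL}") if part]
    return ",".join(parts)


# "app=example" is a placeholder for whatever selector the executor already uses.
print(adoption_label_selector("app=example"))
```

The patch body and selector string would then be passed to whatever Kubernetes client calls the executor already uses to patch and list pods, so acknowledged pods are filtered on the server side and never loaded during adoption.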
   
   ### What you think should happen instead
   
   _No response_
   
   ### How to reproduce
   
   _No response_
   
   ### Operating System
   
   Ubuntu 22.04
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on issue #28071: Kubernetes logging errors - attempting to adopt taskinstance which was not specified by database

Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #28071:
URL: https://github.com/apache/airflow/issues/28071#issuecomment-1336652508

   cc: @dstandish @ephraimbuddy WDYT? Is the suggestion good? @hterik - feel free to propose a fix in the meantime. I think it will be easier to understand the bug/fix once we see the code proposal, and then maybe we can consider whether a better solution is possible.




[GitHub] [airflow] NickYadance commented on issue #28071: Kubernetes logging errors - attempting to adopt taskinstance which was not specified by database

Posted by GitBox <gi...@apache.org>.
NickYadance commented on issue #28071:
URL: https://github.com/apache/airflow/issues/28071#issuecomment-1358818031

   related to this #27983




[GitHub] [airflow] ephraimbuddy closed issue #28071: Kubernetes logging errors - attempting to adopt taskinstance which was not specified by database

Posted by GitBox <gi...@apache.org>.
ephraimbuddy closed issue #28071: Kubernetes logging errors - attempting to adopt taskinstance which was not specified by database
URL: https://github.com/apache/airflow/issues/28071

