Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/03/30 20:11:18 UTC

[GitHub] [airflow] chrismclennon opened a new issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

chrismclennon opened a new issue #9610:
URL: https://github.com/apache/airflow/issues/9610


   **Apache Airflow version**: 1.10.10
   **Kubernetes version (if you are using kubernetes)** (use `kubectl version`): 1.17.2
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: AWS
   - **OS** (e.g. from /etc/os-release): CentOS Linux
   - **Kernel** (e.g. `uname -a`): `Linux airflow-worker-0 5.6.13-1.el7.elrepo.x86_64 #1 SMP Thu May 14 08:05:24 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux`
   - **Install tools**:
   - **Others**:
   
   **What happened**:
   
   I run Airflow as a way to orchestrate jobs on Kubernetes using the KubernetesPodOperator. While most of the time logs appear correctly in the Airflow webserver, I do notice increasingly that the logs do not appear and instead just show a message, "Task is not able to be run", such as in the snippet below:
   
   ```
   *** Reading remote log from s3://*****/****/******/2020-06-30T10:00:00+00:00/1.log.
   [2020-06-30 23:07:40,362] {taskinstance.py:663} INFO - Dependencies not met for <TaskInstance: ****.***** 2020-06-30T10:00:00+00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
   [2020-06-30 23:07:40,363] {logging_mixin.py:112} INFO - [2020-06-30 23:07:40,363] {local_task_job.py:91} INFO - Task is not able to be run
   ```
   
   Unusually, when I go check what is happening on the Kubernetes cluster, the pod is actually running and emitting logs when I run a `kubectl logs` command. When the pod is complete, Airflow will reflect that the task has completed as well.
   
   **What you expected to happen**: I expected pod logs to be printed out.
   
   **How to reproduce it**: Very unfortunately, I am unsure what circumstances cause this error and am currently trying to gather evidence to replicate.
   
   **Anything else we need to know**: 
   * I have remote logging set to an S3 bucket (a minimal config sketch follows after this list).
   * I've noticed this issue increasingly with the 1.10.10 update, and I find this error daily.
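   
   For reference, remote logging to S3 in Airflow 1.10.x is typically configured along these lines (a minimal sketch; the bucket path and connection id below are placeholders, not values from this report):
   
   ```
   [core]
   # Ship task logs to remote storage when a task finishes
   remote_logging = True
   # Placeholder bucket/prefix -- substitute your own
   remote_base_log_folder = s3://my-airflow-logs/logs
   # Airflow connection that holds the AWS credentials
   remote_log_conn_id = aws_default
   ```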


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] auvipy commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
auvipy commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-810718428


   Celery's Redis support has some issues which we still haven't been able to resolve. Does running with RabbitMQ or another AMQP broker produce the same result? I suggest that someone facing this in production dig deeper into Celery; that way the possible root cause could be found in Celery.
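   
   For anyone who wants to test that suggestion, pointing the Celery broker at RabbitMQ instead of Redis is an airflow.cfg change along these lines (a minimal sketch; the host, credentials, and result backend shown are placeholders):
   
   ```
   [celery]
   # Use a RabbitMQ (AMQP) broker instead of Redis -- placeholder host and credentials
   broker_url = amqp://guest:guest@rabbitmq:5672//
   # A result backend is still required; a database URL is a common choice (placeholder)
   result_backend = db+postgresql://airflow:airflow@postgres:5432/airflow
   ```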


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dennyac edited a comment on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dennyac edited a comment on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-706425932


   Upvoted that ticket. I understand that this is tied to a celery issue about repeated enqueueing. But there could be valid scenarios where the same job is enqueued more than once. 
   
   I'm concerned about how Airflow handles this scenario as well, especially not honoring the task execution timeout and determining which job to use for task state and logs.
   
   Would be good to have a solution for non Kubernetes deployments as well, as this issue is more generic.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-688920360


   @chrismclennon Do you have any non K8sPodOperator tasks that run for over an hour? Is it possible that this happens with any task that runs for a long time?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-785358399


   > Upvoted that ticket. I understand that this is tied to a celery issue about repeated enqueueing. But there could be valid scenarios where the same job is enqueued more than once.
   > 
   > I'm concerned about how airflow handles this scenario as well. Especially about not honoring task execution timeout, determining which job to use to determine task state and logs.
   > 
   > Would be good to have a solution for non Kubernetes deployments as well, as this issue is more generic.
   
   Have you tried it with Airflow 2.0.1? If not, can you please test with it and report back with your findings? Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] chrismclennon edited a comment on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
chrismclennon edited a comment on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-687219067


   Hey @dimberman. This still happens when we are running a single worker, so I don't believe it's a race condition between workers. I don't notice any dependency failure in the completed logs loaded in S3. 
   
   Some other interesting observations we've made:
   
   - The logs in S3 have the "Task is not able to run" message
   - In reproducing the issue, we saw that after a task had run for 1 hour 10 minutes, the worker uploaded the bad log to S3
   - If we manually mark the task as succeeded, the worker then uploads the good log
   - None of the tasks which encounter this issue have the log present locally. So after an hour or so, if the task is still running, the worker uploads the error log to S3, and then we see the issue as the read pulls the log down from S3.
   
   So it seems like the current question on our mind is why the worker is uploading this bad log to S3 in the first place.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] himabindu07 commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
himabindu07 commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-788366276


   I verified using 2.0.1 and was not able to reproduce it. Can you try it with Airflow 2.0.1?
   Thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] imjuanleonard commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
imjuanleonard commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-730883386


   Any update on this? We have a use case where a restart is not the cause.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] stevenwoods commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
stevenwoods commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-810505695


   I am not using Kubernetes and I am using `v2.0.1`. I get this message when I have long-running tasks (seemingly over 10 hours).
   
   ```
   *** Reading remote log from s3://***/2021-03-18T19:00:00+00:00/1.log.
   [2021-03-27 10:58:40,962] {taskinstance.py:845} INFO - Dependencies not met for <TaskInstance: dump_restore_tables 2021-03-18T19:00:00+00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
   [2021-03-27 10:58:40,962] {taskinstance.py:845} INFO - Dependencies not met for <TaskInstance: dump_restore_tables 2021-03-18T19:00:00+00:00 [running]>, dependency 'Task Instance Not Running' FAILED: Task is in the running state
   [2021-03-27 10:58:40,963] {local_task_job.py:93} INFO - Task is not able to be run
   
   [2021-03-27 10:58:40,962] {taskinstance.py:845} INFO - Dependencies not met for <TaskInstance: dump_restore_tables 2021-03-18T19:00:00+00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
   [2021-03-27 10:58:40,962] {taskinstance.py:845} INFO - Dependencies not met for <TaskInstance: dump_restore_tables 2021-03-18T19:00:00+00:00 [running]>, dependency 'Task Instance Not Running' FAILED: Task is in the running state
   [2021-03-27 10:58:40,963] {local_task_job.py:93} INFO - Task is not able to be run
   [2021-03-27 11:02:48,909] {bash.py:16} INFO - SET
   ```
   
   Meanwhile, I can tell from the instance and the database that it is still running in the background.
   
   The task started logging at 11:02.
   
   Here is the end of the log file:
   ```
   [2021-03-27 15:47:12,873] {bash.py:18} INFO - returncode: 0
   [2021-03-27 15:47:12,953] {python.py:118} INFO - Done. Returned value was: None
   [2021-03-27 15:47:12,967] {taskinstance.py:1166} INFO - Marking task as SUCCESS. dag_id=* task_id=dump_restore_tables, execution_date=20210318T190000, start_date=20210327T085655, end_date=20210327T194712
   [2021-03-27 15:47:13,486] {local_task_job.py:188} WARNING - State of this instance has been externally set to success. Terminating instance.
   [2021-03-27 15:47:13,500] {process_utils.py:100} INFO - Sending Signals.SIGTERM to GPID 7599
   [2021-03-27 15:47:13,500] {taskinstance.py:1239} ERROR - Received SIGTERM. Terminating subprocesses.
   [2021-03-27 15:47:13,712] {process_utils.py:66} INFO - Process psutil.Process(pid=7599, status='terminated') (7599) terminated with exit code 1
   ```
   
   This happens on every DAG run of tasks that are long running.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman edited a comment on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman edited a comment on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-689981527


   Hi @raj-manvar, thank you for updating us! Yes, this is similar to what we've been seeing. We have both seen that this is tied to the Celery visibility_timeout, and that even after increasing the timeout significantly this still happens after an hour.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-1027411216


   This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jedcunningham commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
jedcunningham commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-958366896


   @stevenwoods, can you try again with 2.1.3 or later? I suspect what you are running into was fixed with #16289.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dennyac commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dennyac commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-706425932


   Upvoted that ticket. I understand that this is tied to a celery issue about repeated enqueueing. But there could be valid scenarios where the same job is enqueued more than once. 
   
   I'm concerned about how Airflow handles this scenario as well, especially not honoring the task execution timeout and determining which job to use for task state and logs.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] himabindu07 edited a comment on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
himabindu07 edited a comment on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-788366276


   I verified using 2.0.1 and was not able to reproduce it. Can you try it with Airflow 2.0.1?
   Thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] chrismclennon commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
chrismclennon commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-687219067


   Hey @dimberman. This still happens when we are running a single worker, so I don't believe it's a race condition between workers. I don't notice any dependency failure in the completed logs loaded in S3. 
   
   Some other interesting observations we've made:
   
   - The logs in S3 have the "Task is not able to run" message
   - In reproducing the issue, we saw that after a task had run for 1 hour 10 minutes, the worker uploaded the bad log to S3
   - If we manually mark the task as succeeded, the worker then uploads the good log
   - None of the tasks which encounter this issue have the log present locally. So after an hour or so, if the task is still running, the worker uploads the error log to S3, and then we see the issue as the read pulls the log down from S3.
   
   So it seems like the current question on our mind is why the worker is uploading this bad log to S3 in the first place.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Pseverin commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
Pseverin commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-663404625


   I'm receiving the same message when running a **long** task (duration of more than 1 day) with **KubernetesPodOperator**.
   
   ```{taskinstance.py:624} INFO - Dependencies not met for <TaskInstance: **** 2020-07-23T11:23:35+00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.```
   
   The task finishes successfully, but the Airflow status is marked as FAILED and downstream tasks are not run.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mikys-next-insurance commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
mikys-next-insurance commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-760081014


   > The current work-around we have is a view plugin which fetches logs from the workers instead of S3.
   
   @raj-manvar, can you please share this view plugin with us?
   Thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dennyac edited a comment on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dennyac edited a comment on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-706425932


   Upvoted that ticket. I understand that this is tied to a celery issue about repeated enqueueing. But there could be valid scenarios where the same job is enqueued more than once. 
   
   I'm concerned about how airflow handles this scenario as well. Especially about not honoring task execution timeout, determining which job to use to determine task state and logs.
   
   Would be good to have a solution for non Kubernetes deployments as well, as this issue is more generic.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-810598589


   @auvipy Any thoughts here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-686618263


   @ashb @kaxil have you seen anything like this for any other super long running tasks? Is there a timeout on tasks that I should be aware of?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-686624737


   @chrismclennon @Pseverin just to check: Does the task in the airflow UI still show up as running, but the logs get funky? Like they're not put into "up_for_retry" or something?  Also do all older logs disappear in s3?
   
   @ashb what's strange here is that these logs would suggest that airflow is trying to restart the task while it is still in the running state. Is there a situation where the SchedulerJob would die because of an OOM or something causing the task to go back in the queue?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-1059856952


   This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in next 7 days if no further activity occurs from the issue author.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] closed issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed issue #9610:
URL: https://github.com/apache/airflow/issues/9610


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-706395349


   Hi @dennyac this is tied to a Celery issue https://github.com/celery/celery/issues/6229


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] chrismclennon commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
chrismclennon commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-686665099


   It might be noteworthy to add that I'm running all airflow components on a Kubernetes cluster. To allow communication with port 8793, I've deployed the workers as a StatefulSet. This is what the header of a normal running log looks like:
   
   ```
   *** Log file does not exist: /var/local/airflow/logs/****/****/2020-09-03T16:00:00+00:00/1.log
   *** Fetching from: http://airflow-worker-1.airflow-worker.airflow-core--prod.svc.cluster.local:8793/log/***/***/2020-09-03T16:00:00+00:00/1.log
   
   
   [2020-09-03 18:12:28,965] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: ***.*** 2020-09-03T16:00:00+00:00 [queued]>
   [2020-09-03 18:12:28,981] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: ***.*** 2020-09-03T16:00:00+00:00 [queued]>
   [2020-09-03 18:12:28,981] {taskinstance.py:879} INFO - 
   --------------------------------------------------------------------------------
   [2020-09-03 18:12:28,983] {taskinstance.py:880} INFO - Starting attempt 1 of 1
   [2020-09-03 18:12:28,983] {taskinstance.py:881} INFO - 
   --------------------------------------------------------------------------------
   
   ... actual runtime logs are populated from here ...
   ```
   
   The workers aren't restarting when we see this issue since we make use of CeleryExecutor, so I don't believe the worker deployment itself should be the culprit here.
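   
   For context, that fetch-from-worker fallback on port 8793 is driven by a single airflow.cfg setting (a sketch per the 1.10.x layout, where it sits under [celery]; the section differs in later versions):
   
   ```
   [celery]
   # Port on which each worker serves its local task log files; the webserver falls back to
   # http://<worker-hostname>:<this port>/log/... when it cannot read the log locally
   worker_log_server_port = 8793
   ```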


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-686670408


   @chrismclennon does this happen when you are only running a single worker? I'd be interested if perhaps a second worker is pulling the task while the first worker is working on it, which would create a race condition. Also you mentioned that all logs appear when the task completes, are any of the dependency failures in the logs?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] zikun commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
zikun commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-655617057


   Similar issue in #9626


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] chrismclennon commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
chrismclennon commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-685177350


   Hey @dimberman.
   
   1. We've observed this for all tasks that run longer than 1 hour.
   2. I don't believe this is tied to scheduler restarts. Whether we restart the scheduler or not, we still observe this behaviour at the 1 hour mark.
   
   We'll be making the update to 1.10.12 within the next ~month or so. If this issue is still under investigation by then I'll be sure to report back if the version update made any changes.
   
   Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dennyac edited a comment on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dennyac edited a comment on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-706425932


   Upvoted that ticket. I understand that this is tied to a celery issue about repeated enqueueing. But there could be valid scenarios where the same job is enqueued more than once. 
   
   I'm concerned about how airflow handles this scenario as well. Especially about not honoring execution task timeout, determining which job to use to determine task state and logs.
   
   Would be good to have a solution for non Kubernetes deployments as well, as this issue is more generic.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] imjuanleonard commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
imjuanleonard commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-730889450


   > We'll be making the update to 1.10.12 within the next ~month or so. If this issue is still under investigation by then I'll be sure to report back if the version update made any changes.
   
   Hey @chrismclennon, is there any update regarding this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] rmanvar-indeed commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
rmanvar-indeed commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-698053966


   Also tried with Celery 4.4.0 and 4.4.3, but had similar issues.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] raj-manvar commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
raj-manvar commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-689941118


   Few new findings:
   * Was able to reproduce this issue for a task running more than a minute by setting `visibility_timeout = 60` under `[celery_broker_transport_options]` in the airflow.cfg file (the setting is sketched below).
   * Therefore, this is happening because Celery expects the task to complete within an hour and, if it doesn't, assigns the task to another worker; during this transition, the worker uploads the "Task is not able to be run" log to S3.
   * Can see another worker receiving the same task in the logs (`Received task: airflow.executors.celery_executor.execute_command[b40cacbb-9dd3-4681-8454-0e1df2dbc910]`, with the same id), which supports the view that Celery is re-assigning this task to another worker.
   
   * Modifying "visibility_timeout = 86400 # 1 day" in airflow.cfg doesn't resolve this issue, and logs in the UI are still corrupted after an hour.
   * Even tried "visibility_timeout = 7200 # 2 hours" in airflow.cfg, but can still see this issue after an hour.
   * Seems the issue is similar to https://github.com/celery/celery/issues/5935; according to that thread it should be resolved in Celery 4.4.5, but we still see the same issue even though Airflow 1.10.10 uses Celery 4.4.6.
   
   ( CC: @chrismclennon @dimberman  ) 
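   
   For reference, the setting described above lives in airflow.cfg along these lines (a minimal sketch; 86400 is the 1-day value from the findings, not a recommendation):
   
   ```
   [celery_broker_transport_options]
   # Seconds Celery waits for a task to be acknowledged before re-delivering it to another
   # worker; the Redis default of one hour lines up with the ~1 hour mark reported here
   visibility_timeout = 86400
   ```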


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman edited a comment on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman edited a comment on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-689984085


   Another ticket tied to this same bug https://github.com/celery/celery/issues/6229. This person also attempted 4.4.6.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] chrismclennon edited a comment on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
chrismclennon edited a comment on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-686627880


   Hey @dimberman. The Airflow task does show as running -- it's not in any retry or error state. Older logs are not disappearing, as I'm able to pull logs from start of year. We don't have any sort of object deletion in our S3 bucket either. When the job does eventually complete, this log is replaced with the actual, correct log uploaded to S3. This only happens for tasks that are in the running state. Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-809763015


   @Minyus @raj-manvar have you tried using the cncf.kubernetes backport provider package? This _might_ be fixed in one of the newer releases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-706395948


   @dennyac @rmanvar-indeed @chrismclennon Could y'all please +1 that ticket? Hopefully will catch the attention of a Celery maintainer. Note that one potential solution in 2.0 for this will be the ability to launch individual KubernetesExecutor tasks using the CeleryKubernetesExecutor.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-689982453


   Hmm, that's unfortunate that the upgrade doesn't seem to have worked for us. I think we're going to need to bump that thread. @ashb is it possible that downgrading to 4.4.5 might fix this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] chrismclennon commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
chrismclennon commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-686663054


   @dimberman Is there a good way to check whether it uses a different try_number on retrieving logs? I just pulled up an instance that is showing this error. The log looks like this:
   
   ```
   *** Reading remote log from s3://***/***/***/***/2020-09-01T09:00:00+00:00/1.log.
   [2020-09-02 10:06:39,304] {taskinstance.py:663} INFO - Dependencies not met for <TaskInstance: ****.**** 2020-09-01T09:00:00+00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
   [2020-09-02 10:06:39,501] {logging_mixin.py:112} INFO - [2020-09-02 10:06:39,402] {local_task_job.py:91} INFO - Task is not able to be run
   ```
   
   It looks like it's trying to read the remote log from S3 on try_number=1 (judging from `1.log`)(?), which is the current attempt number. Interestingly, I expected that since the task is in the running state, it would pull from the logs being served on port 8793.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dennyac commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dennyac commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-707222674


   Thanks @raj-manvar. I'm actually more concerned about tasks being stuck in running state for prolonged periods (beyond the specified task execution timeout), which has caused delays and cascading failures. As of now, we just manually restart the task whenever we encounter this. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] chrismclennon edited a comment on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
chrismclennon edited a comment on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-686627880


   Hey @dimberman. The Airflow task does show as running -- it's not in any retry or error state. Older logs are not disappearing, as I'm able to pull logs from start of year. We don't have any sort of object deletion in our S3 bucket either. When the job does eventually complete, this log is replaced with the actual, correct log uploaded to S3. This only happens for tasks that are in the running state.
   
   The best we've been able to do in terms of replicating this issue is to execute long running tasks and see this behaviour pop up. We're also still investigating what the underlying issue could be.
   
   Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-810545627


   @kaxil @ashb so it seems like the celery maintainers have just kind of given up on this one https://github.com/celery/celery/issues/5935#issuecomment-745362785
   
   Is there a solution we can come up with that will assume that this will timeout? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-686620881


   Possibly a long running task makes it more likely to hit a network blip speaking to the Kube API?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] raj-manvar commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
raj-manvar commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-737779024


   We are seeing this issue in airflow 1.10.12 as well. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-686634210


   Ok, yeah, that's pretty consistent with the report we received from a customer (that the logs show up when the task completes, but for some reason stop showing up while the task is running).
   
   @ashb could this have something to do with how we retrieve logs? Like maybe airflow is attempting to do a retry while the original task is running and the task logs are picking that up?
   
   @chrismclennon are you seeing a different try_number when it fails to retrieve logs?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] auvipy commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
auvipy commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-1027545155


   Some Redis issues were fixed in the latest releases of kombu & Celery.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] raj-manvar commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
raj-manvar commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-691104519


   I tried with Celery version 4.4.5 as well but had the same issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dennyac commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dennyac commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-706394744


   We're experiencing the same issue across Operators/Sensors (Airflow 1.10.11, CeleryExecutor with Redis backend)
   
   When two jobs for the same task get enqueued an hour apart, the first job continues to run and its logs don't appear in the UI. The second job completes immediately because its dependencies are not met (the first job, which is still running, is a dependency), and you see "Task is not able to be run" in the UI logs.
   
   If the first job fails or completes (success/fails), the task status gets updated accordingly and the logs will then be added to Airflow UI.
   
   If the first job doesn't complete (noticed cases where the job just hangs and not sure how worker restarts impact this), the task attempt will remain in the running state. In this scenario, **task execution timeout isn't honored**, so the task can run for a really long time.
   
   Unknowns -
   - Why is the job being enqueued twice?
   - Why isn't Airflow honoring the task execution timeout in this scenario?
   
   The latter unknown is causing issues for us as tasks end up running for hours, and we have to manually intervene and restart the task.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] Minyus commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
Minyus commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-790415553


   Airflow 1.10.14 with local logging and Celery 4.4.7 reproduced the same error.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-809770062


   Let us know if you can reproduce this with Airflow >= 2.0.1 and we can re-open this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] raj-manvar commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
raj-manvar commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-762784518


   Sure! Added a repo with the code at https://github.com/raj-manvar/airflow-worker-logs-plugin.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-689981527


   Hi @raj-manvar, thank you for updating us! Yes, this is similar to what we've been seeing. We have both seen that this is tied to the Celery visibility_timeout, and that even after increasing the timeout significantly this still happens after an hour.
   
   According to the thread, this might be solved by upgrading the Celery version: https://github.com/celery/celery/issues/5935#issuecomment-641501842. @ashb would we need to tie this fix to 1.10.13 to upgrade Celery?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-683831389


   Hi @chrismclennon,
   
   Two questions about this: 1. Are these happening to very long-running tasks? 2. Are these tied to scheduler restarts? It's possible this might be fixed in 1.10.12, as we added some fixes for long-running tasks in the KubernetesPodOperator.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] boring-cyborg[bot] commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-652435960


   Thanks for opening your first issue here! Be sure to follow the issue template!
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-1065989000


   This issue has been closed because it has not received response from the issue author.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] chrismclennon commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
chrismclennon commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-686627880


   Hey @dimberman. The Airflow task does show as running -- it's not in any retry or error state. Older logs are not disappearing, as I'm able to pull logs from start of year. We don't have any sort of object deletion in our S3 bucket either. When the job does eventually complete, this log is replaced with the actual, correct log uploaded to S3. Thank you!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
dimberman commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-689984085


   Another ticket tied to this same bug https://github.com/celery/celery/issues/6229


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] raj-manvar commented on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
raj-manvar commented on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-706471915


   The current work-around we have is a view plugin which fetches logs from the workers instead of S3. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] imjuanleonard removed a comment on issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
imjuanleonard removed a comment on issue #9610:
URL: https://github.com/apache/airflow/issues/9610#issuecomment-730883386


   Any update on this? We have a use case where a restart is not the cause.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil closed issue #9610: Pod logs from KubernetesPodOperator occasionally get replaced with "Task is not able to run"

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #9610:
URL: https://github.com/apache/airflow/issues/9610


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org