Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/09/15 18:32:25 UTC
[GitHub] [airflow] collinmcnulty commented on issue #16625: Task is not retried when worker pod fails to start
collinmcnulty commented on issue #16625:
URL: https://github.com/apache/airflow/issues/16625#issuecomment-920275278
I can reproduce this issue as follows.
Run this DAG on Airflow 2.1.1:
```
from datetime import timedelta

from kubernetes.client import models as k8s

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id="pending",
    schedule_interval=None,
    start_date=days_ago(2),
) as dag:
    BashOperator(
        task_id="forever_pending",
        bash_command="date; sleep 30; date",
        retries=3,
        retry_delay=timedelta(seconds=30),
        # Mount a PVC that does not exist so the worker pod stays Pending.
        executor_config={
            "pod_override": k8s.V1Pod(
                spec=k8s.V1PodSpec(
                    containers=[
                        k8s.V1Container(
                            name="base",
                            volume_mounts=[
                                k8s.V1VolumeMount(mount_path="/foo/", name="vol")
                            ],
                        )
                    ],
                    volumes=[
                        k8s.V1Volume(
                            name="vol",
                            persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(
                                claim_name="missing"
                            ),
                        )
                    ],
                )
            ),
        },
    )
```
And here is the scheduler log from around the failure (newest entries first):
```
[2021-09-15 17:48:56,352] {scheduler_job.py:873} WARNING - Set 1 task instances to state=failed as their associated DagRun was not in RUNNING state
2021-09-15T17:48:56.134716Z info watchFileEvents: "/etc/certs": MODIFY|ATTRIB
2021-09-15T17:48:56.134808Z info watchFileEvents: "/etc/certs/..2021_09_06_06_43_21.729675760": MODIFY|ATTRIB
[2021-09-15 17:48:47,821] {dagrun.py:429} ERROR - Marking run <DagRun pending @ 2021-09-15 17:43:28.990599+00:00: manual__2021-09-15T17:43:28.990599+00:00, externally triggered: True> failed
[2021-09-15 17:48:47,769] {scheduler_job.py:1258} ERROR - Executor reports task instance <TaskInstance: pending.forever_pending 2021-09-15 17:43:28.990599+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
[2021-09-15 17:48:47,769] {scheduler_job.py:1265} INFO - Setting task instance <TaskInstance: pending.forever_pending 2021-09-15 17:43:28.990599+00:00 [queued]> state to failed as reported by executor
[2021-09-15 17:48:47,761] {kubernetes_executor.py:549} INFO - Changing state of (TaskInstanceKey(dag_id='pending', task_id='forever_pending', execution_date=datetime.datetime(2021, 9, 15, 17, 43, 28, 990599, tzinfo=tzlocal()), try_number=1), 'failed', 'pendingforeverpending.cc4a625ffe0d4da88709098daba98d87', 'astronomer-magnificent-aurora-4284', '1751732637') to failed
[2021-09-15 17:48:47,761] {scheduler_job.py:1229} INFO - Executor reports execution of pending.forever_pending execution_date=2021-09-15 17:43:28.990599+00:00 exited with status failed for try_number 1
[2021-09-15 17:48:47,759] {kubernetes_executor.py:372} INFO - Attempting to finish pod; pod_id: pendingforeverpending.cc4a625ffe0d4da88709098daba98d87; state: failed; annotations: {'dag_id': 'pending', 'task_id': 'forever_pending', 'execution_date': '2021-09-15T17:43:28.990599+00:00', 'try_number': '1'}
[2021-09-15 17:48:46,695] {kubernetes_executor.py:149} INFO - Event: pendingforeverpending.cc4a625ffe0d4da88709098daba98d87 had an event of type DELETED
[2021-09-15 17:48:46,695] {kubernetes_executor.py:200} INFO - Event: Failed to start pod pendingforeverpending.cc4a625ffe0d4da88709098daba98d87
[2021-09-15 17:48:46,692] {kubernetes_executor.py:149} INFO - Event: pendingforeverpending.cc4a625ffe0d4da88709098daba98d87 had an event of type MODIFIED
[2021-09-15 17:48:46,692] {kubernetes_executor.py:203} INFO - Event: pendingforeverpending.cc4a625ffe0d4da88709098daba98d87 Pending
[2021-09-15 17:48:46,676] {kubernetes_executor.py:625} ERROR - Pod "pendingforeverpending.cc4a625ffe0d4da88709098daba98d87" has been pending for longer than 300 seconds.It will be deleted and set to failed.
2021-09-15T17:47:50.966665Z info watchFileEvents: notifying
2021-09-15T17:47:47.079744Z info watchFileEvents: notifying
2021-09-15T17:47:40.966397Z info watchFileEvents: "/etc/certs": MODIFY|ATTRIB
2021-09-15T17:47:40.966527Z info watchFileEvents: "/etc/certs/..2021_09_06_06_43_21.426627327": MODIFY|ATTRIB
2021-09-15T17:47:37.079501Z info watchFileEvents: "/etc/certs": MODIFY|ATTRIB
2021-09-15T17:47:37.079624Z info watchFileEvents: "/etc/certs/..2021_09_06_06_43_21.729675760": MODIFY|ATTRIB
[2021-09-15 17:47:07,909] {scheduler_job.py:1841} INFO - Resetting orphaned tasks for active dag runs
[2021-09-15 17:47:00,347] {scheduler_job.py:1841} INFO - Resetting orphaned tasks for active dag runs
2021-09-15T17:46:35.978572Z info watchFileEvents: notifying
2021-09-15T17:46:25.978277Z info watchFileEvents: "/etc/certs": MODIFY|ATTRIB
2021-09-15T17:46:25.978421Z info watchFileEvents: "/etc/certs/..2021_09_06_06_43_21.426627327": MODIFY|ATTRIB
2021-09-15T17:46:21.074893Z info watchFileEvents: notifying
2021-09-15T17:46:11.074610Z info watchFileEvents: "/etc/certs": MODIFY|ATTRIB
2021-09-15T17:46:11.074754Z info watchFileEvents: "/etc/certs/..2021_09_06_06_43_21.729675760": MODIFY|ATTRIB
2021-09-15T17:45:11.006936Z info watchFileEvents: notifying
2021-09-15T17:45:01.006688Z info watchFileEvents: "/etc/certs": MODIFY|ATTRIB
2021-09-15T17:45:01.006777Z info watchFileEvents: "/etc/certs/..2021_09_06_06_43_21.426627327": MODIFY|ATTRIB
2021-09-15T17:45:01.006787Z info watchFileEvents: "/etc/certs/..2021_09_06_06_43_21.426627327": MODIFY|ATTRIB
```
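Since the log above is listed newest first and interleaves two timestamp formats, it can be easier to follow once sorted oldest first. A minimal sketch of that, assuming both timestamp styles are in the same timezone (they appear to line up in this excerpt); `parse_timestamp` and `chronological` are hypothetical helper names, not part of Airflow:

```python
import re
from datetime import datetime

# Matches both timestamp styles seen in the log above:
#   [2021-09-15 17:48:47,769] ...   (Airflow scheduler lines)
#   2021-09-15T17:48:56.134716Z ... (watchFileEvents lines)
_TS = re.compile(r"^\[?(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})[.,](\d+)")


def parse_timestamp(line):
    """Return the line's leading timestamp as a datetime, or None if absent."""
    m = _TS.match(line)
    if m is None:
        return None
    date, time, frac = m.groups()
    # Normalise milliseconds or microseconds to a 6-digit microsecond field.
    micros = frac[:6].ljust(6, "0")
    return datetime.strptime(f"{date} {time}.{micros}", "%Y-%m-%d %H:%M:%S.%f")


def chronological(lines):
    """Return timestamped lines sorted oldest first; other lines are dropped."""
    stamped = [(parse_timestamp(line), line) for line in lines]
    stamped = [pair for pair in stamped if pair[0] is not None]
    return [line for _, line in sorted(stamped, key=lambda pair: pair[0])]
```

Read in that order, the sequence is: the pod sits Pending for longer than 300 seconds, the executor deletes it and reports the task as failed, and the scheduler then marks the task instance and the DagRun failed on try_number 1 even though `retries=3` is set.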