You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by "Kaxil Naik (JIRA)" <ji...@apache.org> on 2019/01/19 16:09:00 UTC

[jira] [Updated] (AIRFLOW-3378) KubernetesPodOperator does not delete on timeout failure

     [ https://issues.apache.org/jira/browse/AIRFLOW-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaxil Naik updated AIRFLOW-3378:
--------------------------------
    Fix Version/s:     (was: 2.0.0)
                   1.10.2

> KubernetesPodOperator does not delete on timeout failure
> --------------------------------------------------------
>
>                 Key: AIRFLOW-3378
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3378
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: kubernetes
>    Affects Versions: 1.10.1
>            Reporter: Victor
>            Assignee: Victor
>            Priority: Major
>             Fix For: 1.10.2
>
>
> I am runnig airflow 1.10.1rc2 on kubernetes with the LocalExecutior and KubernetesPodOperator to run pods.
>  
> When a failure happens because the pod can't be created (see logs below), the whole operator fails and if is_delete_operator_pod is set to True, the pod is NOT deleted as expected.
>  
> Logs:
> {noformat}
> *** Reading local file: /airflow/logs/demo/datapipe/2018-11-21T08:29:57.206456+00:00/1.log
> [2018-11-21 08:30:02,027] {models.py:1361} INFO - Dependencies all met for <TaskInstance: demo.datapipe 2018-11-21T08:29:57.206456+00:00 [queued]>
> [2018-11-21 08:30:02,040] {models.py:1361} INFO - Dependencies all met for <TaskInstance: demo.datapipe 2018-11-21T08:29:57.206456+00:00 [queued]>
> [2018-11-21 08:30:02,041] {models.py:1573} INFO -
> --------------------------------------------------------------------------------
> Starting attempt 1 of 1
> --------------------------------------------------------------------------------
> [2018-11-21 08:30:02,066] {models.py:1595} INFO - Executing <Task(KubernetesPodOperator): datapipe> on 2018-11-21T08:29:57.206456+00:00
> [2018-11-21 08:30:02,067] {base_task_runner.py:118} INFO - Running: ['bash', '-c', 'airflow run demo datapipe 2018-11-21T08:29:57.206456+00:00 --job_id 4 --raw -sd /usr/local/airflow/dags_volume/..data/demo.py --cfg_path /tmp/tmp5qa5qyhs']
> [2018-11-21 08:30:03,065] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe [2018-11-21 08:30:03,065] {settings.py:174} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800
> [2018-11-21 08:30:03,734] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe [2018-11-21 08:30:03,732] {__init__.py:51} INFO - Using executor LocalExecutor
> [2018-11-21 08:30:04,038] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe [2018-11-21 08:30:04,038] {models.py:271} INFO - Filling up the DagBag from /usr/local/airflow/dags_volume/..data/demo.py
> [2018-11-21 08:30:04,590] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe [2018-11-21 08:30:04,590] {cli.py:484} INFO - Running <TaskInstance: demo.datapipe 2018-11-21T08:29:57.206456+00:00 [running]> on host infra-airflow-6d78c56489-r9trl
> [2018-11-21 08:30:04,684] {logging_mixin.py:95} INFO - [2018-11-21 08:30:04,676] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:05,698] {logging_mixin.py:95} INFO - [2018-11-21 08:30:05,698] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:06,706] {logging_mixin.py:95} INFO - [2018-11-21 08:30:06,706] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:07,715] {logging_mixin.py:95} INFO - [2018-11-21 08:30:07,714] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:08,723] {logging_mixin.py:95} INFO - [2018-11-21 08:30:08,723] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:09,730] {logging_mixin.py:95} INFO - [2018-11-21 08:30:09,730] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:10,736] {logging_mixin.py:95} INFO - [2018-11-21 08:30:10,736] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:11,742] {logging_mixin.py:95} INFO - [2018-11-21 08:30:11,742] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:12,748] {logging_mixin.py:95} INFO - [2018-11-21 08:30:12,748] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:13,754] {logging_mixin.py:95} INFO - [2018-11-21 08:30:13,754] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:14,764] {logging_mixin.py:95} INFO - [2018-11-21 08:30:14,764] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:15,769] {logging_mixin.py:95} INFO - [2018-11-21 08:30:15,769] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:16,777] {logging_mixin.py:95} INFO - [2018-11-21 08:30:16,777] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:17,783] {logging_mixin.py:95} INFO - [2018-11-21 08:30:17,783] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:18,790] {logging_mixin.py:95} INFO - [2018-11-21 08:30:18,790] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:19,796] {logging_mixin.py:95} INFO - [2018-11-21 08:30:19,796] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:20,802] {logging_mixin.py:95} INFO - [2018-11-21 08:30:20,802] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:21,808] {logging_mixin.py:95} INFO - [2018-11-21 08:30:21,808] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:22,815] {logging_mixin.py:95} INFO - [2018-11-21 08:30:22,815] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:23,822] {logging_mixin.py:95} INFO - [2018-11-21 08:30:23,822] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:24,833] {logging_mixin.py:95} INFO - [2018-11-21 08:30:24,833] {pod_launcher.py:121} INFO - Event: datapipe-a5591f0c had an event of type Pending
> [2018-11-21 08:30:24,841] {models.py:1760} ERROR - Pod Launching failed: Pod took too long to start
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py", line 123, in execute
>     get_logs=self.get_logs)
>   File "/usr/local/lib/python3.6/site-packages/airflow/contrib/kubernetes/pod_launcher.py", line 86, in run_pod
>     raise AirflowException("Pod took too long to start")
> airflow.exceptions.AirflowException: Pod took too long to start
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1659, in _run_raw_task
>     result = task_copy.execute(context=context)
>   File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py", line 135, in execute
>     raise AirflowException('Pod Launching failed: {error}'.format(error=ex))
> airflow.exceptions.AirflowException: Pod Launching failed: Pod took too long to start
> [2018-11-21 08:30:24,844] {models.py:1791} INFO - Marking task as FAILED.
> [2018-11-21 08:30:25,001] {logging_mixin.py:95} INFO - [2018-11-21 08:30:25,001] {email.py:127} INFO - Sent an alert email to ['victor.noel@xxx.com']
> [2018-11-21 08:30:26,646] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe Traceback (most recent call last):
> [2018-11-21 08:30:26,646] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py", line 123, in execute
> [2018-11-21 08:30:26,646] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe     get_logs=self.get_logs)
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/contrib/kubernetes/pod_launcher.py", line 86, in run_pod
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe     raise AirflowException("Pod took too long to start")
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe airflow.exceptions.AirflowException: Pod took too long to start
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe During handling of the above exception, another exception occurred:
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe Traceback (most recent call last):
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe   File "/usr/local/bin/airflow", line 32, in <module>
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe     args.func(args)
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/utils/cli.py", line 74, in wrapper
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe     return f(*args, **kwargs)
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 490, in run
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe     _run(args, dag, ti)
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 406, in _run
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe     pool=args.pool,
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe     return func(*args, **kwargs)
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1659, in _run_raw_task
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe     result = task_copy.execute(context=context)
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe   File "/usr/local/lib/python3.6/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py", line 135, in execute
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe     raise AirflowException('Pod Launching failed: {error}'.format(error=ex))
> [2018-11-21 08:30:26,647] {base_task_runner.py:101} INFO - Job 4: Subtask datapipe airflow.exceptions.AirflowException: Pod Launching failed: Pod took too long to start
> [2018-11-21 08:30:27,054] {logging_mixin.py:95} INFO - [2018-11-21 08:30:27,053] {jobs.py:2627} INFO - Task exited with return code 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)