Posted to commits@airflow.apache.org by "Yegor Andreenko (Jira)" <ji...@apache.org> on 2019/11/27 17:32:00 UTC

[jira] [Created] (AIRFLOW-6092) KubernetesPodOperator exit code isn't propagated

Yegor Andreenko created AIRFLOW-6092:
----------------------------------------

             Summary: KubernetesPodOperator exit code isn't propagated
                 Key: AIRFLOW-6092
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-6092
             Project: Apache Airflow
          Issue Type: Bug
          Components: operators
    Affects Versions: 1.10.6
         Environment: custom image of airflow 1.10.6
            Reporter: Yegor Andreenko


We are using `KubernetesPodOperator` to submit a Spark job that runs on a driver plus executors:

```

KubernetesPodOperator(
    name=task_id,
    task_id=task_id,
    namespace=spark_kube_namespace,
    service_account_name=spark_kube_service_account,
    arguments=pod_arguments,
    image=<>,
    cmds="/opt/spark/bin/spark-submit",
    get_logs=True,
    dag=dag,
    image_pull_policy="Always",
    in_cluster=True,
    is_delete_operator_pod=True,
    resources=submitter_resources,
    *args,
    **kwargs
)

```

The driver container fails at run time with exit code 255:

```

{pod_launcher.py:125} INFO - b'19/11/27 13:00:16 INFO LoggingPodStatusWatcherImpl: Application status for spark-04da7bc6c41d4e8eba047bfd3bf5ea47 (phase: Running)\n'
{pod_launcher.py:125} INFO - b'19/11/27 13:00:16 INFO LoggingPodStatusWatcherImpl: State changed, new state: \n'
{pod_launcher.py:125} INFO - b'\t pod name: <>-6b7cf96eacf220f0-driver\n'
{pod_launcher.py:125} INFO - b'\t namespace: <>\n'
{pod_launcher.py:125} INFO - b'\t labels: spark-app-selector -> spark-04da7bc6c41d4e8eba047bfd3bf5ea47, spark-role -> driver\n'
{pod_launcher.py:125} INFO - b'\t pod uid: c67516bd-1115-11ea-8caf-525400ee2d64\n'
{pod_launcher.py:125} INFO - b'\t creation time: 2019-11-27T12:59:40Z\n'
{pod_launcher.py:125} INFO - b'\t service account name: default\n'
{pod_launcher.py:125} INFO - b'\t volumes: spark-local-dir-1, spark-conf-volume, default-token-vjsqm\n'
{pod_launcher.py:125} INFO - b'\t node name: node-i63w6\n'
{pod_launcher.py:125} INFO - b'\t start time: 2019-11-27T12:59:41Z\n'
{pod_launcher.py:125} INFO - b'\t phase: Failed\n'
{pod_launcher.py:125} INFO - b'\t container status: \n'
{pod_launcher.py:125} INFO - b'\t\t container name: spark-kubernetes-driver\n'
{pod_launcher.py:125} INFO - b'\t\t container image: <>\n'
{pod_launcher.py:125} INFO - b'\t\t container state: terminated\n'
{pod_launcher.py:125} INFO - b'\t\t container started at: 2019-11-27T12:59:44Z\n'
{pod_launcher.py:125} INFO - b'\t\t container finished at: 2019-11-27T13:00:16Z\n'
{pod_launcher.py:125} INFO - b'\t\t exit code: 255\n'
{pod_launcher.py:125} INFO - b'\t\t termination reason: Error\n'
{pod_launcher.py:125} INFO - b'19/11/27 13:00:16 INFO LoggingPodStatusWatcherImpl: Application status for spark-04da7bc6c41d4e8eba047bfd3bf5ea47 (phase: Failed)\n'
{pod_launcher.py:125} INFO - b'19/11/27 13:00:16 INFO LoggingPodStatusWatcherImpl: Container final statuses:\n'
{pod_launcher.py:125} INFO - b'\n'
{pod_launcher.py:125} INFO - b'\n'
{pod_launcher.py:125} INFO - b'\t container name: spark-kubernetes-driver\n'
{pod_launcher.py:125} INFO - b'\t container image: <>\n'
{pod_launcher.py:125} INFO - b'\t container state: terminated\n'
{pod_launcher.py:125} INFO - b'\t container started at: 2019-11-27T12:59:44Z\n'
{pod_launcher.py:125} INFO - b'\t container finished at: 2019-11-27T13:00:16Z\n'
{pod_launcher.py:125} INFO - b'\t exit code: 255\n'
{pod_launcher.py:125} INFO - b'\t termination reason: Error\n'
{pod_launcher.py:125} INFO - b'19/11/27 13:00:16 INFO LoggingPodStatusWatcherImpl: Application <> with submission ID <>:<>-6b7cf96eacf220f0-driver finished\n'
{pod_launcher.py:125} INFO - b'19/11/27 13:00:16 INFO ShutdownHookManager: Shutdown hook called\n'
{pod_launcher.py:125} INFO - b'19/11/27 13:00:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-a935e173-2f22-480f-83a5-6d845c4784f0\n'
[2019-11-27 13:00:20,869] {logging_mixin.py:112} INFO - [2019-11-27 13:00:20,869] {local_task_job.py:124} WARNING - Time since last heartbeat(0.02 s) < heartrate(5.0 s), sleeping for 4.97753 s
{pod_launcher.py:142} INFO - Event: <> had an event of type Succeeded
{pod_launcher.py:237} INFO - Event with job id <> Succeeded
{pod_launcher.py:142} INFO - Event: <> had an event of type Succeeded
{pod_launcher.py:237} INFO - Event with job id <> Succeeded
[2019-11-27 13:00:25,854] {logging_mixin.py:112} INFO - [2019-11-27 13:00:25,852] {local_task_job.py:103} INFO - Task exited with return code 0

```

However, the task log ends with `Task exited with return code 0` (last line), and the pod that the operator launched is reported as `Succeeded`.
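As a possible stopgap, the `exit code: N` lines that spark-submit prints in the log above could be scanned for non-zero values. The following is only a minimal sketch under that assumption: `find_nonzero_exit_codes` and `EXIT_CODE_RE` are hypothetical names, not part of Airflow or Spark, and the sample input is copied from the log lines in this report.

```python
import re
from typing import Iterable, List

# Matches the "exit code: N" lines that appear in the spark-submit output above.
EXIT_CODE_RE = re.compile(r"exit code:\s*(-?\d+)")


def find_nonzero_exit_codes(log_lines: Iterable[bytes]) -> List[int]:
    """Return every non-zero exit code reported in the given raw log lines."""
    codes = []
    for raw in log_lines:
        # The pod log is delivered as bytes (see the b'...' lines above).
        line = raw.decode("utf-8", errors="replace")
        match = EXIT_CODE_RE.search(line)
        if match and int(match.group(1)) != 0:
            codes.append(int(match.group(1)))
    return codes


# Sample taken from the container status section of the log in this report.
sample = [
    b"\t\t container state: terminated\n",
    b"\t\t exit code: 255\n",
    b"\t\t termination reason: Error\n",
]

codes = find_nonzero_exit_codes(sample)
if codes:
    print("driver reported non-zero exit codes:", codes)
```

Parsing log text is fragile, so this would only be a workaround until the exit code is propagated properly.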

 

Expected behaviour: the non-zero exit code is propagated and the Airflow task fails.

Current behaviour: the Airflow task succeeds.
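To illustrate the expected behaviour, here is a rough sketch of a check that fails when any container in a pod terminated with a non-zero exit code. It works on a plain dict shaped like the `status` section of a Kubernetes pod (`phase`, `containerStatuses`, `state.terminated.exitCode`); `PodFailedError`, `nonzero_exits` and `raise_on_failure` are hypothetical names, and this is not how `KubernetesPodOperator` is actually implemented. Note that in the log above the failing container belongs to the Spark driver pod, so a check like this would have to be applied to that pod, not only to the one running spark-submit.

```python
from typing import Any, Dict, List, Tuple


class PodFailedError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


def nonzero_exits(pod_status: Dict[str, Any]) -> List[Tuple[str, int]]:
    """Collect (container name, exit code) for containers that exited non-zero."""
    failures = []
    for container in pod_status.get("containerStatuses") or []:
        terminated = (container.get("state") or {}).get("terminated")
        if terminated and terminated.get("exitCode", 0) != 0:
            failures.append((container["name"], terminated["exitCode"]))
    return failures


def raise_on_failure(pod_status: Dict[str, Any]) -> None:
    """Raise if the pod phase is Failed or any container exited non-zero."""
    failures = nonzero_exits(pod_status)
    if pod_status.get("phase") == "Failed" or failures:
        raise PodFailedError(
            "pod phase={!r}, failed containers={}".format(
                pod_status.get("phase"), failures
            )
        )


# Status values mirror the driver pod described in the log of this report.
driver_status = {
    "phase": "Failed",
    "containerStatuses": [
        {
            "name": "spark-kubernetes-driver",
            "state": {"terminated": {"exitCode": 255, "reason": "Error"}},
        }
    ],
}

try:
    raise_on_failure(driver_status)
except PodFailedError as err:
    print("task should fail:", err)
```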


