Posted to commits@airflow.apache.org by "Sebastian Arzt (Jira)" <ji...@apache.org> on 2019/09/11 09:55:00 UTC

[jira] [Created] (AIRFLOW-5456) Mark spark submit operator task as 'failed' when kubernetes pod phase 'Running' did not occur on spark-submit logs

Sebastian Arzt created AIRFLOW-5456:
---------------------------------------

             Summary: Mark spark submit operator task as 'failed' when kubernetes pod phase 'Running' did not occur on spark-submit logs
                 Key: AIRFLOW-5456
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5456
             Project: Apache Airflow
          Issue Type: Bug
          Components: operators
    Affects Versions: 1.10.5, 1.10.4, 1.10.3, 1.10.2, 1.10.1, 1.10.0
            Reporter: Sebastian Arzt


Currently, a spark submit operator task does not fail if the corresponding driver pod never entered the phase 'Running'.

Background: we observed spark submit operator tasks marked as "success" even though the Spark job never ran on Kubernetes.

Logs (truncated):
{code:java}
[2019-09-11 09:21:02,732] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:427} INFO - 2019-09-11 09:21:02 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
[2019-09-11 09:21:02,732] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:410} INFO - Identified spark driver pod: pod-name
[2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:427} INFO - pod name: pod-name
[2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - namespace: default
[2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - pod uid: 797f3157-d475-11e9-9758-1209ef52ae5e
[2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - creation time: 2019-09-11T09:21:02Z
[2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - service account name: account
[2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - volumes: vol1, vol2
[2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - node name: node name
[2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - start time: 2019-09-11T09:21:02Z
[2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - container images: some-image:tag
[2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - phase: Pending
[2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - 2019-09-11 09:27:56 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:
[2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - Container name: spark-kubernetes-driver
[2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - Container state: Terminated
{code}
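The excerpt above reports the driver pod only in phase 'Pending' before the container is terminated; no 'phase: Running' line appears. A minimal sketch of how the reported phases could be pulled out of such log lines, assuming the "phase: <Name>" format printed by spark-submit's LoggingPodStatusWatcherImpl as relayed in the task log (that format may differ between Spark versions). The helper name `observed_phases` is hypothetical and not part of Airflow.

{code:python}
import re

# Assumption: pod status lines look like "... INFO - phase: Pending",
# as in the spark-submit output relayed by the Airflow task log above.
PHASE_PATTERN = re.compile(r"\bphase:\s*(\w+)")


def observed_phases(log_lines):
    """Hypothetical helper: return the driver pod phases in order of appearance."""
    phases = []
    for line in log_lines:
        match = PHASE_PATTERN.search(line)
        if match:
            phases.append(match.group(1))
    return phases


# Shortened versions of the relevant lines from the excerpt above.
excerpt = [
    "{spark_submit_hook.py:427} INFO - phase: Pending",
    "{spark_submit_hook.py:427} INFO - Container name: spark-kubernetes-driver",
    "{spark_submit_hook.py:427} INFO - Container state: Terminated",
]

print(observed_phases(excerpt))               # ['Pending']
print("Running" in observed_phases(excerpt))  # False
{code}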

Solution: Do not mark the task as 'success' if the phase 'Running' was never observed in the spark-submit logs.
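The proposed check can be sketched as a small tracker that is fed each spark-submit output line as it streams and is consulted once the process exits. This is a minimal, hypothetical sketch, not the actual SparkSubmitHook code: the class `DriverPhaseTracker` and its methods are illustrative names, the "phase: <Name>" line format is an assumption based on the log above, and `RuntimeError` stands in for whatever exception the hook would raise (presumably `airflow.exceptions.AirflowException`).

{code:python}
import re

# Assumption: pod status lines contain "phase: <Name>", as in the log above.
PHASE_PATTERN = re.compile(r"\bphase:\s*(\w+)")


class DriverPhaseTracker:
    """Hypothetical helper: remembers whether the driver pod was ever seen Running."""

    def __init__(self):
        self.seen_running = False

    def process_line(self, line):
        # Called once per spark-submit output line while the process runs.
        match = PHASE_PATTERN.search(line)
        if match and match.group(1) == "Running":
            self.seen_running = True

    def check(self, returncode):
        # Fail on a non-zero exit code, and additionally when the driver
        # pod never reached phase 'Running' (the proposal in this issue).
        if returncode != 0:
            raise RuntimeError(f"spark-submit exited with code {returncode}")
        if not self.seen_running:
            raise RuntimeError("driver pod never reached phase 'Running'")


tracker = DriverPhaseTracker()
for line in ["INFO - phase: Pending", "INFO - Container state: Terminated"]:
    tracker.process_line(line)

try:
    tracker.check(returncode=0)
except RuntimeError as err:
    print(f"task failed: {err}")  # task failed: driver pod never reached phase 'Running'
{code}

Note that this treats a missing 'phase: Running' line as a failure even when spark-submit exits with code 0, exactly as proposed; a driver whose 'Running' transition is never logged would therefore also be marked as failed.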



--
This message was sent by Atlassian Jira
(v8.3.2#803003)