Posted to commits@airflow.apache.org by "Mike Prior (Jira)" <ji...@apache.org> on 2019/12/12 21:12:00 UTC

[jira] [Issue Comment Deleted] (AIRFLOW-5456) Mark spark submit operator task as 'failed' when kubernetes pod never ran

     [ https://issues.apache.org/jira/browse/AIRFLOW-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Prior updated AIRFLOW-5456:
--------------------------------
    Comment: was deleted

(was: Hi Sebastian,

I see the same issue. To work around it, I made the change below to 'spark_submit_hook.py'. With this change, the task fails if the spark job fails. I'm not sure what the additional Kubernetes check is supposed to do.

 

Mike

 

        # Check spark-submit return code. In Kubernetes mode, also check the value
        # of exit code in the log, as it may differ.
        # Original condition, disabled:
        # if returncode or (self._is_kubernetes and self._spark_exit_code != 0):
        if returncode != 0:
            raise AirflowException(
                "Cannot execute: {}. Error code is: {}.".format(
                    spark_submit_cmd, returncode
                )
            )

 )
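To make the difference between the two conditions in the deleted comment concrete, here is a minimal standalone sketch. The function names and plain arguments are hypothetical stand-ins for the hook's `returncode`, `self._is_kubernetes` and `self._spark_exit_code`; this is not the hook's actual code, only the two boolean expressions quoted above pulled out so they can be compared.

```python
def original_check(returncode, is_kubernetes, spark_exit_code):
    """Condition that the change above disables (hypothetical wrapper)."""
    return bool(returncode or (is_kubernetes and spark_exit_code != 0))


def modified_check(returncode):
    """Condition after the change: only the spark-submit return code counts."""
    return returncode != 0


if __name__ == "__main__":
    # spark-submit returned 0, but the exit code read from the log was 1.
    print(original_check(0, True, 1), modified_check(0))
```

Under these assumptions the two checks disagree only when spark-submit itself returns 0 in Kubernetes mode while the exit code taken from the log is something other than 0.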

> Mark spark submit operator task as 'failed' when kubernetes pod never ran
> -------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5456
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5456
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: operators
>    Affects Versions: 1.10.0, 1.10.1, 1.10.2, 1.10.3, 1.10.4, 1.10.5
>            Reporter: Sebastian Arzt
>            Priority: Minor
>              Labels: failure-handling, operator, spark, spark-submit
>
> Currently, a spark submit operator task does not fail if the corresponding pod never entered phase 'Running'.
> Background: we observed spark submit operator tasks marked as "success" although the spark job never ran on Kubernetes.
> Logs (truncated):
> {code:java}
> [2019-09-11 09:21:02,732] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:427} INFO - 2019-09-11 09:21:02 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
> [2019-09-11 09:21:02,732] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:410} INFO - Identified spark driver pod: pod-name
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,732] {spark_submit_hook.py:427} INFO - pod name: pod-name
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - namespace: default
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - pod uid: 797f3157-d475-11e9-9758-1209ef52ae5e
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - creation time: 2019-09-11T09:21:02Z
> [2019-09-11 09:21:02,733] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - service account name: account
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,733] {spark_submit_hook.py:427} INFO - volumes: vol1, vol2
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - node name: node name
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - start time: 2019-09-11T09:21:02Z
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - container images: some-image:tag
> [2019-09-11 09:21:02,734] {logging_mixin.py:95} INFO - [2019-09-11 09:21:02,734] {spark_submit_hook.py:427} INFO - phase: Pending
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - 2019-09-11 09:27:56 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - Container name: spark-kubernetes-driver
> [2019-09-11 09:27:56,813] {logging_mixin.py:95} INFO - [2019-09-11 09:27:56,813] {spark_submit_hook.py:427} INFO - Container state: Terminated
> {code}
> Solution: Do not mark the job as 'success' if phase 'Running' was never observed in the spark-submit logs.
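
The proposed solution could be sketched roughly as follows. This is only an illustration, not a patch to the hook: the function names and the regular expression are assumptions, and the sketch assumes that a running pod would be logged as "phase: Running" in the same form as the "phase: Pending" line in the truncated logs above.

```python
import re

# Hypothetical pattern, modeled on the "phase: Pending" line in the logs above.
PHASE_PATTERN = re.compile(r"\bphase:\s*(\w+)")


def observed_phases(log_lines):
    """Return the pod phases found in the log lines, in order of appearance."""
    phases = []
    for line in log_lines:
        match = PHASE_PATTERN.search(line)
        if match:
            phases.append(match.group(1))
    return phases


def pod_ever_ran(log_lines):
    """True only if a 'Running' phase shows up somewhere in the log."""
    return "Running" in observed_phases(log_lines)


if __name__ == "__main__":
    sample = [
        "{spark_submit_hook.py:427} INFO - phase: Pending",
        "{spark_submit_hook.py:427} INFO - Container state: Terminated",
    ]
    # A caller could refuse to report success when this is False.
    print(pod_ever_ran(sample))
```

A caller would collect the spark-submit output line by line and treat the job as failed when `pod_ever_ran` returns False, in addition to whatever return-code check is already in place.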



--
This message was sent by Atlassian Jira
(v8.3.4#803005)