Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/05/22 01:34:30 UTC

[GitHub] [airflow] ywan2017 opened a new issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

ywan2017 opened a new issue #8963:
URL: https://github.com/apache/airflow/issues/8963


   # SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
   
   ## Description
   
   I use Airflow to schedule Spark jobs on k8s with SparkSubmitOperator.
   When a Spark job runs on k8s for a long time (more than ~30 minutes), Airflow often marks the task as failed while the job is still running, even though the job eventually finishes successfully.
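   
   Some context on the failure mode: the operator infers success or failure from the spark-submit log itself, i.e. the "Exit code: N" lines visible in the logs further down, so when the driver pod watch dies mid-run that line never arrives and the operator has nothing reliable to go on. Below is a simplified illustration of such a log-driven check, assuming a toy parser rather than the actual hook code:
   
   ```python
   # Simplified illustration (not the actual Airflow source): a log-driven
   # status check only learns the result if the "Exit code: N" line ever
   # shows up in spark-submit's output.
   import re
   from typing import Iterable, Optional
   
   EXIT_CODE_RE = re.compile(r"Exit code: (\d+)")
   
   def parse_exit_code(log_lines: Iterable[str]) -> Optional[int]:
       """Scan a spark-submit log for the driver container's exit code."""
       exit_code = None
       for line in log_lines:
           match = EXIT_CODE_RE.search(line)
           if match:
               exit_code = int(match.group(1))
       return exit_code
   
   # When the k8s watch feeding the log dies (as in the exception below),
   # parse_exit_code(...) returns None even though the job may still succeed.
   ```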
   
   ### When the failure happens, this exception often (but not always) appears at the same time
   ```log
   20/05/20 10:49:58 INFO TaskSetManager: Finished task 7.0 in stage 91.0 (TID 13313) in 34319 ms on 172.30.238.243 (executor 5) (10/20)
   20/05/20 10:49:59 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
   io.fabric8.kubernetes.client.KubernetesClientException: too old resource version: 140189600 (140238574)
   	at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
   	at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:307)
   	at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:222)
   	at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:101)
   	at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:262)
   	at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:201)
   	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
   	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   20/05/20 10:50:19 INFO TaskSetManager: Finished task 10.0 in stage 91.0 (TID 13316) in 27478 ms on 172.30.244.131 (executor 1) (11/20)
   20/05/20 10:50:19 INFO TaskSetManager: Finished task 12.0 in stage 91.0 (TID 13318) in 27052 ms on 172.30.244.131 (executor 1) (12/20)
   
   ```
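   For background on this exception: a k8s watch resumes from a resourceVersion, and the API server answers 410 Gone ("too old resource version") once that version has been compacted away; a client is expected to re-list and restart the watch. The fabric8 watch inside spark-submit does not appear to recover here, so the status stream just stops. A hedged sketch of the re-list-and-rewatch pattern, using the official kubernetes Python client (not the fabric8 client Spark uses), only to illustrate the semantics:
   
   ```python
   # Hedged sketch: the standard way a client survives "too old resource
   # version" (HTTP 410 Gone) is to re-list and restart the watch. This
   # uses the official kubernetes Python client, so it only illustrates
   # the pattern, not what spark-submit actually does.
   from kubernetes import client, config, watch
   from kubernetes.client.rest import ApiException
   
   def watch_driver_pods(namespace: str) -> None:
       config.load_kube_config()
       v1 = client.CoreV1Api()
       resource_version = ""
       while True:
           try:
               for event in watch.Watch().stream(
                   v1.list_namespaced_pod,
                   namespace,
                   resource_version=resource_version,
               ):
                   # Remember where we are so a dropped watch can resume.
                   resource_version = event["object"].metadata.resource_version
                   print(event["type"], event["object"].metadata.name)
           except ApiException as exc:
               if exc.status == 410:
                   # Our resourceVersion was compacted away: re-list from scratch.
                   resource_version = ""
                   continue
               raise
   ```
   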
   ## Environment
   
       UAT environment
       airflow version: 1.10.10
       python: 3.6.8
       k8s server: 1.5
       k8s client: ? (version unknown)
       spark: 2.4.4-h2.7
   
   ## Submitting jobs manually and reading k8s logs directly
   
   ### Scenario 1: job succeeds; log returned from k8s
   
   ```log
   2020-05-11 11:58:21 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   2020-05-11 11:58:27 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: empreporting-1589212705263-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
   	 pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
   	 creation time: 2020-05-11T15:58:27Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: N/A
   	 start time: N/A
   	 container images: N/A
   	 phase: Pending
   	 status: []
   
   2020-05-11 11:58:27 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: empreporting-1589212705263-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
   	 pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
   	 creation time: 2020-05-11T15:58:27Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: 10.74.200.157
   	 start time: N/A
   	 container images: N/A
   	 phase: Pending
   	 status: []
   
   2020-05-11 11:58:28 INFO  Client:54 - Waiting for application emp_reporting to finish...
   2020-05-11 11:58:48 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: empreporting-1589212705263-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
   	 pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
   	 creation time: 2020-05-11T15:58:27Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: 10.74.200.157
   	 start time: 2020-05-11T15:58:48Z
   	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 phase: Pending
   	 status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
   
   2020-05-11 11:59:28 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: empreporting-1589212705263-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
   	 pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
   	 creation time: 2020-05-11T15:58:27Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: 10.74.200.157
   	 start time: 2020-05-11T15:58:48Z
   	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 phase: Running
   	 status: [ContainerStatus(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-11T15:59:26Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
   
   2020-05-11 12:01:15 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: empreporting-1589212705263-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-8f4a0340943a4df79d0e2f16eb24a751, spark-role -> driver
   	 pod uid: 3c1d4adf-b18a-4204-a95f-91a579eea5c4
   	 creation time: 2020-05-11T15:58:27Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: 10.74.200.157
   	 start time: 2020-05-11T15:58:48Z
   	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 phase: Succeeded
   	 status: [ContainerStatus(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://71d51131984cb06468453f75cda9a3c10a78f3da336e731d11bb2cc3955de32b, exitCode=0, finishedAt=Time(time=2020-05-11T16:01:08Z, additionalProperties={}), message=null, reason=Completed, signal=null, startedAt=Time(time=2020-05-11T15:59:26Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
   2020-05-11 12:01:15 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:
   
   
   	 Container name: spark-kubernetes-driver
   	 Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 Container state: Terminated
   	 Exit code: 0
   2020-05-11 12:01:15 INFO  Client:54 - Application emp_reporting finished.
   2020-05-11 12:01:15 INFO  ShutdownHookManager:54 - Shutdown hook called
   2020-05-11 12:01:15 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-f3b0955d-fe7c-4858-a1c6-f70597d42104
   
   ```
   
   ### Scenario 2: job fails; log returned from k8s
   
   ```log
   2020-05-11 11:54:10 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   2020-05-11 11:54:16 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: empreporting-1589212454526-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
   	 pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
   	 creation time: 2020-05-11T15:54:16Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: N/A
   	 start time: N/A
   	 container images: N/A
   	 phase: Pending
   	 status: []
   
   2020-05-11 11:54:16 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: empreporting-1589212454526-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
   	 pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
   	 creation time: 2020-05-11T15:54:16Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: 10.74.200.186
   	 start time: N/A
   	 container images: N/A
   	 phase: Pending
   	 status: []
   
   2020-05-11 11:54:17 INFO  Client:54 - Waiting for application emp_reporting to finish...
   2020-05-11 11:54:17 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: empreporting-1589212454526-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
   	 pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
   	 creation time: 2020-05-11T15:54:16Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: 10.74.200.186
   	 start time: 2020-05-11T15:54:17Z
   	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 phase: Pending
   	 status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
   
   2020-05-11 11:55:04 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: empreporting-1589212454526-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
   	 pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
   	 creation time: 2020-05-11T15:54:16Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: 10.74.200.186
   	 start time: 2020-05-11T15:54:17Z
   	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 phase: Running
   	 status: [ContainerStatus(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-11T15:55:03Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
   
   2020-05-11 11:55:29 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: empreporting-1589212454526-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-a3a0efa1a3804b759eda9e9aa52629cc, spark-role -> driver
   	 pod uid: 6cc27a5d-efff-401b-a120-15e8a3ccb2f2
   	 creation time: 2020-05-11T15:54:16Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: 10.74.200.186
   	 start time: 2020-05-11T15:54:17Z
   	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 phase: Failed
   	 status: [ContainerStatus(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://ed26eb3ba628744040af1d2ee610fb29f00564b7fb5e693f0959b8999479f8a9, exitCode=1, finishedAt=Time(time=2020-05-11T15:55:27Z, additionalProperties={}), message=null, reason=Error, signal=null, startedAt=Time(time=2020-05-11T15:55:03Z, additionalProperties={}), additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
   2020-05-11 11:55:29 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:
   
   
   	 Container name: spark-kubernetes-driver
   	 Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 Container state: Terminated
   	 Exit code: 1
   2020-05-11 11:55:29 INFO  Client:54 - Application emp_reporting finished.
   2020-05-11 11:55:29 INFO  ShutdownHookManager:54 - Shutdown hook called
   2020-05-11 11:55:29 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-8ba3c338-3eae-4bc7-9c25-3146642b9207
   ```
   
   ## Airflow log examples
   
   ### Scenario 1: job succeeds; the Airflow log shows the correct status
   
   ```log
   [dawany@dawany-inf env_qa]$ kubectl describe pod rmtextract-1589270449587-driver -n batch-pipeline-qa
   Name:         rmtextract-1589270449587-driver
   Namespace:    batch-pipeline-qa
   Priority:     0
   Node:         10.93.122.236/10.93.122.236
   Start Time:   Tue, 12 May 2020 04:00:51 -0400
   Labels:       spark-app-selector=spark-2f7caeac92c94710ab54cf40b5ff29e5
                 spark-role=driver
   Annotations:  kubernetes.io/psp: db2oltp-dev-psp
   Status:       Succeeded
   IP:           172.30.244.25
   IPs:          <none>
   Containers:
     spark-kubernetes-driver:
       Container ID:  containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6
       Image:         txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
       Image ID:      txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
       Ports:         7078/TCP, 7079/TCP, 4040/TCP
       Host Ports:    0/TCP, 0/TCP, 0/TCP
       Args:
         driver
         --properties-file
         /opt/spark/conf/spark.properties
         --class
         com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob
         spark-internal
         0
         3
         8
       State:          Terminated
         Reason:       Completed
         Exit Code:    0
         Started:      Tue, 12 May 2020 04:01:07 -0400
         Finished:     Tue, 12 May 2020 04:52:03 -0400
       Ready:          False
       Restart Count:  0
       Limits:
         memory:  11Gi
       Requests:
         cpu:     1
         memory:  11Gi
       Environment:
         SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
         SPARK_LOCAL_DIRS:           /var/data/spark-02fa9b25-af0b-430d-ab75-aee12dd217a7
         SPARK_CONF_DIR:             /opt/spark/conf
       Mounts:
         /opt/spark/conf from spark-conf-volume (rw)
         /opt/spark/secrets from dswsecret-volume (rw)
         /var/data/spark-02fa9b25-af0b-430d-ab75-aee12dd217a7 from spark-local-dir-1 (rw)
         /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
   Conditions:
     Type              Status
     Initialized       True
     Ready             False
     ContainersReady   False
     PodScheduled      True
   Volumes:
     spark-local-dir-1:
       Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
       Medium:
       SizeLimit:  <unset>
     dswsecret-volume:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  dswsecret
       Optional:    false
     spark-conf-volume:
       Type:      ConfigMap (a volume populated by a ConfigMap)
       Name:      rmtextract-1589270449587-driver-conf-map
       Optional:  false
     spark-token-fpqpz:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  spark-token-fpqpz
       Optional:    false
   QoS Class:       Burstable
   Node-Selectors:  <none>
   Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                    node.kubernetes.io/unreachable:NoExecute for 600s
   Events:          <none>
   
   ```
   
   ```log
   *** Reading local file: /home/airflow/airflow/logs/Transformation_Rmt_Master_Load_Sequence_Adhoc_Main/Rmt_Extract/2020-05-12T04:00:00+00:00/1.log
   [2020-05-12 03:00:43,970] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [queued]>
   [2020-05-12 03:00:44,074] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [queued]>
   [2020-05-12 03:00:44,074] {taskinstance.py:879} INFO -
   --------------------------------------------------------------------------------
   [2020-05-12 03:00:44,074] {taskinstance.py:880} INFO - Starting attempt 1 of 2
   [2020-05-12 03:00:44,074] {taskinstance.py:881} INFO -
   --------------------------------------------------------------------------------
   [2020-05-12 03:00:44,173] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): Rmt_Extract> on 2020-05-12T04:00:00+00:00
   [2020-05-12 03:00:44,176] {standard_task_runner.py:53} INFO - Started process 11051 to run task
   [2020-05-12 03:00:44,619] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-12T04:00:00+00:00 [running]> kafka02.cloud.ibm.com
   [2020-05-12 03:00:45,043] {logging_mixin.py:112} INFO - [2020-05-12 03:00:45,043] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
   [2020-05-12 03:00:45,071] {logging_mixin.py:112} INFO - [2020-05-12 03:00:45,071] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8
   [2020-05-12 03:00:47,353] {logging_mixin.py:112} INFO - [2020-05-12 03:00:47,352] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   [2020-05-12 03:00:49,643] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,643] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
   [2020-05-12 03:00:49,643] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,643] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly.
   [2020-05-12 03:00:49,644] {logging_mixin.py:112} INFO - [2020-05-12 03:00:49,644] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
   [2020-05-12 03:00:50,889] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,888] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
   
   [2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:50 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
   [2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
   [2020-05-12 03:00:50,908] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,908] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - node name: N/A
   [2020-05-12 03:00:50,909] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,909] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,910] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-12 03:00:50,910] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,910] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-12 03:00:50,912] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,912] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:50 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,912] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
   [2020-05-12 03:00:50,913] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,913] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-12 03:00:50,914] {logging_mixin.py:112} INFO - [2020-05-12 03:00:50,914] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-12 03:00:51,163] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,163] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:51 INFO Client: Waiting for application Rmt_Extract to finish...
   [2020-05-12 03:00:51,406] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,406] {spark_submit_hook.py:479} INFO - 20/05/12 03:00:51 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
   [2020-05-12 03:00:51,407] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,407] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-12 03:00:51,408] {logging_mixin.py:112} INFO - [2020-05-12 03:00:51,408] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
   
   [2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - 20/05/12 03:01:07 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
   [2020-05-12 03:01:07,811] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,811] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-12 03:01:07,812] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,812] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
   [2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - phase: Running
   [2020-05-12 03:01:07,813] {logging_mixin.py:112} INFO - [2020-05-12 03:01:07,813] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-12T08:01:07Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
   
   [2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589270449587-driver
   [2020-05-12 03:52:13,876] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589270449587-driver
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,876] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-2f7caeac92c94710ab54cf40b5ff29e5, spark-role -> driver
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - pod uid: 57056ebb-771a-4894-b1c3-398073967a6b
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T08:00:50Z
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-12 03:52:13,877] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,877] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T08:00:51Z
   [2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - phase: Succeeded
   [2020-05-12 03:52:13,878] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,878] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://3f364306c94ac5739ab154a0037fc5e17a2981cf51669be916304be1d8698eb6, exitCode=0, finishedAt=2020-05-12T08:52:03Z, message=null, reason=Completed, signal=null, startedAt=2020-05-12T08:01:07Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,882] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO LoggingPodStatusWatcherImpl: Container final statuses:
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO -
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO -
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-12 03:52:13,883] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Container state: Terminated
   [2020-05-12 03:52:13,884] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,883] {spark_submit_hook.py:479} INFO - Exit code: 0
   [2020-05-12 03:52:13,884] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,884] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO Client: Application Rmt_Extract finished.
   [2020-05-12 03:52:13,889] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,889] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO ShutdownHookManager: Shutdown hook called
   [2020-05-12 03:52:13,890] {logging_mixin.py:112} INFO - [2020-05-12 03:52:13,890] {spark_submit_hook.py:479} INFO - 20/05/12 03:52:13 INFO ShutdownHookManager: Deleting directory /tmp/spark-8c6f511c-4c0f-451e-b686-7d0f444d8f59
   [2020-05-12 03:52:14,006] {taskinstance.py:1065} INFO - Marking task as SUCCESS.dag_id=Transformation_Rmt_Master_Load_Sequence_Adhoc_Main, task_id=Rmt_Extract, execution_date=20200512T040000, start_date=20200512T080043, end_date=20200512T085214
   [2020-05-12 03:52:19,162] {logging_mixin.py:112} INFO - [2020-05-12 03:52:19,161] {local_task_job.py:103} INFO - Task exited with return code 0
   
   ```
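   
   Worth noting: even when the spark-submit log stream is interrupted, the driver pod's real exit code stays available from the API server, which is exactly what the `kubectl describe pod` output above reports. A hedged workaround sketch (pod and namespace names are taken from the logs above; error handling is omitted):
   
   ```python
   # Hedged workaround sketch: the driver pod's terminated exit code is
   # still stored in the pod status on the API server (the same field
   # `kubectl describe pod` prints), regardless of the log stream.
   from kubernetes import client, config
   
   def driver_exit_code(pod_name: str, namespace: str) -> int:
       config.load_kube_config()
       v1 = client.CoreV1Api()
       pod = v1.read_namespaced_pod(pod_name, namespace)
       for status in pod.status.container_statuses:
           if status.state.terminated is not None:
               return status.state.terminated.exit_code
       raise RuntimeError("driver container has not terminated yet")
   
   print(driver_exit_code("rmtextract-1589270449587-driver", "batch-pipeline-qa"))
   ```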
   
   ### Scenario 2: job fails; the Airflow log shows the correct status
   
   ```log
   [dawany@dawany-inf env_qa]$ kubectl describe pod merge-rptg-appl-dscr-1589248907865-driver -n batch-pipeline-qa
   Name:         merge-rptg-appl-dscr-1589248907865-driver
   Namespace:    batch-pipeline-qa
   Priority:     0
   Node:         10.74.200.186/10.74.200.186
   Start Time:   Mon, 11 May 2020 22:01:52 -0400
   Labels:       spark-app-selector=spark-1e9e33df2d5c488dbc0555a618577f40
                 spark-role=driver
   Annotations:  kubernetes.io/psp: db2oltp-dev-psp
   Status:       Failed
   IP:           172.30.21.6
   IPs:          <none>
   Containers:
     spark-kubernetes-driver:
       Container ID:  containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31
       Image:         txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
       Image ID:      txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885
       Ports:         7078/TCP, 7079/TCP, 4040/TCP
       Host Ports:    0/TCP, 0/TCP, 0/TCP
       Args:
         driver
         --properties-file
         /opt/spark/conf/spark.properties
         --class
         com.ibm.cio.dswim.ingest.CommonMerge
         spark-internal
         -t
         odsqa.shar1.rptg_appl_dscr
         -b
         dsw-data-drop-qa
         -d
         dashdb
         -k
         XREF
       State:          Terminated
         Reason:       Error
         Exit Code:    1
         Started:      Mon, 11 May 2020 22:02:19 -0400
         Finished:     Mon, 11 May 2020 22:05:03 -0400
       Ready:          False
       Restart Count:  0
       Limits:
         memory:  896Mi
       Requests:
         cpu:     1
         memory:  896Mi
       Environment:
         SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
         SPARK_LOCAL_DIRS:           /var/data/spark-8688577b-07b4-4cf5-b362-1a15003420ae
         SPARK_CONF_DIR:             /opt/spark/conf
       Mounts:
         /opt/spark/conf from spark-conf-volume (rw)
         /opt/spark/secrets from dswsecret-volume (rw)
         /var/data/spark-8688577b-07b4-4cf5-b362-1a15003420ae from spark-local-dir-1 (rw)
         /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
   Conditions:
     Type              Status
     Initialized       True
     Ready             False
     ContainersReady   False
     PodScheduled      True
   Volumes:
     spark-local-dir-1:
       Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
       Medium:
       SizeLimit:  <unset>
     dswsecret-volume:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  dswsecret
       Optional:    false
     spark-conf-volume:
       Type:      ConfigMap (a volume populated by a ConfigMap)
       Name:      merge-rptg-appl-dscr-1589248907865-driver-conf-map
       Optional:  false
     spark-token-fpqpz:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  spark-token-fpqpz
       Optional:    false
   QoS Class:       Burstable
   Node-Selectors:  <none>
   Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                    node.kubernetes.io/unreachable:NoExecute for 600s
   Events:          <none>
   [dawany@dawany-inf env_qa]$
   
   ```
   
   ```log
   *** Reading local file: /home/airflow/airflow/logs/Batch_xref.rptg_appl_dscr/merge_rptg_appl_dscr/2020-05-11T02:01:00+00:00/1.log
   [2020-05-11 21:01:41,692] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [queued]>
   [2020-05-11 21:01:41,801] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [queued]>
   [2020-05-11 21:01:41,801] {taskinstance.py:879} INFO -
   --------------------------------------------------------------------------------
   [2020-05-11 21:01:41,801] {taskinstance.py:880} INFO - Starting attempt 1 of 2
   [2020-05-11 21:01:41,802] {taskinstance.py:881} INFO -
   --------------------------------------------------------------------------------
   [2020-05-11 21:01:41,911] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): merge_rptg_appl_dscr> on 2020-05-11T02:01:00+00:00
   [2020-05-11 21:01:41,919] {standard_task_runner.py:53} INFO - Started process 37536 to run task
   [2020-05-11 21:01:42,508] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Batch_xref.rptg_appl_dscr.merge_rptg_appl_dscr 2020-05-11T02:01:00+00:00 [running]> kafka02.cloud.ibm.com
   [2020-05-11 21:01:43,042] {logging_mixin.py:112} INFO - [2020-05-11 21:01:43,042] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
   [2020-05-11 21:01:43,055] {logging_mixin.py:112} INFO - [2020-05-11 21:01:43,055] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF
   [2020-05-11 21:01:45,650] {logging_mixin.py:112} INFO - [2020-05-11 21:01:45,649] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   [2020-05-11 21:01:47,988] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,988] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
   [2020-05-11 21:01:47,989] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,989] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly.
   [2020-05-11 21:01:47,989] {logging_mixin.py:112} INFO - [2020-05-11 21:01:47,989] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
   [2020-05-11 21:01:49,267] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,266] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
   
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,282] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
   [2020-05-11 21:01:49,283] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,283] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - node name: N/A
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-11 21:01:49,284] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,284] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 21:01:49,285] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,285] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,291] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
   [2020-05-11 21:01:49,292] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,292] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-11 21:01:49,293] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 21:01:49,294] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,293] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-11 21:01:49,550] {logging_mixin.py:112} INFO - [2020-05-11 21:01:49,550] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:49 INFO Client: Waiting for application merge-rptg-appl-dscr to finish...
   [2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - 20/05/11 21:01:53 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:53,413] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,413] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 21:01:53,414] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
   [2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,414] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
   [2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
   [2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 21:01:53,415] {logging_mixin.py:112} INFO - [2020-05-11 21:01:53,415] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
   
   [2020-05-11 21:02:20,393] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,393] {spark_submit_hook.py:479} INFO - 20/05/11 21:02:20 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,393] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
   [2020-05-11 21:02:20,394] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,394] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - phase: Running
   [2020-05-11 21:02:20,395] {logging_mixin.py:112} INFO - [2020-05-11 21:02:20,395] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-12T02:02:19Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
   
   [2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:462} INFO - Identified spark driver pod: merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - pod name: merge-rptg-appl-dscr-1589248907865-driver
   [2020-05-11 21:06:00,953] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,953] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-1e9e33df2d5c488dbc0555a618577f40, spark-role -> driver
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - pod uid: f01dcbd9-fe6d-421a-99ea-7749f6b79345
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - creation time: 2020-05-12T02:01:48Z
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - node name: 10.74.200.186
   [2020-05-11 21:06:00,954] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,954] {spark_submit_hook.py:479} INFO - start time: 2020-05-12T02:01:52Z
   [2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
   [2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - phase: Failed
   [2020-05-11 21:06:00,955] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,955] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:276417a1bafb5aca28c78585f55df8a8c684b20c87b1814c3cc02065d7faa885, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=containerd://cf5ac589838bb75608cb1903967fcaad31f850a6cd4b4f63eb1e7417cc867c31, exitCode=1, finishedAt=2020-05-12T02:05:03Z, message=null, reason=Error, signal=null, startedAt=2020-05-12T02:02:19Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
   [2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO LoggingPodStatusWatcherImpl: Container final statuses:
   [2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO -
   [2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO -
   [2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver
   [2020-05-11 21:06:00,960] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,960] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6
   [2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - Container state: Terminated
   [2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - Exit code: 1
   [2020-05-11 21:06:00,961] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,961] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO Client: Application merge-rptg-appl-dscr finished.
   [2020-05-11 21:06:00,968] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,968] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO ShutdownHookManager: Shutdown hook called
   [2020-05-11 21:06:00,969] {logging_mixin.py:112} INFO - [2020-05-11 21:06:00,969] {spark_submit_hook.py:479} INFO - 20/05/11 21:06:00 INFO ShutdownHookManager: Deleting directory /tmp/spark-88cb9622-28cd-42e4-aaea-48a2206ae942
   [2020-05-11 21:06:01,098] {taskinstance.py:1145} ERROR - Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF. Error code is: 0.
   
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
       result = task_copy.execute(context=context)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 187, in execute
       self._hook.submit(self._application)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 405, in submit
       self._mask_cmd(spark_submit_cmd), returncode
   airflow.exceptions.AirflowException: Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.0-h2.6 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.executor.instances=1 --conf spark.kubernetes.executor.request.cores=0.5 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf,cos://dsw-data-project-qa.service/config/tables_pk.conf --jars cos://dsw-data-project-qa.service/3rdparty-jars/org.apache.spark_spark-avro_2.11-2.4.0.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-core-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-kms-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/ibm-cos-java-sdk-s3-2.4.5.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar --executor-memory 1G --driver-memory 512M --name merge-rptg-appl-dscr --class com.ibm.cio.dswim.ingest.CommonMerge cos://dsw-data-project-qa.service/job-jars/ingestion-1.0.jar -t odsqa.shar1.rptg_appl_dscr -b dsw-data-drop-qa -d dashdb -k XREF. Error code is: 0.
   [2020-05-11 21:06:01,100] {taskinstance.py:1168} INFO - Marking task as UP_FOR_RETRY
   [2020-05-11 21:06:03,316] {logging_mixin.py:112} INFO - [2020-05-11 21:06:03,313] {local_task_job.py:103} INFO - Task exited with return code 1
   
   ```
   
   ### Scenario 3: job succeeds, log returned from Airflow, wrong status reported (this scenario is what we want to analyze)
   
   ```log
   [dawany@dawany-inf env_qa]$ kubectl describe pod rmtextract-1589227268554-driver -n batch-pipeline-qa
   Name:         rmtextract-1589227268554-driver
   Namespace:    batch-pipeline-qa
   Priority:     0
   Node:         10.93.122.236/10.93.122.236
   Start Time:   Mon, 11 May 2020 16:01:10 -0400
   Labels:       spark-app-selector=spark-836f53b29a274eabbeba208d27e242de
                 spark-role=driver
   Annotations:  kubernetes.io/psp: db2oltp-dev-psp
   Status:       Succeeded
   IP:           172.30.244.35
   IPs:          <none>
   Containers:
     spark-kubernetes-driver:
       Container ID:  containerd://9608ccf92d1067645b6cbfcd289e6c99c76cd57463ed7c1fb71352a38a27a58c
       Image:         txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
       Image ID:      txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
       Ports:         7078/TCP, 7079/TCP, 4040/TCP
       Host Ports:    0/TCP, 0/TCP, 0/TCP
       Args:
         driver
         --properties-file
         /opt/spark/conf/spark.properties
         --class
         com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob
         spark-internal
         0
         3
         8
       State:          Terminated
         Reason:       Completed
         Exit Code:    0
         Started:      Mon, 11 May 2020 16:01:31 -0400
         Finished:     Mon, 11 May 2020 16:50:14 -0400
       Ready:          False
       Restart Count:  0
       Limits:
         memory:  11Gi
       Requests:
         cpu:     1
         memory:  11Gi
       Environment:
         SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
         SPARK_LOCAL_DIRS:           /var/data/spark-2bb8b417-ca61-4e7a-a4b2-fa0a695a1109
         SPARK_CONF_DIR:             /opt/spark/conf
       Mounts:
         /opt/spark/conf from spark-conf-volume (rw)
         /opt/spark/secrets from dswsecret-volume (rw)
         /var/data/spark-2bb8b417-ca61-4e7a-a4b2-fa0a695a1109 from spark-local-dir-1 (rw)
         /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
   Conditions:
     Type              Status
     Initialized       True
     Ready             False
     ContainersReady   False
     PodScheduled      True
   Volumes:
     spark-local-dir-1:
       Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
       Medium:
       SizeLimit:  <unset>
     dswsecret-volume:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  dswsecret
       Optional:    false
     spark-conf-volume:
       Type:      ConfigMap (a volume populated by a ConfigMap)
       Name:      rmtextract-1589227268554-driver-conf-map
       Optional:  false
     spark-token-fpqpz:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  spark-token-fpqpz
       Optional:    false
   QoS Class:       Burstable
   Node-Selectors:  <none>
   Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                    node.kubernetes.io/unreachable:NoExecute for 600s
   Events:          <none>
   
   ```
   
   ```log
   *** Reading local file: /home/airflow/airflow/logs/Transformation_Rmt_Master_Load_Sequence_Adhoc_Main/Rmt_Extract/2020-05-11T16:00:00+00:00/1.log
   [2020-05-11 15:01:02,589] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [queued]>
   [2020-05-11 15:01:02,722] {taskinstance.py:669} INFO - Dependencies all met for <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [queued]>
   [2020-05-11 15:01:02,722] {taskinstance.py:879} INFO -
   --------------------------------------------------------------------------------
   [2020-05-11 15:01:02,722] {taskinstance.py:880} INFO - Starting attempt 1 of 2
   [2020-05-11 15:01:02,722] {taskinstance.py:881} INFO -
   --------------------------------------------------------------------------------
   [2020-05-11 15:01:02,826] {taskinstance.py:900} INFO - Executing <Task(SparkSubmitOperator): Rmt_Extract> on 2020-05-11T16:00:00+00:00
   [2020-05-11 15:01:02,831] {standard_task_runner.py:53} INFO - Started process 39984 to run task
   [2020-05-11 15:01:03,415] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: Transformation_Rmt_Master_Load_Sequence_Adhoc_Main.Rmt_Extract 2020-05-11T16:00:00+00:00 [running]> kafka02.cloud.ibm.com
   [2020-05-11 15:01:03,970] {logging_mixin.py:112} INFO - [2020-05-11 15:01:03,969] {base_hook.py:87} INFO - Using connection to: id: spark_default. Host: k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165, Port: None, Schema: None, Login: admin, Password: XXXXXXXX, extra: XXXXXXXX
   [2020-05-11 15:01:04,000] {logging_mixin.py:112} INFO - [2020-05-11 15:01:04,000] {spark_submit_hook.py:325} INFO - Spark-Submit cmd: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8
   [2020-05-11 15:01:06,349] {logging_mixin.py:112} INFO - [2020-05-11 15:01:06,348] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   [2020-05-11 15:01:08,607] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,607] {spark_submit_hook.py:479} INFO - log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
   [2020-05-11 15:01:08,608] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,607] {spark_submit_hook.py:479} INFO - log4j:WARN Please initialize the log4j system properly.
   [2020-05-11 15:01:08,608] {logging_mixin.py:112} INFO - [2020-05-11 15:01:08,608] {spark_submit_hook.py:479} INFO - log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
   [2020-05-11 15:01:09,773] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,773] {spark_submit_hook.py:479} INFO - Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
   
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,784] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:09 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
   [2020-05-11 15:01:09,785] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,785] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - node name: N/A
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-11 15:01:09,786] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,786] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 15:01:09,787] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,787] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,796] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:09 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
   [2020-05-11 15:01:09,797] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,797] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - start time: N/A
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - container images: N/A
   [2020-05-11 15:01:09,798] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 15:01:09,799] {logging_mixin.py:112} INFO - [2020-05-11 15:01:09,798] {spark_submit_hook.py:479} INFO - status: []
   
   [2020-05-11 15:01:10,080] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,080] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:10 INFO Client: Waiting for application Rmt_Extract to finish...
   [2020-05-11 15:01:10,307] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,307] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:10 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,307] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
   [2020-05-11 15:01:10,308] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,308] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - start time: 2020-05-11T20:01:10Z
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - phase: Pending
   [2020-05-11 15:01:10,309] {logging_mixin.py:112} INFO - [2020-05-11 15:01:10,309] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
   
   [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - 20/05/11 15:01:32 INFO LoggingPodStatusWatcherImpl: State changed, new state:
   [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:462} INFO - Identified spark driver pod: rmtextract-1589227268554-driver
   [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - pod name: rmtextract-1589227268554-driver
   [2020-05-11 15:01:32,411] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,411] {spark_submit_hook.py:479} INFO - namespace: batch-pipeline-qa
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - labels: spark-app-selector -> spark-836f53b29a274eabbeba208d27e242de, spark-role -> driver
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - pod uid: 3684e5db-8f69-4b78-be4b-6669e42806e6
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - creation time: 2020-05-11T20:01:09Z
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - service account name: spark
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   [2020-05-11 15:01:32,412] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - node name: 10.93.122.236
   [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,412] {spark_submit_hook.py:479} INFO - start time: 2020-05-11T20:01:10Z
   [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - phase: Running
   [2020-05-11 15:01:32,413] {logging_mixin.py:112} INFO - [2020-05-11 15:01:32,413] {spark_submit_hook.py:479} INFO - status: [ContainerStatus(containerID=containerd://9608ccf92d1067645b6cbfcd289e6c99c76cd57463ed7c1fb71352a38a27a58c, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=2020-05-11T20:01:31Z, additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
   
   [2020-05-11 15:49:57,933] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,933] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO LoggingPodStatusWatcherImpl: Container final statuses:
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO -
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO -
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container name: spark-kubernetes-driver
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container state: Running
   [2020-05-11 15:49:57,934] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - Container started at: 2020-05-11T20:01:31Z
   [2020-05-11 15:49:57,935] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,934] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO Client: Application Rmt_Extract finished.
   [2020-05-11 15:49:57,936] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,936] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO ShutdownHookManager: Shutdown hook called
   [2020-05-11 15:49:57,937] {logging_mixin.py:112} INFO - [2020-05-11 15:49:57,937] {spark_submit_hook.py:479} INFO - 20/05/11 15:49:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-5bfba41b-8cab-429b-b64f-dcb4f52c4d3a
   [2020-05-11 15:49:58,101] {taskinstance.py:1145} ERROR - Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8. Error code is: 0.
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
       result = task_copy.execute(context=context)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/operators/spark_submit_operator.py", line 187, in execute
       self._hook.submit(self._application)
     File "/home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/hooks/spark_submit_hook.py", line 405, in submit
       self._mask_cmd(spark_submit_cmd), returncode
   airflow.exceptions.AirflowException: Cannot execute: spark-submit --master k8s://https://c2.private.us-south.containers.cloud.ibm.com:26165 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.kubernetes.container.image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7 --conf spark.kubernetes.container.image.pullSecrets=artifactory-container-registry --conf spark.submit.deployMode=cluster --conf spark.kubernetes.report.interval=2 --conf spark.kubernetes.driver.secrets.dswsecret=/opt/spark/secrets --conf spark.executor.userClassPathFirst=true --conf spark.driver.userClassPathFirst=true --conf spark.sql.parquet.compression.codec=gzip --conf spark.sql.session.timeZone=America/New_York --conf spark.sql.broadcastTimeout=1800 --conf spark.sql.shuffle.partitions=600 --conf spark.shuffle.consolidateFiles=true --conf spark.default.parallelism=108 --conf spark.driver.cores=1 --conf spark.executor.cores=2 --conf spark.kubernetes.executor.request.cores=0.6 --conf spark.kubernetes.executor.memoryOverhead=1G --conf spark.driver.memory=10G --conf spark.executor.memory=5G --conf spark.executor.instances=9 --conf spark.sql.codegen=true --conf spark.sql.cbo.enabled=true --conf spark.sql.optimizer.maxIterations=1000 --conf spark.kubernetes.namespace=batch-pipeline-qa --files cos://dsw-data-project-qa.service/config/dsw_config.conf --jars cos://dsw-data-project-qa.service/job-jars/common-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/rmtjob-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/job-jars/meta-1.0-SNAPSHOT.jar,cos://dsw-data-project-qa.service/3rdparty-jars/druid-1.1.12.jar,cos://dsw-data-project-qa.service/3rdparty-jars/mybatis-3.5.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/db2jcc4.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-core-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/logback-classic-1.2.3.jar,cos://dsw-data-project-qa.service/3rdparty-jars/dom4j-2.1.1.jar,cos://dsw-data-project-qa.service/3rdparty-jars/guava-28.0-jre.jar,cos://dsw-data-project-qa.service/3rdparty-jars/commons-lang3-3.9.jar,cos://dsw-data-project-qa.service/3rdparty-jars/fastjson-1.2.59.jar --name Rmt_Extract --class com.ibm.cio.dswim.trans.rmt.stage.RmtExtractStageJob cos://dsw-data-project-qa.service/job-jars/rmt_extract_stage-1.0-SNAPSHOT.jar 0 3 8. Error code is: 0.
   [2020-05-11 15:49:58,102] {taskinstance.py:1168} INFO - Marking task as UP_FOR_RETRY
   [2020-05-11 15:50:00,600] {logging_mixin.py:112} INFO - [2020-05-11 15:50:00,599] {local_task_job.py:103} INFO - Task exited with return code 1
   
   ```
   
   ## submit the Scenario 3 job manually, get k8s logs directly
   
   ### Scenario 3: log returned from k8s
   
   ```log
   [dawany@dawany-inf env_qa]$ kubectl describe pod -n batch-pipeline-qa  testquoterptfact-1589301844228-driver
   Name:         testquoterptfact-1589301844228-driver
   Namespace:    batch-pipeline-qa
   Priority:     0
   Node:         10.74.200.157/10.74.200.157
   Start Time:   Tue, 12 May 2020 12:44:09 -0400
   Labels:       spark-app-selector=spark-6ddf413798da4c5d83645a7bc760a925
                 spark-role=driver
   Annotations:  kubernetes.io/psp: db2oltp-dev-psp
   Status:       Succeeded
   IP:           172.30.0.6
   IPs:          <none>
   Containers:
     spark-kubernetes-driver:
       Container ID:  containerd://d5479e5ff5c582db541e8b545953981d705024f73861a0fddd506a3b11999e4b
       Image:         txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
       Image ID:      txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e
       Ports:         7078/TCP, 7079/TCP, 4040/TCP
       Host Ports:    0/TCP, 0/TCP, 0/TCP
       Args:
         driver
         --properties-file
         /opt/spark/conf/spark.properties
         --class
         com.ibm.cio.dswim.qrf.job.QuoteRptFactJob
         spark-internal
         1
       State:          Terminated
         Reason:       Completed
         Exit Code:    0
         Started:      Tue, 12 May 2020 12:44:37 -0400
         Finished:     Tue, 12 May 2020 14:52:57 -0400
       Ready:          False
       Restart Count:  0
       Limits:
         cpu:     4
         memory:  16896Mi
       Requests:
         cpu:     1
         memory:  16896Mi
       Environment:
         SPARK_DRIVER_BIND_ADDRESS:   (v1:status.podIP)
         SPARK_LOCAL_DIRS:           /var/data/spark-e4c8fd1e-8b33-449c-b460-c842df658705
         SPARK_CONF_DIR:             /opt/spark/conf
       Mounts:
         /opt/spark/conf from spark-conf-volume (rw)
         /opt/spark/secrets/ from dswsecret-volume (rw)
         /var/data/spark-e4c8fd1e-8b33-449c-b460-c842df658705 from spark-local-dir-1 (rw)
         /var/run/secrets/kubernetes.io/serviceaccount from spark-token-fpqpz (ro)
   Conditions:
     Type              Status
     Initialized       True
     Ready             False
     ContainersReady   False
     PodScheduled      True
   Volumes:
     spark-local-dir-1:
       Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
       Medium:
       SizeLimit:  <unset>
     dswsecret-volume:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  dswsecret
       Optional:    false
     spark-conf-volume:
       Type:      ConfigMap (a volume populated by a ConfigMap)
       Name:      testquoterptfact-1589301844228-driver-conf-map
       Optional:  false
     spark-token-fpqpz:
       Type:        Secret (a volume populated by a Secret)
       SecretName:  spark-token-fpqpz
       Optional:    false
   QoS Class:       Burstable
   Node-Selectors:  <none>
   Tolerations:     node.kubernetes.io/not-ready:NoExecute for 600s
                    node.kubernetes.io/unreachable:NoExecute for 600s
   Events:          <none>
   
   ```
   
   ```log
   2020-05-12 12:43:59 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   2020-05-12 12:44:06 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: testquoterptfact-1589301844228-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
   	 pod uid: ff918848-2819-4336-a4a3-654f01dd756c
   	 creation time: 2020-05-12T16:44:06Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: N/A
   	 start time: N/A
   	 container images: N/A
   	 phase: Pending
   	 status: []
   2020-05-12 12:44:06 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: testquoterptfact-1589301844228-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
   	 pod uid: ff918848-2819-4336-a4a3-654f01dd756c
   	 creation time: 2020-05-12T16:44:06Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: 10.74.200.157
   	 start time: N/A
   	 container images: N/A
   	 phase: Pending
   	 status: []
   2020-05-12 12:44:07 INFO  Client:54 - Waiting for application test_quote_rpt_fact to finish...
   2020-05-12 12:44:09 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: testquoterptfact-1589301844228-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
   	 pod uid: ff918848-2819-4336-a4a3-654f01dd756c
   	 creation time: 2020-05-12T16:44:06Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: 10.74.200.157
   	 start time: 2020-05-12T16:44:09Z
   	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 phase: Pending
   	 status: [ContainerStatus(containerID=null, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=null, waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}), additionalProperties={}), additionalProperties={})]
   2020-05-12 12:44:37 INFO  LoggingPodStatusWatcherImpl:54 - State changed, new state:
   	 pod name: testquoterptfact-1589301844228-driver
   	 namespace: batch-pipeline-qa
   	 labels: spark-app-selector -> spark-6ddf413798da4c5d83645a7bc760a925, spark-role -> driver
   	 pod uid: ff918848-2819-4336-a4a3-654f01dd756c
   	 creation time: 2020-05-12T16:44:06Z
   	 service account name: spark
   	 volumes: spark-local-dir-1, dswsecret-volume, spark-conf-volume, spark-token-fpqpz
   	 node name: 10.74.200.157
   	 start time: 2020-05-12T16:44:09Z
   	 container images: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 phase: Running
   	 status: [ContainerStatus(containerID=containerd://d5479e5ff5c582db541e8b545953981d705024f73861a0fddd506a3b11999e4b, image=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7, imageID=txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark@sha256:3682354e49a55503ef906ce8aeff8601274fa426204ec989a91a72912d31ed7e, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=true, restartCount=0, state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2020-05-12T16:44:37Z, additionalProperties={}), additionalProperties={}), terminated=null, waiting=null, additionalProperties={}), additionalProperties={})]
   2020-05-12 13:43:51 INFO  WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
   2020-05-12 13:43:53 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:
   
   
   	 Container name: spark-kubernetes-driver
   	 Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 Container state: Running
   	 Container started at: 2020-05-12T16:44:37Z
   2020-05-12 13:43:53 INFO  Client:54 - Application test_quote_rpt_fact finished.
   2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Shutdown hook called
   2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-09a9bf16-11ee-43ea-95f0-e970b0ea7578
   ```
   
   ## The difference is here
   
   When the watch stream is interrupted, the final status block reports `Container state: Running` and never prints the `Exit code:` line that the Airflow hook parses:
   
   ```log
   2020-05-12 13:43:51 INFO  WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
   2020-05-12 13:43:53 INFO  LoggingPodStatusWatcherImpl:54 - Container final statuses:
   
   
   	 Container name: spark-kubernetes-driver
   	 Container image: txo-dswim-esb-docker-local.artifactory.swg-devops.com/spark:s2.4.4-h2.7
   	 Container state: Running
   	 Container started at: 2020-05-12T16:44:37Z
   2020-05-12 13:43:53 INFO  Client:54 - Application test_quote_rpt_fact finished.
   2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Shutdown hook called
   2020-05-12 13:43:53 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-09a9bf16-11ee-43ea-95f0-e970b0ea7578
   ```
   
   ## conclusion
   
   Comparing the log details, the difference lies in how the log stream terminates:
   
   1.  airflow side & source code
   
            The log stream is interrupted and no 'Exit code:' line is ever printed, so 'self._spark_exit_code' keeps its initial value 'None'. Since 'None != 0' evaluates to True, Airflow marks the job failed even though the job is still running (or has already finished successfully) in k8s.
   
           Here is a similar issue: https://issues.apache.org/jira/browse/AIRFLOW-6244
   
   ```python
   class SparkSubmitHook:
       def submit(self, application="", **kwargs):
           """
           Remote Popen to execute the spark-submit job
   
           :param application: Submitted application, jar or py file
           :type application: str
           :param kwargs: extra arguments to Popen (see subprocess.Popen)
           """
           spark_submit_cmd = self._build_spark_submit_command(application)
   
           if hasattr(self, '_env'):
               env = os.environ.copy()
               env.update(self._env)
               kwargs["env"] = env
   
           self._submit_sp = subprocess.Popen(spark_submit_cmd,
                                              stdout=subprocess.PIPE,
                                              stderr=subprocess.STDOUT,
                                              bufsize=-1,
                                              universal_newlines=True,
                                              **kwargs)
   
           self._process_spark_submit_log(iter(self._submit_sp.stdout.readline, ''))
           returncode = self._submit_sp.wait()
   
           # Check spark-submit return code. In Kubernetes mode, also check the value
           # of exit code in the log, as it may differ.
           if returncode or (self._is_kubernetes and self._spark_exit_code != 0):
               raise AirflowException(
                   "Cannot execute: {}. Error code is: {}.".format(
                       self._mask_cmd(spark_submit_cmd), returncode
                   )
               )
           ...
   
       def _process_spark_submit_log(self, itr):
           # Consume the iterator
           for line in itr:
               line = line.strip()
   
            # (the yarn-cluster 'if' branch is elided here)
            # If we run Kubernetes cluster mode, we want to extract the driver pod id
            # from the logs so we can kill the application when we stop it unexpectedly
            elif self._is_kubernetes:
                   match = re.search(r'\s*pod name: ((.+?)-([a-z0-9]+)-driver)', line)
                   if match:
                       self._kubernetes_driver_pod = match.groups()[0]
                       self.log.info("Identified spark driver pod: %s",
                                     self._kubernetes_driver_pod)
   
                   # Store the Spark Exit code
                   match_exit_code = re.search(r'\s*Exit code: (\d+)', line)
                   if match_exit_code:
                       self._spark_exit_code = int(match_exit_code.groups()[0])
   
               ...
   
               self.log.info(line)
   ```
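   
   A defensive fallback could query the driver pod directly whenever the log stream dies before an 'Exit code:' line arrives. The following is only a rough sketch, assuming the official `kubernetes` Python client and a hypothetical helper name; it is not what `spark_submit_hook.py` currently does:
   
   ```python
   # Rough sketch, not the current Airflow implementation: ask Kubernetes for
   # the driver container's exit code instead of assuming failure when the
   # parsed exit code is still None. The helper name and the use of the
   # official `kubernetes` Python client are assumptions for illustration.
   from kubernetes import client, config
   
   
   def driver_pod_exit_code(pod_name, namespace):
       """Return the driver container's exit code, or None if still running."""
       config.load_kube_config()  # load_incluster_config() when in-cluster
       v1 = client.CoreV1Api()
       pod = v1.read_namespaced_pod(name=pod_name, namespace=namespace)
       for status in pod.status.container_statuses or []:
           terminated = status.state.terminated
           if terminated is not None:
               return terminated.exit_code
       return None  # container has not terminated yet
   
   
   # Hypothetical use inside SparkSubmitHook.submit(), after Popen returns:
   # if self._is_kubernetes and self._spark_exit_code is None:
   #     self._spark_exit_code = driver_pod_exit_code(
   #         self._kubernetes_driver_pod, 'batch-pipeline-qa')
   ```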
   
   2.  k8s side (k8s client)
   
           2020-05-12 12:51:40 INFO WatchConnectionManager:379 - Current reconnect backoff is 1000 milliseconds (T0)
   
        By themselves, these messages are nothing to worry about. This is a known occurrence in Kubernetes and is not an issue: the API server ends watch requests when they become very old, and well-behaved clients (for example a client-go informer) automatically re-list the resource and then restart the watch from the latest resource version. The fabric8 client reconnects with exponential backoff, as the snippet below shows:
   
        fabric8: https://github.com/fabric8io/kubernetes-client/issues/1075
   
        https://stackoverflow.com/questions/52910322/kubernetes-resource-versioning/52925973#52925973
   
   ```java
   private long nextReconnectInterval() {
       int exponentOfTwo = currentReconnectAttempt.getAndIncrement();
       if (exponentOfTwo > maxIntervalExponent)
           exponentOfTwo = maxIntervalExponent;
       long ret = reconnectInterval * (1 << exponentOfTwo);
       logger.debug("Current reconnect backoff is " + ret + " milliseconds (T" + exponentOfTwo + ")");
       return ret;
   }
   ```
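   
   For illustration, the re-list-and-rewatch pattern described above can be sketched with the official `kubernetes` Python client (an assumption here; client-go informers and the fabric8 client do the equivalent internally in Go/Java). A watch that ends with a "410 Gone" / too old resourceVersion error is re-listed and restarted rather than treated as a job failure:
   
   ```python
   # Sketch of re-list-and-rewatch, assuming the official `kubernetes` client.
   from kubernetes import client, config, watch
   from kubernetes.client.rest import ApiException
   
   config.load_kube_config()
   v1 = client.CoreV1Api()
   namespace = "batch-pipeline-qa"  # example namespace from the logs above
   
   while True:
       # Re-list to obtain a fresh resourceVersion to watch from.
       pods = v1.list_namespaced_pod(namespace)
       resource_version = pods.metadata.resource_version
       w = watch.Watch()
       try:
           for event in w.stream(v1.list_namespaced_pod, namespace,
                                 resource_version=resource_version):
               print(event["type"], event["object"].metadata.name)
       except ApiException as exc:
           if exc.status == 410:  # resourceVersion too old: re-list, re-watch
               continue
           raise
   ```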
   
   3.  spark side
   
            http://mail-archives.apache.org/mod_mbox/spark-issues/201805.mbox/%3CJIRA.13158986.1526264708000.69213.1526330460039@Atlassian.JIRA%3E
   
            https://issues.apache.org/jira/browse/SPARK-24266?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16474770#comment-16474770
   
         pull request: https://github.com/apache/spark/pull/28423
   
   ## ask for suggestions
   
   So, are there any suggestions for avoiding this issue?
   
   ## actions that may be considered
   
   ### [airflow][source_code_change]
   
       Modify the Airflow source code to poll the driver pod status (the equivalent of 'kubectl describe pod xxxx -n xxxx') every few seconds, instead of relying only on the log stream; see the sketch below.
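   
   A minimal sketch of that approach, assuming the official `kubernetes` Python client; the pod name, namespace, and poll interval below are illustrative only:
   
   ```python
   # Minimal sketch of the proposed polling workaround: check the driver pod's
   # phase every few seconds instead of depending on the spark-submit log
   # stream staying alive for the whole job.
   import time
   
   from kubernetes import client, config
   
   
   def wait_for_driver(pod_name, namespace, interval_sec=10):
       """Poll the driver pod until it reaches a terminal phase."""
       config.load_kube_config()
       v1 = client.CoreV1Api()
       while True:
           pod = v1.read_namespaced_pod(name=pod_name, namespace=namespace)
           phase = pod.status.phase  # Pending / Running / Succeeded / Failed
           if phase in ("Succeeded", "Failed"):
               return phase
           time.sleep(interval_sec)
   
   
   # e.g. wait_for_driver("rmtextract-1589227268554-driver", "batch-pipeline-qa")
   ```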



[GitHub] [airflow] github-actions[bot] commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

github-actions[bot] commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-972349874


   This issue has been closed because it has not received response from the issue author.





[GitHub] [airflow] eladkal commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

eladkal commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-919986342


   > Hi guys, any update on this issue?
   
   There was an attempt by the author to fix this issue (https://github.com/apache/airflow/pull/9081) but the PR was abandoned.
   If you are interested in fixing it, please open a PR.





[GitHub] [airflow] boring-cyborg[bot] commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

boring-cyborg[bot] commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-632429783


   Thanks for opening your first issue here! Be sure to follow the issue template!
   





[GitHub] [airflow] RenGeng commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

RenGeng commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-885617969


   Hi guys, any update on this issue?





[GitHub] [airflow] github-actions[bot] closed issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

github-actions[bot] closed issue #8963:
URL: https://github.com/apache/airflow/issues/8963


   





[GitHub] [airflow] stijndehaes commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

stijndehaes commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-634596428


   @ywan2017 I also have a PR open on airflow to work with spark 3.0 https://github.com/apache/airflow/pull/8730





[GitHub] [airflow] stijndehaes commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

stijndehaes commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-635786827


   @ywan2017 Yeah once this is merged I want to try to backport it to 2.4.x. But the code has been refactored a lot in 3.x so this will take a while.





[GitHub] [airflow] ywan2017 commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

ywan2017 commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-635782223


   > @ywan2017 I also have a PR open on airflow to work with spark 3.0 #8730
   
   I saw you are trying to fix the Spark watcher on k8s, which is awesome! That issue affects Airflow scheduling far too much. Sadly, I am using Spark 2.4.4, which makes it difficult to merge your code change.





[GitHub] [airflow] ywan2017 commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

ywan2017 commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-633348516


   Now I am trying to call the Kubernetes client from Airflow instead of using the shell command; working on it.





[GitHub] [airflow] github-actions[bot] commented on issue #8963: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception

github-actions[bot] commented on issue #8963:
URL: https://github.com/apache/airflow/issues/8963#issuecomment-965860004


   This issue has been automatically marked as stale because it has been open for 30 days with no response from the author. It will be closed in the next 7 days if no further activity occurs from the issue author.

